You will find here some basic information about ways to extract, load & maintain data with Clustered Columnstore Indexes.
You will need some knowledge of Columnstore Index structures to follow it.
What's New in Columnstore Indexes for SQL Server 2017 - Niko Neugebauer
Columnstore indexes in SQL Server 2017 include:
1) Support for LOB columns allowing values up to 2GB to be stored and compressed on or off-row.
2) Ability to define computed columns in tables for the first time, improving data warehouse scenarios.
3) Online rebuilding of nonclustered columnstore indexes for high availability during maintenance.
4) Enhancements to the query optimizer including adaptive memory grants, adaptive join selection, and removal of trivial plans that could prevent batch mode processing.
Niko Neugebauer gave a presentation on the columnstore improvements in SQL Server 2016. Some of the key improvements discussed include hybrid transactional/analytical processing (HTAP), new T-SQL syntax for defining columnstore indexes, high availability features like readable secondaries, improved data loading and batch processing performance, new maintenance features, and expanded monitoring capabilities. The presentation provided examples and demonstrations of many of these new columnstore features in SQL Server 2016.
Introduction into the world of Clustered Columnstore Indexes in SQL Server 2014, with explanations of the basic structures and functionalities.
Available data types, limitations and differences to SQL Server 2012 Nonclustered Columnstore Indexes are all described here
Deep dive into Clustered Columnstore structures with information on compression algorithms, compression types, locking and dictionaries, as well as the Batch Processing Mode.
With Power BI you can bring your BI architecture to the next level.
Architecture is a very important topic in a business intelligence project; let's discover the right questions to ask and the possible scenarios for integrating Power BI into an existing environment or building a new one from scratch.
We'll talk about how to choose the right Storage Modes, how to design a refresh policy, how to use dataflows to decouple the transformation process and lift it to the cloud, and more.
Mike Boyarski gave a presentation on MemSQL, an operational data warehouse that provides real-time analytics capabilities. He discussed challenges with traditional databases around slow data loading, lengthy query times, and low concurrency. MemSQL addresses these issues with fast data ingestion, low latency queries, and high scalability. It can ingest streaming data, run on a variety of platforms, and provides security, SQL support, and integration with common data tools. MemSQL was shown augmenting an existing IoT architecture to enable real-time analytics through fast data loading, consolidated data storage, and high query performance.
If you are seeking ways to improve your cloud database environment with EDB Postgres, this presentation reviews how you can create a Database-as-a-Service (DBaaS) with EDB Postgres on AWS.
This presentation outlines how EDB Ark can play a key role in your digital transformation with more agility and speed.
It highlights:
● How EDB Ark can integrate with your existing AWS environment and other clouds
● How you can automate your database deployments to instantly spin up new databases
● How to manage your database environment easier using the same GUI for all clouds
● How to boost developer efficiency and satisfaction
Whether your database is currently in the cloud or you are considering the cloud as an option, this presentation will provide you with the information you need to evaluate EDB Postgres and EDB Ark.
The recording of this presentation includes a demonstration. Visit www.edbpostgres.com > resources > webcasts
Cosmos is a large-scale data processing system used by thousands at Microsoft to process exabytes of data across clusters of over 50,000 servers. It provides a SQL-like language and allows teams to easily share and join data. This drives huge scalability requirements. The Apollo scheduler was developed to maximize cluster utilization while minimizing latency for heterogeneous workloads at cloud scale. Later, JetScope was created to support lower latency interactive queries through intermediate result streaming and gang scheduling while maintaining fault tolerance.
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
This document discusses how organizations can save money on database management systems (DBMS) by moving from expensive commercial DBMS to more affordable open-source options like PostgreSQL. It notes that PostgreSQL has matured and can now handle mission critical workloads. The document recommends partnering with EnterpriseDB to take advantage of their commercial support and features for PostgreSQL. It highlights how customers have seen cost savings of 35-80% by switching to PostgreSQL and been able to reallocate funds to new business initiatives.
Membase is an open source, distributed key-value database management system optimized for storing data behind interactive web applications. It is designed to be simple, fast, and elastic. Membase allows data to scale out linearly by just adding more nodes, maintaining consistency when accessing data. It also includes a built-in Memcached caching layer.
At DiDi Chuxing, China's most popular ride-sharing company, we use HBase whenever we have a big data problem.
We run three clusters which serve different business needs. We backported the Region Grouping feature back to our internal HBase version so we could isolate the different use cases.
We built the Didi HBase Service platform which is popular amongst engineers at our company. It includes a workflow and project management function as well as a user monitoring view.
Internally we recommend that users go through Phoenix to simplify access; we also used row timestamps and a multidimensional table schema to solve multi-dimension query problems.
C++, Go, Python, and PHP clients get to HBase via thrift2 proxies and QueryServer.
We run many important business applications out of our HBase cluster, such as ETA, GPS, order history, API metrics monitoring, and Traffic in the Cloud. If you are interested in any aspects listed above, please come to our talk. We would like to share our experiences with you.
Cloudant Overview Bluemix Meetup from Lisa Neddam - Romeo Kienzler
Cloudant is a fully-managed NoSQL distributed data layer service based on a JSON document store that provides high availability, scalability, simplicity and performance. It uses a flexible schema and scales massively while always being available. Cloudant is an operational data store and NoSQL document database with a simple HTTP API that is fully integrated with mobile devices, big data, cloud and delivery. It provides replication, sync, real-time analytics using MapReduce, full-text search and geospatial capabilities.
Membase Intro from Membase Meetup San Francisco - Membase
Membase is a distributed database that is simple, fast, and elastic. It can be deployed across commodity servers with zero downtime. Membase uses the memcached protocol and supports many programming languages out of the box. It provides high performance through techniques like asynchronous operations and write deduplication. Membase also offers elastic scaling through dynamic rebalancing and cloning of nodes. Future plans for Membase include advanced features like indexing and connectors to other systems through a new NodeCode extension framework.
Accelerating Data Ingestion with Databricks Autoloader - Databricks
Tracking which incoming files have been processed has always required thought and design when implementing an ETL framework. The Autoloader feature of Databricks looks to simplify this, taking away the pain of file watching and queue management. However, there can also be a lot of nuance and complexity in setting up Autoloader and managing the process of ingesting data using it. After implementing an automated data loading process in a major US CPMG, Simon has some lessons to share from the experience.
This session will run through the initial setup and configuration of Autoloader in a Microsoft Azure environment, looking at the components used and what is created behind the scenes. We’ll then look at some of the limitations of the feature, before walking through the process of overcoming these limitations. We will build out a practical example that tackles evolving schemas, applying transformations to your stream, extracting telemetry from the process and finally, how to merge the incoming data into a Delta table.
After this session you will be better equipped to use Autoloader in a data ingestion platform, simplifying your production workloads and accelerating the time to realise value in your data!
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso... - HBaseCon
Splice Machine is a hybrid relational database management system (RDBMS) that allows for both online transaction processing (OLTP) and online analytical processing (OLAP) without the need for separate systems. It provides ANSI SQL support and transactional consistency for massive amounts of data while offering 10x faster performance at 1/4 the cost of other systems. Splice Machine can be deployed on-premises or in the cloud as a fully managed database as a service using the DC/OS platform, which provides container orchestration using Mesos and Docker along with networking and storage integration using CNI and RexRay.
Using Kafka to scale database replication - Venu Ryali
LinkedIn used Kafka to unify and scale their database infrastructure. They replaced their MySQL replication with a Kafka-based approach to allow for more flexible shard placement, easier cluster expansion, and higher availability. Using Kafka eliminated the need for a separate data replication system and provided significant cost savings compared to the previous architecture.
Journey to the Real-Time Analytics in Extreme Growth - SingleStore
The document summarizes AppsFlyer's journey to implement a real-time analytics solution to handle their extreme growth and increasing data volumes. They were previously using TokuDB but it was failing weekly and not scalable. They tried Druid but it did not meet their requirements. They then implemented MemSQL, an in-memory database, which provided faster query latency, recoverability, and the ability to scale to handle 30x more data while reducing costs. Their current architecture uses Kafka to ingest data, MemSQL clusters for real-time queries and a daily batch process to a columnstore for history.
Oracle Open World is Oracle's annual user conference, held in San Francisco. In 2013, it saw 60,000 attendees across 1,824 sessions led by 3,104 speakers. Major announcements included an in-memory database capable of 100x faster searches and 2x faster transactions than traditional databases, as well as support for Oracle Database and Oracle Linux on Microsoft's Azure cloud platform. The conference focused on four main themes: engineered systems, opportunities from disruptive technologies, customer experience, and JavaOne developer conference. Licensing costs and complexity were noted as an ongoing concern for attendees.
Building a derived data store using Kafka - Venu Ryali
LinkedIn built a new derived data store called Venice to address limitations of their previous system Voldemort. Venice uses Kafka to enable scalable, fault-tolerant processing and replication of both batch-processed and incrementally updated derived data. It processes data through Hadoop jobs to Kafka topics, from which both batch-stored and real-time copies are maintained through Venice and Samza respectively. Kafka Mirror Maker replicates data across data centers for high availability.
Real time dashboards with Kafka and Druid - Venu Ryali
This document describes a tracking platform that provides real-time insights by ingesting streaming data from various sources into Druid for analysis and visualization. It addresses challenges around acquiring data at scale from disparate systems, processing the data using Spark Streaming and Kafka, and aggregating and exploring the data in Druid and dashboards. The platform connects these systems together into a cohesive architecture for real-time analytics and model building.
How to use Standard SQL over Kafka: From the basics to advanced use cases | F... - HostedbyConfluent
Several different frameworks have been developed to draw data from Kafka and maintain standard SQL over continually changing data. This provides an easy way to query and transform data - now accessible by orders of magnitude more users.
At the same time, using Standard SQL against changing data is a new pattern for many engineers and analysts. While the language hasn’t changed, we’re still in the early stages of understanding the power of SQL over Kafka - and in some interesting ways, this new pattern introduces some exciting new idioms.
In this session, we’ll start with some basic use cases of how Standard SQL can be effectively used over events in Kafka- including how these SQL engines can help teams that are brand new to streaming data get started. From there, we’ll cover a series of more advanced functions and their implications, including:
- WHERE clauses that contain time change the validity intervals of your data; you can programmatically introduce and retract records based on their payloads!
- LATERAL joins turn streams of query arguments into query results; they will automatically share their query plans and resources!
- GROUP BY aggregations can be applied to ever-growing data collections; reduce data that wouldn't even fit in a database in the first place.
We'll review in-production examples where each of these cases make unmodified Standard SQL, run and maintain over data streams in Kafka, and provide the functionality of bespoke stream processors.
As HBase and Hadoop continue to become routine across enterprises, these enterprises inevitably shift priorities from effective deployments to cost-efficient operations. Consolidation of infrastructure, the sum of hardware, software, and system-administrator effort, is the most common strategy to reduce costs. As a company grows, the number of business organizations, development teams, and individuals accessing HBase grows commensurately, creating a not-so-simple requirement: HBase must effectively service many users, each with a variety of use-cases. This problem is known as multi-tenancy. While multi-tenancy isn’t a new problem, it also isn’t a solved one, in HBase or otherwise. This talk will present a high-level view of the common issues organizations face when multiple users and teams share a single HBase instance and how certain HBase features were designed specifically to mitigate the issues created by the sharing of finite resources.
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim... - jaxLondonConference
Presented at JAX London 2013
All too often I have observed infrastructure designs for deploying Java applications come as an afterthought by businesses, technical analysts, and application developers. Choices of technologies are frequently made with no final deployment infrastructures being discussed. The talk will cover the design considerations on building resilient applications, and application deployment platforms across multiple data centres, and how organisations can leverage technologies such as Apache Cassandra to achieve this.
How to build and run infrastructure to support Product Catalog Search E-commerce initiative. I will talk about the lessons learnt and have a Q&A based on real life experience with the deployment at Ebates.
https://tech.rakuten.co.jp/
This presentation reviews the key methodologies that all the members of the team should consider, such as:
- How to prioritize the right application or project for your first Oracle migration
- Tips to execute a well-defined, phased migration process to minimize risk and increase time to value
- Handling the common concerns and pitfalls related to a migration project
- What resources you can leverage before, during and after your migration
- Suggestions on how you can achieve independence from an Oracle database – without sacrificing performance.
Target audience: This presentation is intended for IT Decision-Makers and Leaders on the team involved in Database decisions and execution.
For more information, please email sales@enterprisedb.com
Introducing 3 FREE Smart solutions for SQL Server (Adi Sapir, Docco Labs)
As Database experts, we work with SQL Server Databases on a daily basis. We face the same problems every SQL Administrator and/or developer does. And – we spend our time writing solutions for these problems! In this session Adi will introduce the following 3, totally FREE solutions:
· ClipTable – A revolutionary new *anything* to SQL Table importer
· Database File Explorer – a much easier way to explore our database->filegroups->files->storage mapping
· Log Table Viewer – a complete client/server logger solution for SQL Server
Presentation delivered by Mircea Markus (JBoss) at the London JBoss User Group Event on Wednesday, the 4th of December 2013.
In this talk, Mircea Markus (Infinispan Lead) covers the major features added in the 6.0 release to the Infinispan ecosystem:
- querying in client/server mode
- better integration with the persistent storage
- multi-master cross-site replication
- support for heterogeneous clusters
Participants will take home a better understanding of Infinispan capabilities and the use cases in which it can be put at work. Ideally the attendees should have a basic knowledge of the in-memory data grids.
The document discusses three important things for IT leaders to know about SQL Server: database performance and speed matter; backups and disaster recovery plans are not all equal; and high availability/disaster recovery (HA/DR) tools provide proactive disaster protection. It provides tips on optimizing database performance through query tuning instead of hardware upgrades. It explains the importance of backing up transaction logs and having comprehensive disaster recovery plans, including solutions like AlwaysOn availability groups. The document promotes the services of SQLWatchmen for database diagnostics, tuning, disaster planning and recovery support.
T-SQL performance guidelines for better DB stress powers - Shehap Elnagar
This document provides a summary of Shehap El-Nagar's background and expertise in SQL Server performance tuning. It outlines his experience as an MVP, MCITP, and consultant for various organizations in the Gulf region. The document also lists ways to contact Shehap and an agenda for an upcoming presentation on performance analysis and T-SQL optimization techniques.
This document provides guidelines for developing PL/SQL components including naming conventions, formatting, commenting practices, and optimizations. Key guidelines include using prefixes for different object types, indenting with 3 spaces, writing descriptive header comments, avoiding unnecessary full table scans, and leveraging collections like nested tables for persistence. Performance best practices focus on proper indexing, avoiding context switching between SQL and PL/SQL, and bulk operations over iterative processing.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution - Dmitry Anoshin
This session will cover building the modern Data Warehouse by migration from the traditional DW platform into the cloud, using Amazon Redshift and Cloud ETL Matillion in order to provide Self-Service BI for the business audience. This topic will cover the technical migration path of DW with PL/SQL ETL to the Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will be focusing on working backward through the process, i.e. starting from the business audience and their needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using modern BI platform Tableau.
Turbocharge SQL Performance in PL/SQL with Bulk Processing - Steven Feuerstein
Is your Oracle Database application running slower than you'd like? One of the first things to check is row-by-row processing: non-query DML (insert, update, delete) within a loop. And the fix? Bulk processing, either with smarter SQL or with FORALL and BULK COLLECT in PL/SQL.
Ten query tuning techniques every SQL Server programmer should know - Kevin Kline
From the noted database expert and author of 'SQL in a Nutshell' - SELECT statements have a reputation for being very easy to write, but hard to write very well. This session will take you through ten of the most problematic patterns and anti-patterns when writing queries and how to deal with them all. Loaded with live demonstrations and useful techniques, this session will teach you how to take your SQL Server queries from mundane to masterful.
Maintenance plans provide a way to automate database maintenance tasks such as integrity checks, index maintenance, and backups. They can be created using the Maintenance Plan Wizard or Maintenance Plan Designer. Common tasks include checking database integrity with DBCC CHECKDB, reorganizing or rebuilding indexes, updating statistics, and performing full, differential or transaction log backups. Care must be taken to choose the right tasks and schedule to maintain performance and protect the database.
SQLSaturday is a training event for SQL Server professionals and those wanting to learn about SQL Server. This event will be held Jun 13 2015 at Hochschule Bonn-Rhein-Sieg, Grantham-Allee 20, St. Augustin, Rheinland, 53757, Germany. Admittance to this event is free, all costs are covered by donations and sponsorships. Please register soon as seating is limited, and let friends and colleagues know about the event.
Maintenance Plans for Beginners (but not only) | Every experienced administrator has used, to some extent, what are called Maintenance Plans. During this session, I'd like to discuss what useful functionality they can provide when we use them and what to look out for. A 200-level session leaning towards 300, with an open discussion.
This document discusses ETL (extract, transform, load) processes and challenges in implementing ETL solutions. It argues that standalone ETL products are outdated for modern systems that have existing IT infrastructure. When re-architecting a system, factors like flexibility, performance, reliability, data freshness, and tooling must be considered. The document presents a reference architecture for building an efficient, scalable ETL solution using WSO2 Enterprise Middleware Platform. It demonstrates how to perform data transformations between models using the Smooks Editor tool shipped with WSO2 Developer Studio. In summary, ETL plays an important role but requires effort, and the WSO2 platform enables easier re-architecting of data models with proper
Exploring plsql new features best practices september 2013Andrejs Vorobjovs
The document discusses exploring new features and best practices in PL/SQL for better performance. It covers topics like parsing time, bulk binding, PL/SQL function result cache, subprogram inlining, finer grained dependencies, and new features in Oracle Database 12c. The presentation provides an overview of Oracle SQL Developer and guidelines for writing efficient and readable PL/SQL code.
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick? - Jim Czuprynski
Autonomous Transaction Processing (ATP) - the second in the family of Oracle’s Autonomous Databases – offers Oracle DBAs the ability to apply a force multiplier for their OLTP database application workloads. However, it’s important to understand both the benefits and limitations of ATP before migrating any workloads to that environment. I'll offer a quick but deep dive into how best to take advantage of ATP - including how to load data quickly into the underlying database – and some ideas on how ATP will impact the role of Oracle DBA in the immediate future. (Hint: Think automatic transmission instead of stick-shift.)
Users hate to wait - for anything. For our applications to be successful, they must not only be correct (meet user requirements) and maintainable; they must also execute efficiently enough to avoid user frustration. This presentation offers a whirlwind introduction to the most important techniques for improving PL/SQL performance, including data caching, FORALL and BULK COLLECT, the function result cache and the new 12.1 UDF pragma. It will help you proactively identify opportunities for applying techniques that will most dramatically (generally, an order of magnitude or more) improve the performance of your PL/SQL code. https://oracle.com/plsql
Clontab is an Oracle E-Business Suite automation tool that allows DBAs to schedule and forget routine tasks like refreshes, patching, and instance start/stop operations. It automates the full refresh process including pre and post refresh steps. DBAs can register their Oracle E-Business Suite source and target instances with Clontab once, then schedule refresh and patching tasks to run on a defined schedule. This allows DBAs to focus on more important tasks rather than manual routine work.
This is a summary of the sessions I attended at PASS Summit 2017. Out of the week-long conference, I put together these slides to summarize the conference and present at my company. The slides are about my favorite sessions, the ones I found had the most value. The slides include screenshotted demos I personally developed and tested, just like the speakers at the conference.
Чурюканов Вячеслав, “Code simple, but not simpler” - EPAM Systems
Code simple, but not simpler discusses ways to develop enterprise applications based on "POJOs in Action" by Chris Richardson. Key messages include making everything as simple as possible while avoiding oversimplification, and that every technology has benefits and drawbacks. The document discusses domain models, transaction scripts, encapsulating business logic, accessing databases, handling concurrency, and long transactions. It compares object-relational mapping frameworks to other data access approaches and concurrency patterns like optimistic and pessimistic locking.
Open Source Library System Software: Libraries Are Doing it For Themselves - loriayre
This document discusses how libraries can get involved with open source library systems like Evergreen and Koha by contributing in various ways beyond just writing code. It outlines many ways libraries can participate such as organizing user communities, conducting user testing, writing documentation, managing projects, and more. It also provides resources for installing and getting support for Evergreen and Koha.
Unify Enterprise Data Processing System Platform Level Integration of Flink a... - Flink Forward
In this talk, I will present how Flink enables enterprise customers to unify their data processing systems by using Flink to query Hive data.
Unification of streaming and batch is a main theme for Flink. Since 1.9.0, we have integrated Flink with Hive in a platform level. I will talk about:
- what features we have released so far, and what they enable our customers to do
- best practices to use Flink with Hive
- what is the latest development status of Flink-Hive integration at the time of Flink Forward Berlin (Oct 2019), and what to look for in the next release (probably 1.11)
3. Explore Everything PASS Has to Offer
Free SQL Server and BI Web Events
Free 1-day Training Events
Regional Event
Local User Groups Around the World
Free Online Technical Training
This is Community
Business Analytics Training
Session Recordings
PASS Newsletter
4. Session Evaluations
ways to access
Go to passsummit.com/evals
Download the GuideBook App and search: PASS Summit 2014
Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Submit by 11:59 PM EST Friday Nov. 7 to WIN prizes
Your feedback is important and valuable.
Evaluation Deadline:
11:59 PM EST, Sunday Nov. 16
5. Niko Neugebauer
Microsoft Data Platform Professional
OH22 (http://www.oh22.net)
SQL Server MVP
Founder & Co-Founder of 3 Portuguese PASS Chapters
In-Memory VC co-lead
Blog: http://www.nikoport.com
Twitter: @NikoNeugebauer
LinkedIn: http://pt.linkedin.com/in/webcaravela
Email: info@webcaravela.com
6. Oha! Since this is a 400-level session:
And so I assume:
•You know the Columnstore Structure Elements, such as Row Groups, Delta-Stores, & Deleted Bitmaps.
•You do understand Segment Elimination.
•You have heard about Tuple Mover.
•You know what a Columnstore Dictionary is, what types of dictionaries exist and why.
•You know what Batch Mode is.
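If you want a quick look at those structures on your own instance, the DMV below lists them per table. This is a minimal sketch; the table name dbo.FactSales is only a placeholder.

-- Row Groups, Delta-Stores (state OPEN/CLOSED) and the Deleted Bitmap counters
-- of one clustered columnstore table (dbo.FactSales is a hypothetical name).
SELECT rg.partition_number,
       rg.row_group_id,
       rg.state_description,    -- OPEN / CLOSED = Delta-Store, COMPRESSED = Row Group
       rg.total_rows,
       rg.deleted_rows          -- rows marked in the Deleted Bitmap
FROM sys.column_store_row_groups AS rg
WHERE rg.object_id = OBJECT_ID('dbo.FactSales')
ORDER BY rg.partition_number, rg.row_group_id;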
8. General Thoughts:
Working with CCI requires a lot of resources, and so you might want/need to:
•Use Resource Governor – control memory grants and the impact on CPU & IO. (You should consider capping the number of cores & memory given to Columnstore; see the sketch below.)
•Use Partitioning – it helps you to control the impact.
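A minimal Resource Governor sketch for that idea follows. Pool, group and login names as well as the percentages are assumptions to adapt; the classifier function has to be created in master, on an edition that supports Resource Governor.

-- Cap CPU & memory for sessions doing columnstore loads and maintenance (hypothetical names/values).
CREATE RESOURCE POOL ColumnstorePool
    WITH (MAX_CPU_PERCENT = 50, MAX_MEMORY_PERCENT = 50);

CREATE WORKLOAD GROUP ColumnstoreGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 50, MAX_DOP = 4)
    USING ColumnstorePool;
GO
-- Classifier routes the (hypothetical) ETL login into the group.
CREATE FUNCTION dbo.fnColumnstoreClassifier() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'etl_loader'
                THEN N'ColumnstoreGroup' ELSE N'default' END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnColumnstoreClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;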
9. ETL, ELT, LTE – My Agenda for Today
•E – Extract data from Columnstore
•T – Transform (Maintain)
•L – Load data into Columnstore
10. Extract
•Use Batch Processing, unless you prove that it does not help (You will need DOP>=2) Ω
•Avoid Delta Stores (Force Tuple Mover to close & compress them with Reorganize with (CLOSE_ALL_DELTA_STORES))Ω
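A sketch of nudging the Tuple Mover along: the slide uses the hint name CLOSE_ALL_DELTA_STORES, while COMPRESS_ALL_ROW_GROUPS is the option documented on later versions, so treat the hint as version-dependent. Index and table names are placeholders.

-- Let the Tuple Mover compress closed Delta-Stores (hypothetical names).
ALTER INDEX CCI_FactSales ON dbo.FactSales REORGANIZE;

-- On builds that accept it, this hint also closes & compresses still-open Delta-Stores:
-- ALTER INDEX CCI_FactSales ON dbo.FactSales
--     REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);

-- Verify that no OPEN Delta-Stores remain:
SELECT state_description, COUNT(*) AS row_groups, SUM(total_rows) AS rows_total
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales')
GROUP BY state_description;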
12. Extraction with Segment Clustering
•Use Column Clustering (you will need DOP = 1 to keep it perfect (ref: 2014 RTM + CU2)) Ω
•Use Partitioning:
it helps Column Clustering Rebuilds with DOP = 1 (Resource & impact optimization)
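One way this is commonly done (a sketch; table, index and column names are assumptions): rebuild the table as a rowstore clustered index sorted on the elimination column, then convert it back to a Clustered Columnstore Index single-threaded so the order survives into the segments.

-- Step 1: re-sort the data by rebuilding it as a rowstore clustered index.
CREATE CLUSTERED INDEX CCI_FactSales ON dbo.FactSales (DateKey)
    WITH (DROP_EXISTING = ON, MAXDOP = 1);

-- Step 2: convert back to a Clustered Columnstore Index with MAXDOP = 1,
-- so the segments keep the DateKey order and allow Segment Elimination.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales
    WITH (DROP_EXISTING = ON, MAXDOP = 1);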
13. Extract continued
CCI Rebuild is half-online: this means that you can read data, but writing new information is not available.
•Use SELECT INTO, which gets a parallel execution plan in SQL Server 2014
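A minimal sketch of that extraction pattern (table and column names are placeholders):

-- SELECT INTO can get a parallel plan on SQL Server 2014, making it a fast way
-- to stage data out of a CCI without rebuilding anything.
SELECT DateKey, StoreKey, SUM(SalesAmount) AS SalesAmount
INTO dbo.Stage_SalesExtract
FROM dbo.FactSales
WHERE DateKey >= 20140101
GROUP BY DateKey, StoreKey;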
14. Transform (Maintain)
•Do not rebuild CCI unless you really need it.
•Columnstore is not a Rowstore.
•What is the percentage of your data that generally becomes deleted? (See the query below.)
•What is the improvement that you are looking for? Ω
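To answer the deleted-data question before deciding on a rebuild, a query like this gives the rough percentage (the table name is a placeholder):

-- Share of rows that are only logically deleted (sitting in the Deleted Bitmap).
SELECT OBJECT_NAME(object_id)  AS table_name,
       SUM(deleted_rows)       AS deleted_rows,
       SUM(total_rows)         AS total_rows,
       100.0 * SUM(deleted_rows) / NULLIF(SUM(total_rows), 0) AS deleted_pct
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales')
GROUP BY object_id;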
15. Transform (Maintain)
Rebuild if you have a good window and need to realign column clustering.
If rebuilding frequently, use partitioning – it helps you to minimize the impact and gives you better control (see the partition-level rebuild sketch below)
Create your own Columnstore Resource Group in Resource Governor Ω
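When partitioning is in place, the rebuild can be limited to the partitions that actually changed; a sketch (index/table names and the partition number are placeholders):

-- Rebuild only one partition to keep the window and the resource impact small.
ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD PARTITION = 3
    WITH (MAXDOP = 2);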
16. Load
•Use Resource Governor
•Use Partitioning(especially for Switching In)
•Always use Batch Processing, unless proven that you don’t need it (You will need DOP>=2)
•Use the BULK Load API (with at least 102,400 rows per batch; example below)
•Avoid using open Delta-Stores
•Use Parallel T-SQL (SELECT INTO)
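A sketch of a Bulk API load that stays above the 102,400-row threshold, so rows bypass the Delta-Store and land directly in compressed Row Groups. File path, terminators and the table name are assumptions.

BULK INSERT dbo.FactSales
FROM 'C:\etl\fact_sales.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR   = '\n',
      BATCHSIZE       = 1048576,  -- aim for full Row Groups (max 1,048,576 rows)
      TABLOCK);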
18. Load Scenarios
Things to bear in mind:
Regarding Dictionaries:
•Global Dictionaries are created only on the first load/build and are never updated after that
•The Sampling is Max (1 Million Rows, 1%)
•Single Threaded
19. Load Scenarios
Global Dictionaries are not your Global Saviours:
•You need to have similar data in your column in order to get any benefit from them
•You might actually get much more compact & efficient Local Dictionaries
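You can see how your dictionaries actually came out with the DMV below (the table name is a placeholder); dictionary_id = 0 is the global dictionary, everything else is local.

SELECT p.partition_number,
       d.column_id,
       d.dictionary_id,     -- 0 = global dictionary, > 0 = local dictionary
       d.entry_count,
       d.on_disk_size
FROM sys.column_store_dictionaries AS d
JOIN sys.partitions AS p ON p.hobt_id = d.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
ORDER BY d.column_id, d.dictionary_id;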
22. Links:
My blog series on Columnstore Indexes (40+ blog posts):
•http://www.nikoport.com/columnstore/
Remus Rusanu's Introduction to Clustered Columnstore:
•http://rusanu.com/2013/06/11/sql-server-clustered-columnstore-indexes-at-teched-2013/
White Paper on the Clustered Columnstore:
•http://research.microsoft.com/pubs/193599/Apollo3%20-%20Sigmod%202013%20-%20final.pdf
23. Connect Items
Implement Batch Mode Support for Row Store:
•https://connect.microsoft.com/SQLServer/Feedback/Details/938021
Multi-threaded rebuilds of Clustered Columnstore Indexes break the sequence of pre-sorted segment ordering (Order Clustering):
•https://connect.microsoft.com/SQLServer/Feedback/Details/912452
Columnstore Segments Maintenance – Remove & Merge:
•https://connect.microsoft.com/SQLServer/Feedback/Details/930664