Innovation without permission: from Codd to NoSQL

•Download as KEY, PDF•

2 likes•700 views

Practitioners often fail to apply textbook database design principles. We observe both a perversion of the relational model and a growth of less formal alternatives. Overall, there is an opposition between the analytic thought that prevailed when many data modeling techniques were initiated, and the pragmatism which now dominates among practitioners. There are at least two recent trends supporting this rejection of traditional models: (1) the rise of the sophisticated user, most notably in social media is challenge to the rationalist view, as it blurs the distinction between design and operation, (2) in the new technological landscape where there are billions of interconnected computers worldwide, simple concepts like consistency sometimes become prohibitively expensive. Overall, for a wide range of information systems, design and operation are becoming integrated in the spirit of pragmatism. Thus, we are left with design methodologies which embrace fast and continual iterations and and exploratory testing. These methodologies allow innovation without permission in that the right to design new features is no longer so closely guarded. Fo

Technology

Innovation without
permission

Daniel Lemire
http://lemire.me http://twitter.com/lemire

Thanks to: A. Badia, Louisville University and
J. Robillard from UQAM

- 2000 employees
- 600 million users*

* As of January 2011

Agarwal, A. (2009). Facebook: Science and the Social Graph. QCon 2008.

- No schema : key-value stores
- No join
- Engineers have direct access to data

* As of January 2011

Agarwal, A. (2009). Facebook: Science and the Social Graph. QCon 2008.

~10 000 Information Systems
~90% Relational
100-200 Tables/database
50-200 Attributes/table

Source: Brodie & Liu, The Power and Limits of Relational Technology in the Age of
Information Ecosystems, On The Move Federated Conferences, 2010.

Post-Methodology Era:
late 1990s
D. E. Avison and G. Fitzgerald, Where now for development methodologies? 2003.

Sophisticated users
Image source: Dave77459
(flickr)

Billions of computers
Image source: ivanx
(flickr)

Users are considered as mere
faceless objects for who the
systems are designed.

J. Iivari, H. Isomäki, S. Pekkola, The user – the great unknown of systems development:
reasons, forms, challenges, experiences and intellectual contributions of user involvement,
Information Systems Journal, 2010.

93% of accounts are never used
Source: Meredith and O'Donnell, A Functional Model of Social Media and its Application
to Business Intelligence, DSS '10, 2010.

n !ot
s are never used
a ted
w ul
I s
93% of accounts

c o n
Source: Meredith and O'Donnell, A Functional Model of Social Media and its Application
to Business Intelligence, DSS '10, 2010.

Deployment: test for user reactions

* As of January 2011

Agarwal, A. (2009). Facebook: Science and the Social Graph. QCon 2008.

- Google had more than
1 million servers* in
2007
* according to Gartner

Brewer’s theorem (CAP)

Consistency Availability
XN
B MS oS
QL
RD

Tolerance

Gilbert, S. and Lynch, N., Brewer's conjecture and the feasibility of
consistent, available, partition-tolerant web services. 2002

- Corruption in Oracle database
- Up to 16.5 million customers affected
- $132 million frozen
- thousands of loan applications lost

- Over-engineered database: strong
consistency throughout

Online: Chris Mellor, Morgan Chase blames Oracle for online bank crash ,
Curt Monash, Details of the JPMorgan Chase Oracle database outage

Does your
methodology
know about:

- Co-design with users?
- Highly distributed data?

Better bitmap performance with Roaring bitmaps Bitmaps are used to implement fast set operations in software. They are frequently found in databases and search engines. Without compression, bitmaps scale poorly, so they are often compressed. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). For example, Oracle relies on BBC bitmap compression while the version control system Git and Apache Hive rely on EWAH compression. We can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has been adopted by several production platforms (e.g., Apache Lucene/Solr/Elastic, Apache Spark, eBay's Apache Kylin and Metamarkets' Druid). Overall, our implementation of Roaring can be several times faster (up to two orders of magnitude) than the implementations of traditional RLE-based alternatives (WAH, Concise, EWAH) while compressing better. We review the design choices and optimizations that make these good results possible.

Depuis la mise en marché du Pentium 4, nos processeurs bénéficient d'instructions vectorielles. En tenant compte explicitement de ces instructions dans la conception de nos algorithmes, nous pouvons grandement accélérer les calculs. À titre d'exemple, considérons la compression des listes d'entiers telle qu'elle s'effectue au sein de la plupart des moteurs de recherche ou des bases de données. En cette matière, nous utilisons souvent encore des algorithmes développés dans les années 70. Nous expliquerons comment on peut faire beaucoup mieux en ce qui a trait à la vitesse en exploitant les instructions vectorielles.

Decoding billions of integers per second through vectorization

Daniel Lemire

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression rate within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.

Extracting, Transforming and Archiving Scientific Data

Daniel Lemire

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.

MaskedVByte: SIMD-accelerated VByte

Daniel Lemire

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (Masked VByte) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed. Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/pdf/1503.07387.pdf

Roaring Bitmap : June 2015 report

Daniel Lemire

Write good papers

Daniel Lemire

Engineering fast indexes

Daniel Lemire

Engineering fast indexes (Deepdive)

Daniel Lemire

Summer Social Webshop: Technology-Mediated Social Participation

University of Maryland

IoT Day 2014 - Results and challenges ahead for IoT

Raffaele Giaffreda

Advanced Social Media Techniques in Higher Education

Christopher Rice

Scholarship in the Digital World

David De Roure

Seminar at CSAIL, MIT, Cambridge, Mass. Date: Friday October 30, 2015. Time: 4:00 pm - 5:00 pm, Location: D463 (Star) Abstract: Today we are witnessing several shifts in scholarly practice, in and across multiple disciplines, as researchers embrace digital techniques to tackle established research questions in new ways and new questions afforded by digital and digitized collections, approaches, and technologies. Pervasive adoption of technology, coupled with the co-creation of new social processes, has created a new and complex space for scholarship where citizens both generate and analyse data as they interact at the intersection of the physical and digital. Drawing on a background in distributed computing, and adopting the lens of Social Machines, this talk discusses current activity in digital scholarship, framing it in its interdisciplinary settings. Bio: David De Roure is Professor of e-Research at University of Oxford, Director of the Oxford e-Research Centre, and chairs Oxford’s Digital Humanities research programme. He previously directed the Digital Social Research programme for the UK Economic and Social Research Council, and serves as a strategic advisor in new forms of data and realtime analytics. Trained in electronics and computer science, his career has involved interdisciplinary collaborations in chemistry, astrophysics, bioinformatics, social computing, digital libraries, and sensor networks. His personal research is in Computational Musicology, Web Science, and Internet of Things. He is a frequent speaker and writer on digital research and the future of scholarly communications. URL: http://www.oerc.ox.ac.uk/people/dder

Webometrics Revisited in Big Data Age_DISC2013

Han Woo PARK

Social Media and Student Learning: Using Analytics to Visualise Twitter Commu...

sharstoer

The New e-Science (Bangalore Edition)

David De Roure

Mobile, Social, Global: Applications of Emerging Technologies in Survey Reseach

AdamSage

Big data privacy issues in public social media

Supriya Radhakrishna

20220203 jim spohrer uidp v11

ISSIP

Social databases - A brief overview

Iván Sanchez Vera

Ullmannanesah

Human Being Character Analysis from Their Social Networking Profiles

Biswaranjan Samal

In this paper, characteristics of human beings obtained from profile statement present in their social networking profile status are analyzed in terms of introvert, extrovert or ambivert. Recently, Machine learning plays a vital role in classifying the human characteristics. The user profile status is collected from LinkedIn, a popular professional social networking application. Oauth2.0 protocol is used for login into the LinkedIn and web scrapping using JavaScript is used for information extraction of the registered users. Then, Word Net: a lexical database is used for forming the word clusters such as: extrovert and introvert using semi-supervised learning techniques. K-nearest neighbor classification algorithm is finally considered for classifying the profiles into various available categories. The results obtained in the proposed method are encouraging with good accuracy

The End(s) of e-Research

Eric Meyer

Viewers also liked

La vectorisation des algorithmes de compression

Daniel Lemire

Decoding billions of integers per second through vectorization

Daniel Lemire

Extracting, Transforming and Archiving Scientific Data

Daniel Lemire

MaskedVByte: SIMD-accelerated VByte

Daniel Lemire

Roaring Bitmap : June 2015 report

Daniel Lemire

Write good papers

Daniel Lemire

Engineering fast indexes

Daniel Lemire

Engineering fast indexes (Deepdive)

Daniel Lemire

Viewers also liked (8)

La vectorisation des algorithmes de compression

Decoding billions of integers per second through vectorization

Extracting, Transforming and Archiving Scientific Data

MaskedVByte: SIMD-accelerated VByte

Roaring Bitmap : June 2015 report

Write good papers

Engineering fast indexes

Engineering fast indexes (Deepdive)

Similar to Innovation without permission: from Codd to NoSQL

Summer Social Webshop: Technology-Mediated Social Participation

University of Maryland

IoT Day 2014 - Results and challenges ahead for IoT

Raffaele Giaffreda

Advanced Social Media Techniques in Higher Education

Christopher Rice

Scholarship in the Digital World

David De Roure

Webometrics Revisited in Big Data Age_DISC2013

Han Woo PARK

Social Media and Student Learning: Using Analytics to Visualise Twitter Commu...

sharstoer

The New e-Science (Bangalore Edition)

David De Roure

Mobile, Social, Global: Applications of Emerging Technologies in Survey Reseach

AdamSage

Big data privacy issues in public social media

Supriya Radhakrishna

20220203 jim spohrer uidp v11

ISSIP

Social databases - A brief overview

Iván Sanchez Vera

Ullmannanesah

Human Being Character Analysis from Their Social Networking Profiles

Biswaranjan Samal

The End(s) of e-Research

Eric Meyer

The evolution of research on social media

Farida Vis

Viva presentation Ashley J Wheat

Past, Present and Research Challenge in Adaptive User InterfacesEduardo Castillejo Gil

London Futurists - The Future of AI & Sustainability

Alex Housley

Artificial intelligence (AI) is powering the fourth industrial revolution. Intelligent machines are tackling new cognitive tasks at scale, leading to enormous economic efficiency gains and disruption across the labour market. But what will be the net impact of AI on society and the ecological environment? In this talk, Alex Housley, founder and CEO of open-source machine learning platform Seldon, explains how the collaborative approach to AI development helps transform industries and provides the macro-scale opportunities for AI to make the world a better and more sustainable place. The event was chaired by David Wood. The camera was operated by Kiran Manam. For more details about this event, see https://www.meetup.com/London-Futuris.... For more information about Seldon, see https://www.seldon.io/. To apply to join the closed beta mentioned in the talk, visit bit.ly/deploy-beta.

CSE509 Lecture 5

Web Science Research Group at Institute of Business Administration, Karachi, Pakistan

Augmented Reality as A Pervasive Technology: Context-Aware ApproachAditya Yudiantika

Similar to Innovation without permission: from Codd to NoSQL (20)

Summer Social Webshop: Technology-Mediated Social Participation

IoT Day 2014 - Results and challenges ahead for IoT

Advanced Social Media Techniques in Higher Education

Scholarship in the Digital World

Webometrics Revisited in Big Data Age_DISC2013

Social Media and Student Learning: Using Analytics to Visualise Twitter Commu...

The New e-Science (Bangalore Edition)

Mobile, Social, Global: Applications of Emerging Technologies in Survey Reseach

Big data privacy issues in public social media

20220203 jim spohrer uidp v11

Social databases - A brief overview

Ullmann

Human Being Character Analysis from Their Social Networking Profiles

The End(s) of e-Research

The evolution of research on social media

Viva presentation

Past, Present and Research Challenge in Adaptive User Interfaces

London Futurists - The Future of AI & Sustainability

CSE509 Lecture 5

Augmented Reality as A Pervasive Technology: Context-Aware Approach

More from Daniel Lemire

Accurate and efficient software microbenchmarks

Daniel Lemire

Software is often improved incrementally. Each software optimization should be assessed with microbenchmarks. In a microbenchmark, we record performance measures such as elapsed time or instruction counts during specific tasks, often in idealized conditions. In principle, the process is easy: if the new code is faster, we adopt it. Unfortunately, there are many pitfalls, such as unrealistic statistical assumptions and poorly designed benchmarks. Abstractions like cloud computing add further challenges. We illustrate effective benchmarking practices with examples.

Fast indexes with roaring #gomtl-10

Daniel Lemire

Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary). Roaring bitmaps are a standard indexing data structure. They are widely used in search and database engines. For example, Lucene, the search engine powering Wikipedia relies on Roaring. The Go library roaring implements Roaring bitmaps in Go. It is used in several popular systems such as InfluxDB, Pilosa and Bleve. This library is used in production in several systems, it is part of the Awesome Go collection. After presenting the library, we will cover some advanced Go topics such as the use of assembly language, unsafe mappings, and so forth.

Parsing JSON Really Quickly: Lessons Learned

Daniel Lemire

Our disks and networks can load gigabytes of data per second; we feel strongly that our software should follow suit. Thus we wrote what might be the fastest JSON parser in the world, simdjson. It can parse typical JSON files at speeds of over 2 GB/s on single commodity Intel core with full validation; it is several times faster than conventional parsers. How did we go so fast? We started with the insight that we should make full use of the SIMD instructions available on commodity processors. These instructions are everywhere, from the ARM chip in your smartphone all to way to server processors. SIMD instructions work on wide registers (e.g., spanning 32 bytes): they are faster because they process more data using fewer instructions. To our knowledge, nobody had ever attempted to produce a full parser for something as complex as JSON by relying primarily on SIMD instructions. And many people were skeptical that a full parser could be done fruitfully with SIMD instructions. We had to develop interesting new strategies that are generally applicable. In the end, we learned several lessons. Maybe one of the most important lesson is the importance of a nearly obsessive focus on performance metrics. We constantly measure the impact of the choices we make.

Next Generation Indexes For Big Data Engineering (ODSC East 2018)

Daniel Lemire

Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Some of our prior work includes (1) Roaring indexes which are part of multiple big-data systems such as Spark, Hive, Druid, Atlas, Pinot, Kylin, (2) EWAH indexes are part of Git (GitHub) and included in major Linux distributions. We will present ongoing and future work on how we can process data faster while supporting the diverse systems found in the cloud (with upcoming ARM processors) and under multiple programming languages (e.g., Java, C++, Go, Python). We seek to minimize shared resources (e.g., RAM) while exploiting algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. Our end goal is to process billions of records per second per core. The talk will be aimed at programmers who want to better understand the performance characteristics of current big-data systems as well as their evolution. The following specific topics will be addressed: 1. The various types of indexes and their performance characteristics and trade-offs: hashing, sorted arrays, bitsets and so forth. 2. Index and table compression techniques: binary packing, patched coding, dictionary coding, frame-of-reference.

Ingénierie de la performance au sein des mégadonnées

Daniel Lemire

Les index logiciels accélèrent les applications en intelligence d'affaire, en apprentissage machine et en science des données. Ils déterminent souvent la performance des applications portant sur les mégadonnées. Les index efficaces améliorent non seulement la latence et le débit, mais aussi la consommation d'énergie. Plusieurs index font une utilisation parcimonieuse de la mémoire vive afin que les données critiques demeurent près du processeur. Il est aussi souhaitable de travailler directement sur les données compressées afin d'éviter une étape de décodage supplémentaire. (1) Nous nous intéressons aux index bitmap. Nous les trouvons dans une vaste gamme de systèmes : Oracle, Hive, Spark, Druid, Kylin, Lucene, Elastic, Git... Ils sont une composante de systèmes, tels que Wikipedia ou GitHub, dont dépendent des millions d'utilisateurs à tous les jours. Nous présenterons certains progrès récents ayant trait à l'optimisation des index bitmap, tels qu'ils sont utilisés au sein des systèmes actuels. Nous montrons par des exemples comment multiplier la performance de ces index dans certains cas sur les processeurs bénéficiant d'instructions SIMD (instruction unique, données multiples) avancées. (2) Nous ciblons aussi les listes d'entiers que l'on trouve au sein des arbres B+, dans les indexes inversés et les index bitmap compressés. Nous donnons un exemple récent de technique de compression (Stream VByte) d’entiers qui permet de décoder des milliards d’entiers compressés par seconde.

SIMD Compression and the Intersection of Sorted Integers

Daniel Lemire

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded integer while still providing state-of-the-art compression. However, if the subsequent processing of the integers is slow, the effort spent on optimizing decoding speed can be wasted. To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists; (2) introduce the SIMD Galloping algorithm. We exploit the fact that one SIMD instruction can compare 4 pairs of integers at once. We experiment with two TREC text collections, GOV2 and ClueWeb09 (Category B), using logs from the TREC million-query track. We show that using only the SIMD instructions ubiquitous in all modern CPUs, our techniques for conjunctive queries can double the speed of a state-of-the-art approach.

Decoding billions of integers per second through vectorization

Daniel Lemire

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.

Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...

Daniel Lemire

Nowadays, medical image compression is an essential process in eHealth systems. Compressing medical images in high quality is a vital demand to avoid misdiagnosing medical exams by radiologists. WAAVES is a promising medical images compression algorithm based on the discrete wavelet transform (DWT) that achieves a high compression performance compared to the state of the art. The main aims of this work are to enhance image quality when compressing using WAAVES and to provide a high-speed DWT architecture for image compression on embedded systems. Regarding the quality improvement, the logarithmic number systems (LNS) was explored to be used as an alternative to the linear arithmetic in DWT computations. A new LNS library was developed and validated to realize the logarithmic DWT. In addition, a new quantization method called (LNS-Q) based on logarithmic arithmetic was proposed. A novel compression scheme (LNS-WAAVES) based on integrating the Hybrid-DWT and the LNS-Q method with WAAVES was developed. Hybrid-DWT combines the advantages of both the logarithmic and the linear domains leading to enhancement of the image quality and the compression ratio. The results showed that LNS-WAAVES is able to achieve an improvement in the quality by a percentage of 8% and up to 34% compared to WAAVES depending on the compression configuration parameters and the image modalities.

Faster Column-Oriented Indexes

Daniel Lemire

Compressing column-oriented indexes

Daniel Lemire

Column-oriented databases have become fashionable following the work of Stonebraker et al. In the data warehousing industry, the terms "column oriented" and "column store" have become necessary marketing buzzwords. One of the benefits of column-oriented indexes is good compression through run-length encoding (RLE). This type of compression is particularly benefitial since it simultaneously reduce the volume of data and the necessary computations. However, the efficiency of the compression depends on the order of the rows in the table and this is even more important with larger tables. Finding the best row ordering is NP hard. We compare some heuristics for this problem including variations on the lexicographical order, Gray codes, and Hilbert space-filling curves.

All About Bitmap Indexes... And Sorting Them

Daniel Lemire

A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP

Daniel Lemire

A data warehouse cannot materialize all possible views, hence we must estimate quickly, accurately, and reliably the size of views to determine the best candidates for materialization. Many available techniques for view-size estimation make particular statistical assumptions and their error can be large. Comparatively, unassuming probabilistic techniques are slower, but they estimate accurately and reliability very large view sizes using little memory. We compare five unassuming hashing-based view-size estimation techniques including Stochastic Probabilistic Counting and LogLog Probabilistic Counting. Our experiments show that only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide universally tight estimates irrespective of the size of the view; of those, only Adaptive Counting remains constantly fast as we increase the memory budget.

Tag-Cloud Drawing: Algorithms for Cloud Visualization

Daniel Lemire

Tag clouds provide an aggregate of tag-usage statistics. They are typically sent as in-line HTML to browsers. However, display mechanisms suited for ordinary text are not ideal for tags, because font sizes may vary widely on a line. As well, the typical layout does not account for relationships that may be known between tags. This paper presents models and algorithms to improve the display of tag clouds that con- sist of in-line HTML, as well as algorithms that use nested tables to achieve a more general 2-dimensional layout in which tag relationships are considered. The first algorithms leverage prior work in typesetting and rectangle packing, whereas the second group of algorithms leverage prior work in Electronic Design Automation. Experiments show our algorithms can be efficiently implemented and perform well.

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Daniel Lemire

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.

More from Daniel Lemire (14)

Accurate and efficient software microbenchmarks

Fast indexes with roaring #gomtl-10

Parsing JSON Really Quickly: Lessons Learned

Next Generation Indexes For Big Data Engineering (ODSC East 2018)

Ingénierie de la performance au sein des mégadonnées

SIMD Compression and the Intersection of Sorted Integers

Decoding billions of integers per second through vectorization

Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compres...

Faster Column-Oriented Indexes

Compressing column-oriented indexes

All About Bitmap Indexes... And Sorting Them

A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP

Tag-Cloud Drawing: Algorithms for Cloud Visualization

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Recently uploaded

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...

UiPathCommunity

💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™: See how to accelerate model training and optimize model performance with active learning Learn about the latest enhancements to out-of-the-box document processing – with little to no training required Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath. Speakers: 👨‍🏫 Andras Palfi, Senior Product Manager, UiPath 👩‍🏫 Lenka Dulovicova, Product Program Manager, UiPath

Mission to Decommission: Importance of Decommissioning Products to Increase E...

Product School

Knowledge engineering: from people to machines and back

Elena Simperl

Bits & Pixels using AI for Good.........

Alison B. Lowndes

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

FIDO Alliance

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

FIDO Alliance

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...

Thierry Lestable

FIDO Alliance Osaka Seminar: Overview.pdf

FIDO Alliance

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality

Inflectra

In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring. Learn about: • The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks. • Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective. • Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification. • Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process. Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...

Jeffrey Haguewood

Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows. We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases. This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams. Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.

"Impact of front-end architecture on development cost", Viktor Turskyi

Fwdays

I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

Product School

Leading Change strategies and insights for effective change management pdf 1.pdf

OnBoard

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024

Tobias Schneck

As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other? Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.

UiPath Test Automation using UiPath Test Suite series, part 4

DianaGray10

Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap. The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies. Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques What will you get from this session? 1. Insights into SAP testing best practices 2. Heatmap utilization for testing 3. Optimization of testing processes 4. Demo Topics covered: Execution from the test manager Orchestrator execution result Defect reporting SAP heatmap example with demo Speaker: Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...

Product School

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

DanBrown980551

Do you want to learn how to model and simulate an electrical network from scratch in under an hour? Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)! During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook. PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides: - A fully editable and extendable library for grid component modelling; - Visualization tools to display your network; - Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses; The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well. What you will learn during the webinar: - For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills; - For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.

Accelerate your Kubernetes clusters with Varnish Caching

Thijs Feryn

State of ICS and IoT Cyber Threat Landscape Report 2024 preview

Prayukth K V

The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development. The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers: State of global ICS asset and network exposure Sectoral targets and attacks as well as the cost of ransom Global APT activity, AI usage, actor and tactic profiles, and implications Rise in volumes of AI-powered cyberattacks Major cyber events in 2024 Malware and malicious payload trends Cyberattack types and targets Vulnerability exploit attempts on CVEs Attacks on counties – USA Expansion of bot farms – how, where, and why In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East Why are attacks on smart factories rising? Cyber risk predictions Axis of attacks – Europe Systemic attacks in the Middle East Download the full report from here: https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

FIDO Alliance

Recently uploaded (20)