This presentation was given by Martin Kersten (CWI), well known in the Dutch eScience and scientific computing community, at the Netherlands eScience Center (NLeSC) in Amsterdam, Netherlands, on November 9, 2011.
Abstract of the presentation:
This presentation gives an introduction to NoSQL (Not only SQL) databases, with examples from MonetDB, and discusses their applications and limitations.
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data (Jetlore)
Spark is an open source cluster computing framework that can outperform Hadoop by 30x through a combination of in-memory computation and a richer execution engine. Shark is a port of Apache Hive onto Spark, which provides a similar speedup for SQL queries, allowing interactive exploration of data in existing Hive warehouses. This talk will cover how both Spark and Shark are being used at various companies to accelerate big data analytics, the architecture of the systems, and where they are heading. We will also discuss the next major feature we are developing, Spark Streaming, which adds support for low-latency stream processing to Spark, giving users a unified interface for batch and real-time analytics.
In this talk, we present two emerging, popular open source projects: Spark and Shark. Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. It outperforms Hadoop by up to 100x in many real-world applications. Spark programs are often much shorter than their MapReduce counterparts thanks to its high-level APIs and language integration in Java, Scala, and Python. Shark is an analytic query engine built on top of Spark that is compatible with Hive. It can run Hive queries much faster in existing Hive warehouses without modifications.
These systems have been adopted by many organizations large and small (e.g. Yahoo, Intel, Adobe, Alibaba, Tencent) to implement data-intensive applications such as ETL, interactive SQL, and machine learning.
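The chained, functional style the abstract credits for Spark's shorter programs can be sketched in plain Python (a hedged stand-in; real Spark code would use `sc.textFile(...).flatMap(...).reduceByKey(...)`, which is not available here):

```python
from collections import Counter
from functools import reduce

# Plain-Python analogue of an RDD word-count pipeline: a generator plays
# the role of flatMap, and a fold over Counters plays reduceByKey.
lines = ["to be or not to be", "to see or not to see"]
words = (w for line in lines for w in line.split())      # flatMap analogue
counts = reduce(lambda acc, w: acc + Counter([w]),       # reduceByKey analogue
                words, Counter())
```

The whole job fits in two expressions, which is the brevity claim the abstract makes for Spark relative to hand-written MapReduce.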
JCConf 2018 - Retrospect and Prospect of Java (Joseph Kuo)
It has been more than two decades since the first version of Java was released in 1996. Today, Java is applied in many different fields, from large-scale distributed computing services with scalability and stability to the millions of apps installed in mobile devices, cellphones, and cars all over the world. Now that Java 11 is ready to introduce more new enhancements and deprecate legacy libraries, let us look back at Java's history from the beginning, focus on the significant recent changes from Java 8 to 10, preview the new features included in Java 11, and speculate about what functionality may come in the future.
https://cyberjos.blog/java/seminar/jcconf-2018-retrospect-and-prospect-of-java/
Apache Spark: The Analytics Operating System (Adarsh Pannu)
This presentation was delivered by Adarsh Pannu at IBM's Insight Conference in Nov 2015. For a recording, visit: https://www.youtube.com/watch?v=Tbm7HIlmwJQ
The presentation provides an overview of Apache Spark, a general-purpose big data processing engine built around speed, ease of use and sophisticated analytics. It enumerates the benefits of incorporating Spark in the enterprise, including how it allows developers to write fully-featured distributed applications ranging from traditional data processing pipelines to complex machine learning. The presentation uses the Airline "On Time" data set to explore various components of the Spark stack.
From Java code to Java heap: Understanding and optimizing your application's ... (Chris Bailey)
This presentation gives you insight into the memory usage of Java™ code, covering the memory overhead of putting an int value into an Integer object, the cost of object delegation, and the memory efficiency of the different collection types. You'll learn how to determine where inefficiencies occur in your application and how to choose the right collections to improve your code.
You can read an article relating to the slides here:
http://www.ibm.com/developerworks/java/library/j-codetoheap/index.html
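The talk's central point (boxing an `int` into an `Integer` costs far more than the 4-byte payload) has a direct analogue in CPython, where every int is a full heap object rather than a bare machine word. A minimal sketch, using only the standard library:

```python
import sys

# In Java, an Integer wrapping an int adds object-header and reference
# overhead on top of the 4-byte payload. CPython behaves analogously:
payload_bytes = 8                       # one 64-bit machine word of payload
boxed_bytes = sys.getsizeof(2 ** 40)    # size of the int *object* holding it
overhead = boxed_bytes - payload_bytes  # header/bookkeeping cost
```

The exact numbers differ between runtimes, but in both worlds the overhead per boxed value is a multiple of the payload, which is why collection choice matters so much at scale.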
Presentation slides for the paper on Resilient Distributed Datasets, written by Matei Zaharia et al. at the University of California, Berkeley.
The paper is not my work.
These slides were made for the course on Advanced Distributed Systems held by prof. Bratsberg at NTNU (Norwegian University of Science and Technology, Trondheim, Norway).
Spark Training Institutes: Kelly Technologies is the best Spark classroom training institute in Bangalore, providing Spark training by real-time faculty in Bangalore.
Analytical Queries with Hive: SQL Windowing and Table Functions (DataWorks Summit)
Hive Query Language (HQL) is excellent for productivity and enables reuse of SQL skills, but falls short in advanced analytic queries. Hive's Map & Reduce scripts mechanism lacks the simplicity of SQL, and specifying new analyses is cumbersome. We developed SQLWindowing for Hive (SQW) to overcome these issues. SQW introduces both Windowing and Table Functions to the Hive user. SQW appears as an HQL extension, with table functions and windowing clauses interspersed with HQL. This means the user stays within a SQL-like interface while simultaneously having these capabilities available. SQW has been published as an open source project. It is available as both a CLI and an embeddable jar with a simple query API. There are pre-built windowing functions for Ranking, Aggregation, Navigation, and Linear Regression, and Table Functions for Time Series Analysis, Allocations, and Data Densification. Functions can be chained for more complex analysis. Under the covers, MR mechanics are used to partition and order data. The fundamental interface is the table function, whose core job is to operate on data partitions. Function implementations are isolated from MR mechanics and focus purely on computation logic. Groovy scripting can be used for the core implementation and for parameterizing behavior. Writing functions typically involves extending one of the existing abstract functions.
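The partition-then-rank logic that a windowing clause like `RANK() OVER (PARTITION BY dept ORDER BY salary DESC)` expresses (and that SQW pushes into MapReduce under the covers) can be sketched in Python with hypothetical sample rows:

```python
from itertools import groupby
from operator import itemgetter

rows = [  # (department, name, salary) -- made-up sample data
    ("eng", "ada", 120), ("eng", "bob", 110),
    ("ops", "carol", 90), ("ops", "dan", 95),
]

# Partition by department (groupby needs sorted input), then rank each
# partition by descending salary -- the essence of a windowed RANK().
ranked = []
for dept, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
    for rank, (d, name, salary) in enumerate(
            sorted(group, key=itemgetter(2), reverse=True), start=1):
        ranked.append((d, name, rank))
```

Each partition is processed independently, which is exactly what makes the pattern a natural fit for the MR partitioning SQW relies on.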
Data Storage in DBMSs
This presentation covers the various ways of storing information in databases, as well as technical approaches for optimizing the processing of that data while consuming as few resources as possible.
HBase and Column-Oriented Databases Course (HBase, Column Oriented Dat...) (Hatim CHAHDI)
This course introduces column-oriented databases and their specific characteristics. It then details the architecture of HBase and explains what is required to set it up and operate it.
This talk at the Percona Live MySQL Conference and Expo describes open source column stores and compares their capabilities, correctness and performance.
SQL Server 2014 In-Memory Tables (XTP, Hekaton) (Tony Rogerson)
Semi-advanced presentation on SQL Server 2014 in-memory tables, which are part of the Extreme Transaction Processing feature (project Hekaton).
Deck and demo can be found: http://sdrv.ms/1dvWouN
OSDC 2017 | Linux Performance Profiling and Monitoring by Werner Fischer (NETWAYS)
Nowadays system administrators have great choices when it comes down to Linux performance profiling and monitoring. The challenge is to pick the appropriate tools and interpret their results correctly.
This talk is a chance to take a tour through various performance profiling and benchmarking tools, focusing on their benefit for every sysadmin.
More than 25 different tools are presented, ranging from well-known tools like strace, iostat, tcpdump, or vmstat to newer features like Linux tracepoints or perf_events. You will also learn which of these can be integrated with Icinga and which monitoring plugins are already available for that.
The goal is to leave you with reference points to consult whenever you are faced with performance problems.
Take the chance to close your knowledge gaps and learn how to get the most out of your system.
OSDC 2017 | Open POWER for the data center by Werner Fischer (NETWAYS)
IBM's POWER (Performance Optimization With Enhanced RISC) architecture has been known to run mission-critical applications and to provide bank-style "RAS" (Reliability, Availability, Serviceability) features since 1990. Opening the architecture in 2013 enabled other vendors like Tyan or Rackspace to build servers based on the current POWER8 edition of this architecture. The current POWER8 CPUs provide up to 12 cores with 8x Simultaneous Multithreading, leading to 96 threads per CPU. Up to eight memory channels enable up to 230 GB/s memory bandwidth per CPU. Increased L1, L2, L3 and new L4 caches help to boost the performance of memory-bound applications like databases, by providing more than 1 TB/s of bandwidth. In this talk Werner will give an overview of the architecture and show the performance possibilities of POWER8, using the PostgreSQL database as an example. By comparing PostgreSQL 9.4, 9.5 and 9.6 benchmarking results he will visualize the increased efficiency achieved by PostgreSQL's optimizations for POWER over recent years. Finally, he will outline one other benefit of OpenPOWER systems: from the very beginning (the first instruction to initialize the first CPU core, long before DRAM, firmware management or PCIe works) up to running your Linux OS and an application like a database, only open source code gets executed.
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I... (Databricks)
In this session, the speakers will discuss their experiences porting Apache Spark to the Cray XC family of supercomputers. One scalability bottleneck is in handling the global file system present in all large-scale HPC installations. Using two techniques (file open pooling, and mounting the Spark file hierarchy in a specific manner), they were able to improve scalability from O(100) cores to O(10,000) cores. This is the first result at such a large scale on HPC systems, and it had a transformative impact on research, enabling their colleagues to run on 50,000 cores.
With this baseline performance fixed, they will then discuss the impact of the storage hierarchy and of the network on Spark performance. They will contrast a Cray system with two levels of storage with a “data intensive” system with fast local SSDs. The Cray contains a back-end global file system and a mid-tier fast SSD storage. One conclusion is that local SSDs are not needed for good performance on a very broad workload, including spark-perf, TeraSort, genomics, etc.
They will also provide a detailed analysis of the impact of latency of file and network I/O operations on Spark scalability. This analysis is very useful to both system procurements and Spark core developers. By examining the mean/median value in conjunction with variability, one can infer the expected scalability on a given system. For example, the Cray mid-tier storage has been marketed as the magic bullet for data intensive applications. Initially, it did improve scalability and end-to-end performance. After understanding and eliminating variability in I/O operations, they were able to outperform any configurations involving mid-tier storage by using the back-end file system directly. They will also discuss the impact of network performance and contrast results on the Cray Aries HPC network with results on InfiniBand.
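The "file open pooling" technique the talk names amortizes expensive `open()` calls against a global file system by caching and reusing handles. A minimal, hypothetical sketch (the real implementation lives inside the speakers' Spark port, not here):

```python
import os
import tempfile

class OpenPool:
    """Cache open file handles so repeated accesses to the same path
    trigger only one metadata-heavy open() on the shared file system."""

    def __init__(self):
        self._handles = {}
        self.opens = 0                      # count of real open() calls

    def get(self, path):
        if path not in self._handles:       # open once, reuse afterwards
            self._handles[path] = open(path, "rb")
            self.opens += 1
        return self._handles[path]

    def close_all(self):
        for f in self._handles.values():
            f.close()
        self._handles.clear()

# demo: two accesses to one file cost a single open()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data")
pool = OpenPool()
pool.get(tmp.name)
pool.get(tmp.name)
pool.close_all()
os.unlink(tmp.name)
```

On a laptop the saving is negligible; on an HPC global file system, where every open is a distributed metadata operation, collapsing O(tasks) opens into O(files) is what moved the scalability from hundreds to tens of thousands of cores.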
(8) cpp stack automatic_memory_and_static_memory (Nico Ludwig)
Check out these exercises: http://de.slideshare.net/nicolayludwig/8-cpp-stack-automaticmemoryandstaticmemory-38510742
- Introducing CPU Registers
- Function Stack Frames and the Decrementing Stack
- Function Call Stacks, the Stack Pointer and the Base Pointer
- C/C++ Calling Conventions
- Stack Overflow, Underflow and Channelling incl. Examples
- How variable Argument Lists work with the Stack
- Static versus automatic Storage Classes
- The static Storage Class and the Data Segment
NLTK natural language toolkit overview and application @ PyCon.tw 2012 (Jimmy Lai)
These slides introduce a Python toolkit for Natural Language Processing (NLP). The author covers several useful topics in NLTK and demonstrates them with code examples.
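The shape of the pipeline such slides typically demonstrate with NLTK (tokenize, normalize, count) can be sketched with the standard library alone, since NLTK proper also needs separate corpus downloads; the sample sentence is made up:

```python
import re
from collections import Counter

# Stdlib-only stand-in for nltk.word_tokenize + nltk.FreqDist:
# lowercase the text, pull out alphabetic tokens, count frequencies.
text = "NLTK makes natural language processing in Python approachable."
tokens = re.findall(r"[a-z]+", text.lower())
freq = Counter(tokens)
```

NLTK's own tokenizers handle punctuation, contractions, and sentence boundaries far better than this regex, which is exactly the kind of ready-made machinery the talk showcases.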
Breakthrough OLAP performance with Cassandra and Spark (Evan Chan)
Find out about breakthrough architectures for fast OLAP performance querying Cassandra data with Apache Spark, including a new open source project, FiloDB.
Authors: Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1), Avigdor Gal (3), and Matthias Weidlich (4)
1 École Polytechnique Fédérale de Lausanne
2 Université de Rennes 1
3 Technion – Israel Institute of Technology
4 Imperial College London
by Irene Celino, Simone Contessa, Marta Corubolo, Daniele Dell’Aglio, Emanuele Della Valle, Stefano Fumeo and Thorsten Krüger
CEFRIEL – Politecnico di Milano – SIEMENS
by G. Larkou, J. Metochi, G. Chatzimilioudis and D. Zeinalipour-Yazti
Presented at: 1st IEEE International Workshop on Mobile Data Management Mining and Computing on Social Networks, collocated with IEEE MDM'13
The presentation was delivered by FORTH at the 3rd International Workshop on the Role of Semantic Web in Provenance Management 2012 (SWPM2012) in Heraklion, Greece, on 28 May 2012.
Abstract:
Workflow systems can produce very large amounts of provenance information. In this paper we introduce provenance-based inference rules as a means to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). We motivate this kind of (provenance) inference and identify a number of basic inference rules over a conceptual model appropriate for representing provenance. The proposed inference rules concern the interplay between (i) actors and carried out activities, (ii) activities and devices that were used for such activities, and, (iii) the presence of information objects and physical things at events. However, since a knowledge base is not static but it changes over time for various reasons, we also study how we can satisfy change requests while supporting and respecting the aforementioned inference rules. Towards this end, we elaborate on the specification of the required change operations.
This paper was presented by Vassilis Papakonstantinou at the 17th ACM Symposium on Access Control Models and Technologies (ACM SACMAT 2012) in Newark, USA, June 20 - 22, 2012.
Abstract:
The Resource Description Framework (RDF) has become the defacto standard for representing information in the Semantic Web. Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. In this paper we advocate the use of an abstract access control model to ensure the selective exposure of RDF information. The model is defined by a set of abstract operators. Tokens are used to label RDF triples with access information. Abstract operators model RDF Schema inference rules and propagation of labels along the RDF Schema (RDFS) class and property hierarchies. In this way, the access label of a triple is a complex expression that involves the labels of the triples and the operators applied to obtain said label. Different applications can then adopt different concrete access policies that encode an assignment of the abstract tokens and operators to concrete (specific) values. Following this approach, changes in the interpretation of abstract tokens and operators can be easily implemented resulting in a very flexible mechanism that allows one to easily experiment with different concrete access policies (defined per context or user). To demonstrate the feasibility of the approach, we implemented our ideas on top of the MonetDB and PostgreSQL open source database systems. We conducted an initial set of experiments which showed that the overhead for using abstract expressions is roughly linear to the number of triples considered; performance is also affected by the characteristics of the dataset, such as the size and depth of class and property hierarchies as well as the considered concrete policy.
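The abstract-label idea can be sketched as follows: a triple's access label is an expression tree over abstract tokens, and a concrete policy assigns values to the tokens and a meaning to each operator, so swapping policies never touches the labels themselves. Token names and the single AND-like operator here are illustrative, not the paper's exact algebra:

```python
from functools import reduce

def evaluate(label, policy):
    """Evaluate an abstract label under a concrete policy.
    A label is either a token (str) or a tuple (operator, subexpr, ...)."""
    if isinstance(label, str):                 # abstract token
        return policy[label]
    op, *args = label
    return reduce(policy[op], (evaluate(a, policy) for a in args))

# Label of a (hypothetical) inferred triple: it depends on tokens t1, t2.
inferred_label = ("and", "t1", "t2")

# Two concrete policies interpreting the same abstract label differently.
policy_a = {"t1": True, "t2": False, "and": lambda x, y: x and y}
policy_b = {"t1": True, "t2": True,  "and": lambda x, y: x and y}
```

Re-evaluating the stored expression under a new policy is what lets the approach change access semantics without relabeling the data, the flexibility the abstract claims.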
The talk was delivered by Martin Kersten from CWI, Netherlands, at the workshop on "Global Scientific Data Infrastructures: The Findability Challenge", held in Taormina, Sicily, Italy, on May 10-11, 2012.
This talk was given by FORTH, Greece, at the European Data Forum (EDF) 2012, which took place on June 6-7, 2012 in Copenhagen, Denmark, at the Copenhagen Business School (CBS).
Abstract:
Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. Access control is complicated when RDFS inference rules and other dependencies between access permissions of triples need to be considered; this is necessary, e.g., when we want to associate the access permissions of inferred triples with the ones that implied them. In this paper we advocate the use of abstract provenance models, defined by means of abstract tokens and operators, to support fine-grained access control for RDF graphs. The access label of a triple is a complex expression that encodes how said label was produced (i.e., the triples that contributed to its computation). This feature allows us to know exactly the effects of any possible change, thereby avoiding a complete recomputation of the labels when a change occurs. In addition, the same application can choose to enforce different access control policies, or different applications can enforce different policies on the same data, without recomputing the label of a triple. Preliminary experiments have shown the applicability and benefits of our approach.
This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tahmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge, etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge data sets. Our experimental results demonstrate that defeasible reasoning over billions of facts is performant, and has the potential to scale to trillions of facts.
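The MapReduce shape of the computation can be sketched as a toy: a map phase groups candidate conclusions per literal, and a reduce phase lets the higher-priority rule defeat the others, a stand-in for defeasible-logic conflict resolution. The literal, rule priorities, and values below are made up for illustration:

```python
from collections import defaultdict

# Candidate conclusions emitted by rules: (literal, rule priority, value).
candidates = [
    ("tweety_flies", 1, True),    # "birds usually fly"     (priority 1)
    ("tweety_flies", 2, False),   # "penguins do not fly"   (priority 2)
]

# "Map" phase: group candidates by the literal they conclude.
grouped = defaultdict(list)
for literal, priority, value in candidates:
    grouped[literal].append((priority, value))

# "Reduce" phase: per literal, the highest-priority rule wins.
conclusions = {lit: max(vals)[1] for lit, vals in grouped.items()}
```

Because each literal's conflict set is resolved independently, the reduce step parallelizes across keys, which is exactly what lets the paper scale defeasible reasoning to billions of facts.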
The presentation was delivered during the 1st International Conference on Health Information Science (HIS 2012) on April 9th, 2012 in Beijing, China.
Abstract:
In cytomics, bookkeeping of the data generated during lab experiments is crucial. The current approach in cytomics is to conduct High-Throughput Screening (HTS) experiments so that cells can be tested under many different experimental conditions. Given the large number of different conditions and the readout of the conditions through images, it is clear that the HTS approach requires a proper data management system to reduce the time needed for experiments and the chance of human error. As different types of data exist, the experimental conditions need to be linked to the images produced by the HTS experiments, with their metadata and the results of further analysis. Moreover, HTS experiments never stand by themselves: as more experiments are lined up, the amount of data and the computation needed to analyze it increase rapidly. To that end, cytomic experiments call for automated and systematic solutions that provide convenient and robust features for scientists to manage and analyze their data. In this paper, we propose a platform for managing and analyzing HTS images resulting from cytomics screens, taking the automated HTS workflow as a starting point. This platform seamlessly integrates the whole HTS workflow into a single system. The platform relies on a modern relational database system to store user data and process user requests, while providing a convenient web interface to end users. By implementing this platform, the overall workload of HTS experiments, from experiment design to data analysis, is reduced significantly. Additionally, the platform provides the potential for data integration to accomplish genotype-to-phenotype modeling studies.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence gathering facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insight into the approaches I already have working for real.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Arrays in database systems, the next frontier?
1. Arrays in database systems,
the next frontier ?
Martin Kersten
CWI
NLeSC 9 Nov 2011
2. “We can't solve problems by using the same kind of thinking we used when we created them.”
3. Agenda
A crash course on column-stores
Column stores for science applications
The SciQL array query language
4. The world of column stores
Motivation
Relational DBMSs dominate since the late 1970's / 1980's
- Transactional workloads (OLTP, row-wise access)
- I/O based processing
- Ingres, PostgreSQL, MySQL, Oracle, SQL Server, DB2, …
Column stores dominate product development since 2005
- Data warehouses and business intelligence applications
- Startups: Infobright, Aster Data, Greenplum, LucidDB, …
- Commercial: Microsoft, IBM, SAP, …
MonetDB, the pioneer
5. The world of column stores
Workload changes: Transactions (OLTP) vs ...
6. The world of column stores
Workload changes: ... vs OLAP, BI, Data Mining, ...
7. The world of column stores
Databases hit The Memory Wall
- Detailed and exhaustive analysis for different workloads using 4 RDBMSs by Ailamaki, DeWitt, Hill, and Wood in VLDB 1999: “DBMSs On A Modern Processor: Where Does Time Go?”
- CPU is 60%-90% idle, waiting for memory:
  - L1 data stalls
  - L1 instruction stalls
  - L2 data stalls
  - TLB stalls
  - Branch mispredictions
  - Resource stalls
8. The world of column stores
Hardware Changes: The Memory Wall
Trip to memory = 1000s of instructions!
10. BAT Data Structure
BAT: binary association table, with a Head and a Tail column
BUN: binary unit, one head/tail pair
BUN heap: consecutive memory blocks (arrays), possibly memory-mapped files
Accelerators on Head & Tail: hash tables, T-trees, R-trees, ...
Tail heap: best-effort duplicate elimination for strings (~ dictionary encoding)
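The BAT layout described above can be sketched in a few lines of illustrative Python (not MonetDB code): a BAT is just two parallel arrays, and a string tail points into a shared heap with duplicate elimination.

```python
# Illustrative sketch of a BAT: two parallel arrays (head oids, tail values);
# string tails point into a shared heap with best-effort duplicate
# elimination, roughly like dictionary encoding.

class StringHeap:
    def __init__(self):
        self.values = []      # the actual strings, stored once each
        self.offsets = {}     # string -> offset, for duplicate elimination

    def put(self, s):
        if s not in self.offsets:
            self.offsets[s] = len(self.values)
            self.values.append(s)
        return self.offsets[s]

    def get(self, off):
        return self.values[off]

class BAT:
    def __init__(self):
        self.head = []        # oids, a consecutive array
        self.tail = []        # values (here: offsets into the string heap)
        self.heap = StringHeap()

    def append_str(self, oid, s):
        self.head.append(oid)
        self.tail.append(self.heap.put(s))

names = BAT()
for oid, s in enumerate(["alice", "bob", "alice", "carol"]):
    names.append_str(oid, s)

assert len(names.heap.values) == 3                  # "alice" stored once
assert names.heap.get(names.tail[2]) == "alice"     # every BUN still addressable
```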
11. MonetDB Front-end: SQL
- SQL 2003
- Parse SQL into a logical n-ary relational algebra tree
- Translate n-ary relational algebra into logical 2-ary relational algebra
- Turn the logical 2-ary plan into a physical 2-ary plan (MAL program)
- Front-end specific strategic optimization:
  - Heuristic optimization during all three previous steps
- Primary key and distinct constraints:
  - Create and maintain hash indices
- Foreign key constraints:
  - Create and maintain foreign-key join indices
12. MonetDB Front-end: SQL
EXPLAIN SELECT a, z FROM t, s WHERE t.c = s.x;
function user.s2_1():void;
barrier _73 := language.dataflow();
_2:bat[:oid,:int] := sql.bind("sys","t","c",0);
_7:bat[:oid,:int] := sql.bind("sys","s","x",0);
_10 := bat.reverse(_7);
_11 := algebra.join(_2,_10);
_13 := algebra.markT(_11,0@0);
_14 := bat.reverse(_13);
_15:bat[:oid,:int] := sql.bind("sys","t","a",0);
_17 := algebra.leftjoin(_14,_15);
_18 := bat.reverse(_11);
_19 := algebra.markT(_18,0@0);
_20 := bat.reverse(_19);
_21:bat[:oid,:int] := sql.bind("sys","s","z",0);
_23 := algebra.leftjoin(_20,_21);
exit _73;
_24 := sql.resultSet(2,1,_17);
sql.rsColumn(_24,"sys.t","a","int",32,0,_17);
sql.rsColumn(_24,"sys.s","z","int",32,0,_23);
_33 := io.stdout();
sql.exportResult(_33,_24);
end s2_1;
13. MonetDB/5 Back-end: MAL
- MAL: Monet Assembly Language
  - textual interface
  - interpreted language
- Designed as a system interface language
  - reduced, concise syntax
  - strict typing
  - meant for automatic generation and parsing/rewriting/processing, not to be typed by humans
  - efficient parser, low overhead
- Inherent support for tactical optimization: MAL -> MAL
  - support for optimizer plug-ins
  - support for runtime schedulers
- Binary-algebra core
- Flow control (MAL is computationally complete)
14. Processing Model (MonetDB Kernel)
- Bulk processing: full materialization of all intermediate results
- Binary (i.e., 2-column) algebra core:
  - select, join, semijoin, outerjoin
  - union, intersection, diff (BAT-wise & column-wise)
  - group, count, max, min, sum, avg
  - reverse, mirror, mark
- Runtime operational optimization: choosing the optimal algorithm & implementation according to input properties and system status
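The operator-at-a-time model can be illustrated with a toy sketch: each operator consumes whole columns and fully materializes its result before the next operator runs (names and operators here are illustrative, not MonetDB's actual kernel API).

```python
# Toy column-at-a-time operators: each call processes a whole column and
# materializes its full result, as in the bulk processing model above.

def select_range(col, low, high):
    # returns the positions (oids) of qualifying values, fully materialized
    return [i for i, v in enumerate(col) if low <= v <= high]

def project(col, oids):
    # fetch the values at the selected positions
    return [col[i] for i in oids]

def sum_col(col):
    return sum(col)

# a two-column "table", stored column-wise
price = [10, 25, 40, 5, 30]
qty   = [ 1,  2,  3, 4,  5]

oids  = select_range(price, 20, 40)   # intermediate 1, materialized
q     = project(qty, oids)            # intermediate 2, materialized
total = sum_col(q)                    # 2 + 3 + 5 = 10
assert total == 10
```

There is no per-tuple interpreter loop: the plan is a sequence of tight column-wide operator calls, each leaving a materialized intermediate behind.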
15. Processing Model (MonetDB Kernel)
- Heavy use of code expansion to reduce cost:
  - 1 algebra operator: select()
  - 3 overloaded operators: select("=", value), select("between", L, H), select("fcn", parm)
  - 10 operator algorithms: scan, hash-lookup, bin-search, bin-tree, pos-lookup, ...
  - ~1500(!) routines via macro expansion, e.g. scan_range_select_oid_int(), hash_equi_select_void_str(), ...
- In total: ~1500 selection routines, 149 unary operations, 335 join/group operations, ...
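The effect of this code expansion can be mimicked in Python: instead of one generic loop that interprets a predicate per value, a specialized routine exists per (operator, type) combination (purely illustrative; MonetDB does this with C macros at compile time, and the routine name below is hypothetical).

```python
# Illustrative only: one generic selection routine that pays a function
# call per value, versus a "macro-expanded" specialized routine where
# the comparison is inlined in the loop.

def generic_select(col, pred):
    # generic: interprets the predicate once per value
    return [i for i, v in enumerate(col) if pred(v)]

def make_range_select():
    # generated specialization: comparison inlined, no per-value call
    def range_select(col, low, high):
        return [i for i, v in enumerate(col) if low <= v <= high]
    return range_select

# one of many generated variants (hypothetical name)
scan_range_select_int = make_range_select()

col = [3, 17, 42, 8, 25]
assert generic_select(col, lambda v: 10 <= v <= 30) == [1, 4]
assert scan_range_select_int(col, 10, 30) == [1, 4]
```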
17. The Software Stack
- Front-ends (XQuery, SQL'03): strategic optimization; emit MAL
- Optimizers: tactical optimization, MAL -> MAL rewrites
- Back-end(s) (MonetDB 4, MonetDB 5): execute MAL
- Kernel (MonetDB kernel): runtime operational optimization
18. MonetDB vs Traditional DBMS Architecture
- Architecture-conscious query processing vs magnetic-disk-I/O-conscious processing
  - Data layout, algorithms, cost models
- RISC relational algebra (operator-at-a-time) vs tuple-at-a-time iterator model
  - Faster through simplicity: no tuple expression interpreter
- Multi-model: ODMG, SQL, XML/XQuery, ..., RDF/SPARQL vs relational with bolt-on subsystems
  - Columns as the building block for complex data structures
- Decoupling of transactions from execution/buffering vs ARIES integrated into execution/buffering/indexing
  - ACID, but not ARIES. Pay-as-you-need transaction overhead.
- Run-time indexing and query optimization vs static DBA/workload-driven optimization & indexing
  - Extensible optimizer framework; cracking, recycling, sampling-based runtime optimization
19. Evolution
"It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change."
Charles Darwin (1809 - 1882)
20. Agenda
A crash course on column-stores
Column stores for science applications
The SciQL array query language
22. SkyServer Schema
- 446 columns, >585 million rows
- 6 columns, >20 billion rows
23. Recycler: motivation & idea
"An architecture for recycling intermediates in a column-store". Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
Motivation:
- scientific databases, data analytics
- terabytes of data (observational, transactional)
- prevailing read-only workload
- ad-hoc queries with commonalities
Background:
- operator-at-a-time execution paradigm -> automatic materialization of intermediates
- canonical column-store organization -> intermediates have reduced dimensionality and finer granularity -> simplified overlap analysis
Recycling idea:
- instead of garbage collecting, keep the intermediates and reuse them
- speed up query streams with commonalities
- low cost and self-organization
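The recycling idea can be sketched as memoization of operator calls: before executing, look the (operator, arguments) pair up in a pool of kept intermediates. This is a hypothetical sketch; the real Recycler also handles admission, eviction, and updates.

```python
# Sketch of the recycling idea: keep intermediates keyed by
# (operator name, arguments) and reuse them across queries.
# Admission/eviction policies of the real Recycler are omitted.

recycle_pool = {}
executions = {"count": 0}

def recycled(op_name, fn, *args):
    key = (op_name, args)
    if key not in recycle_pool:           # instruction matching: exact
        executions["count"] += 1
        recycle_pool[key] = fn(*args)     # keep instead of garbage-collecting
    return recycle_pool[key]

def expensive_select(low, high):
    col = list(range(1000))
    return [v for v in col if low <= v <= high]

a = recycled("select", expensive_select, 10, 20)
b = recycled("select", expensive_select, 10, 20)  # reused, not recomputed
assert a is b
assert executions["count"] == 1
```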
24. Recycler: fit into MonetDB
"An architecture for recycling intermediates in a column-store". Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
The SQL/XQuery front-ends emit a MAL plan such as:
function user.s1_2(A0:date, ...):void;
    X5 := sql.bind("sys","lineitem",...);
    X10 := algebra.select(X5,A0);
    X12 := sql.bindIdx("sys","lineitem",...);
    X15 := algebra.join(X10,X12);
    X25 := mtime.addmonths(A1,A2);
    ...
The Recycler is added as a tactical optimizer next to the existing MAL optimizers, rewriting the plan before execution. At run time the MonetDB kernel provides recycling support: an admission & eviction policy maintains a recycle pool of intermediates inside the server.
25. Recycler: instruction matching
"An architecture for recycling intermediates in a column-store". Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
Run-time comparison of instruction types and argument values:
X1 := sql.bind("sys","orders","o_orderdate",0);
…
Y3 := sql.bind("sys","orders","o_orderdate",0);   -- exact match with X1: reuse it
Pool entries record name, value, data type, and size:
Name | Value    | Data type        | Size
X1   | 10       | :bat[:oid,:date] |
T1   | "sys"    | :str             |
T2   | "orders" | :str             |
…
26. Recycler: instruction subsumption
"An architecture for recycling intermediates in a column-store". Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
X3 := algebra.select(X1,10,80);
…
Y3 := algebra.select(X1,20,45);   -- subsumed by X3
X5 := algebra.select(X1,20,60);   -- subsumed by X3
Pool entries:
Name | Value | Data type       | Size
X1   | 10    | :bat[:oid,:int] | 2000
X3   | 130   | :bat[:oid,:int] | 700
X5   | 150   | :bat[:oid,:int] | 350
…
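Subsumption can be sketched the same way: a new range select whose bounds fall inside a pooled result's bounds is answered by re-filtering the (smaller) intermediate instead of scanning the base column (an illustrative sketch, not the Recycler's actual code).

```python
# Sketch of instruction subsumption: select(X1,20,60) need not touch the
# base column if select(X1,10,80) is already in the pool -- the narrower
# range can be computed from the kept intermediate.

pool = {}   # (column name, low, high) -> materialized list of values

def range_select(name, col, low, high):
    if (name, low, high) in pool:                  # exact match
        return pool[(name, low, high)]
    for (n, lo, hi), vals in pool.items():         # subsumption check
        if n == name and lo <= low and high <= hi:
            result = [v for v in vals if low <= v <= high]
            break
    else:
        result = [v for v in col if low <= v <= high]  # fall back to base
    pool[(name, low, high)] = result
    return result

X1 = list(range(0, 100, 5))            # 0, 5, 10, ..., 95
X3 = range_select("X1", X1, 10, 80)    # computed from the base column
X5 = range_select("X1", X1, 20, 60)    # subsumed: filtered from X3
assert X5 == [20, 25, 30, 35, 40, 45, 50, 55, 60]
```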
27. Recycler: SkyServer evaluation
"An architecture for recycling intermediates in a column-store". Ivanova, Kersten, Nes, Goncalves. ACM TODS 35(4), Dec. 2010
Sloan Digital Sky Survey / SkyServer, http://cas.sdss.org
- 100 GB subset of DR4
- 100-query batch from January 2008 log
- 1.5 GB intermediates, 99% reuse
- Join intermediates are the major consumer of memory and the major contributor to savings
28. Agenda
A crash course on column-stores
Column stores for science applications
The SciQL array query language
29. What is an array?
An array is a systematic arrangement of objects addressed by dimension values.
Get(A, X, Y, …) => Value
Set(A, X, Y, …) <= Value
There are many species: vector, bit array, dynamic array, parallel array, sparse array, variable-length array, jagged array
31. Arrays in DBMS
- Relational prototype built on arrays: Peterlee IS (1975)
- Persistent programming languages: Astral (1980), Plain (1980)
- Object orientation and persistent languages were supposed to be the way to handle them: O2 (1992)
32. PostgreSQL 8.3
Array declarations:
CREATE TABLE sal_emp ( name text, pay_by_quarter integer[], schedule text[][] );
CREATE TABLE tictactoe ( squares integer[3][3] );
Array operations: denotation ([]), contains (@>), is contained in (<@), append, concat (||), dimension, lower, upper, prepend, to-string, from-string
Array constraints: none; no enforcement of dimensions.
33. MySQL
From the MySQL forum, May 2010:
"> How to store multiple values in a single field? Is there any array data type concept in MySQL?
As Jörg said, 'Multiple values in a single field' would be an explicit violation of the relational model..."
Is there any experience beyond encoding it as blobs?
34. Rasdaman
- Breaks large C++ arrays (rasters) into disjoint chunks
- Maps chunks into large binary objects (blobs)
- Provides a function interface to access them
- RASCAL, a SQL92 extension
- Known to work up to 12 TB
35. SciDB
- Breaks large C++ arrays (rasters) into overlapping chunks
- Storage manager built from scratch
- Map-reduce processing model
- Provides a function interface to access them
- AQL, a crippled SQL92
36. What is the problem?
- Appropriate array denotations?
- Functionally complete operation set?
- Scale?
- Size limitations due to (blob) representations?
- Community awareness?
37. MonetDB SciQL
SciQL (pronounced 'cycle')
• A backward-compatible extension of SQL'03
• Symbiosis of the relational and array paradigms
• Flexible structure-based grouping
• Capitalizes on the MonetDB physical array storage
• Recycling, an adaptive 'materialized view'
• Zero-cost attachment contract for cooperative clients
http://www.cwi.nl/~mk/SciQL.pdf
38. Table vs arrays
CREATE TABLE tmp:
- a collection of tuples
- indexed by a (primary) key
- default handling: explicitly created using INS/UPD/DEL
39. Table vs arrays
CREATE TABLE tmp:
- a collection of tuples
- indexed by a (primary) key
- explicitly created and updated using INS/UPD/DEL
CREATE ARRAY tmp:
- a collection of a priori defined tuples
- indexed by dimension expressions
- implicitly defined by a default value, to be updated with INS/DEL/UPD
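The contrast can be made concrete with a toy Python sketch (illustrative only, not full SciQL semantics): a table is a sparse set of explicitly inserted tuples, while an array is a dense grid whose cells all exist a priori with a default value.

```python
# Toy contrast between the two worlds above:
# a TABLE holds only the tuples you insert; an ARRAY has every cell of
# its declared dimensions defined up front with a default value.

table = set()                       # tuples (x, y, value), inserted explicitly
N, DEFAULT = 2, 0.0
array = {(x, y): DEFAULT for x in range(N) for y in range(N)}  # a priori cells

# INSERT INTO matrix VALUES (0,0,0),(0,1,0)
for x, y, v in [(0, 0, 0.0), (0, 1, 0.0)]:
    table.add((x, y, v))
    array[(x, y)] = v               # update-in-place: the cell already exists

assert len(table) == 2              # only the inserted tuples
assert len(array) == 4              # all 2x2 cells are defined
assert array[(1, 1)] == DEFAULT     # untouched cells keep the default
```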
40. SciQL examples
CREATE TABLE matrix (
  x integer,
  y integer,
  value float,
  PRIMARY KEY (x,y) );
INSERT INTO matrix VALUES
  (0,0,0),(0,1,0),(1,1,0),(1,0,0);
-- resulting tuples:
-- 0 0 0
-- 0 1 0
-- 1 1 0
-- 1 0 0
41. SciQL examples
CREATE TABLE matrix (
  x integer,
  y integer,
  value float,
  PRIMARY KEY (x,y) );

CREATE ARRAY matrix (
  x integer DIMENSION[2],
  y integer DIMENSION[2],
  value float DEFAULT 0 );

INSERT INTO matrix VALUES
  (0,0,0),(0,1,0),(1,1,0),(1,0,0);
-- table: exactly the four inserted tuples
-- array: all 2x2 cells exist a priori (null until set); the inserts fill them with 0
42. SciQL examples
(table vs array definitions as on the previous slide)
DELETE FROM matrix WHERE y=1;
-- table: the tuples with y=1 are removed
-- array: a hole in the array; the cells with y=1 revert to null, the 2x2 shape remains
43. SciQL examples
(table vs array definitions as on the previous slide)
INSERT INTO matrix VALUES (0,1,1), (1,1,2);
-- table: two additional tuples (0,1,1) and (1,1,2)
-- array: the cells (0,1) and (1,1) are set to 1 and 2
44. SciQL unbounded arrays
CREATE TABLE matrix (
  x integer,
  y integer,
  value float,
  PRIMARY KEY (x,y) );

CREATE ARRAY matrix (
  x integer DIMENSION,
  y integer DIMENSION,
  value float DEFAULT 0 );

INSERT INTO matrix VALUES (0,2,1), (0,1,2);
-- the unbounded array grows on demand to cover the inserted dimension values
46. SciQL table queries
CREATE ARRAY matrix (
  x integer DIMENSION,
  y integer DIMENSION,
  value float DEFAULT 0 );
-- simple checkerboard aggregation
SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
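What the checkerboard query computes can be spelled out over a coordinate/value representation (a toy Python rendering of the SELECT above, not SciQL itself):

```python
# Python rendering of the query:
#   SELECT sum(value) FROM matrix WHERE (x + y) % 2 = 0
# i.e. sum the cells on the "black" squares of the checkerboard.

matrix = {(x, y): float(x * 4 + y)      # some 4x4 array contents
          for x in range(4) for y in range(4)}

checker_sum = sum(v for (x, y), v in matrix.items() if (x + y) % 2 == 0)

# black squares hold 0, 2, 5, 7, 8, 10, 13, 15
assert checker_sum == 60.0
```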
47. SciQL array queries
CREATE ARRAY matrix (
x integer DIMENSION,
y integer DIMENSION,
value float DEFAULT 0 );
-- group based aggregation to construct an unbounded vector
SELECT [x], sum(value) FROM matrix
WHERE (x + y) % 2 = 0
GROUP BY x;
48. SciQL array views
CREATE ARRAY vmatrix (
  x integer DIMENSION[-1:5],
  y integer DIMENSION[-1:5],
  value float DEFAULT -1 )
AS SELECT x, y, value FROM matrix;
-- cells outside the source matrix take the default:
-- -1 -1 -1 -1
-- -1  0  0 -1
-- -1  0  0 -1
-- -1 -1 -1 -1
49. SciQL tiling examples
(figure: a 4x4 grid of cells V0,0 … V3,3, with the anchor point at V0,0)
SELECT x, y, avg(value)
FROM matrix
GROUP BY matrix[x:1:x+2][y:1:y+2];
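Structural (tile-based) grouping can be sketched by enumerating, per anchor cell, the tile of cells it groups. The sketch below is illustrative Python for the GROUP BY matrix[x:1:x+2][y:1:y+2] pattern; taking the tile to be the 2x2 block anchored at each cell is an assumption of the sketch.

```python
# Sketch of overlapping tiling: every cell (x, y) anchors a tile, here
# assumed to be the 2x2 block (x..x+1, y..y+1). Without DISTINCT the
# tiles overlap, so there is one aggregate per anchor cell; cells past
# the border are simply absent from their tile.

N = 4
matrix = {(x, y): 1.0 for x in range(N) for y in range(N)}

def tile_avg(ax, ay):
    cells = [matrix[(x, y)]
             for x in (ax, ax + 1) for y in (ay, ay + 1)
             if (x, y) in matrix]
    return sum(cells) / len(cells)

averages = {(x, y): tile_avg(x, y) for x in range(N) for y in range(N)}
assert len(averages) == 16          # one group per anchor cell
assert averages[(0, 0)] == 1.0      # full 2x2 tile
assert averages[(3, 3)] == 1.0      # border tile, only 1 cell present
```

The DISTINCT variant on the next slide would instead step the anchors by the tile size, producing disjoint tiles.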
50. SciQL tiling examples
(figure: a 4x4 grid of cells V0,0 … V3,3, with the anchor point at V0,0; DISTINCT makes the tiles disjoint)
SELECT x, y, avg(value)
FROM matrix
GROUP BY DISTINCT matrix[x:1:x+2][y:1:y+2];
51. SciQL tiling examples
(figure: the anchor point at V0,2; tiles reaching past the array boundary pick up nulls)
SELECT x, y, avg(value)
FROM matrix
GROUP BY DISTINCT matrix[x-1:1:x+1][y:1:y+2];
52. SciQL tiling examples
(figure: the anchor point at V0,2; a cross-shaped group around each cell)
SELECT x, y, avg(value)
FROM matrix
GROUP BY matrix[x][y],
  matrix[x-1][y], matrix[x+1][y],
  matrix[x][y-1], matrix[x][y+1];
53. SciQL, A Query Language for Science Applications
• Seamless integration of array, set, and sequence semantics.
• Dimension constraints as a declarative means for indexed access to array cells.
• Structural grouping that generalizes value-based grouping towards selective access to groups of cells based on positional relationships, for aggregation.
54. Seismology use case
Rietbrock: Chile earthquake
… 2TB of wave fronts
… filter by sta/lta
… remove false positives
… window-based 3 min cuts
… heuristic tests
… interactive response required …
How can a database system help? Scanning 2TB on a modern PC takes >3 hours.
55. Use case, a SciQL dream
Rietbrock: Chile earthquake
create array mseed (
  tick timestamp dimension[timestamp ‘2010’:*],
  data decimal(8,6),
  station string );
56. Use case, a SciQL dream
Rietbrock: … filter by sta/lta
-- average by window of 5 seconds
select A.tick, avg(A.data)
from mseed A
group by A[tick:1:tick + 5 seconds]
57. Use case, a SciQL dream
Rietbrock: … filter by sta/lta
select A.tick
from mseed A, mseed B
where A.tick = B.tick
and avg(A.data) / avg(B.data) > delta
group by A[tick:tick + 5 seconds],
B[tick:tick + 15 seconds]
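The STA/LTA trigger the two queries above express (short-term average over long-term average exceeding a threshold delta) can be sketched directly in Python (illustrative only; window lengths of 5 and 15 samples stand in for the 5- and 15-second windows, and the trailing-window formulation is an assumption of the sketch).

```python
# Sketch of an STA/LTA trigger: flag ticks where the short-term average
# (sta samples) over the long-term average (lta samples) exceeds delta.
# Sample counts stand in for the 5s/15s windows of the SciQL queries.

def sta_lta_picks(data, sta=5, lta=15, delta=2.0):
    picks = []
    for t in range(lta, len(data) + 1):
        s = sum(data[t - sta:t]) / sta     # short-term trailing average
        l = sum(data[t - lta:t]) / lta     # long-term trailing average
        if s / l > delta:
            picks.append(t)
    return picks

# quiet background with one sharp onset at sample 30
trace = [1.0] * 30 + [10.0] * 10 + [1.0] * 30

picks = sta_lta_picks(trace)
assert picks                               # the onset is detected
assert all(30 < t < 37 for t in picks)     # picks cluster around the jump
```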
58. Use case, a SciQL dream
Rietbrock: … filter by sta/lta
create view candidates(
station string,
tick timestamp,
ratio float ) as
select A.station, A.tick, avg(A.data) / avg(B.data) as ratio
from mseed A, mseed B
where A.tick = B.tick
and avg(A.data) / avg(B.data) > delta
group by A[tick:tick + 5 seconds],
B[tick:tick + 15 seconds]
59. Use case, a SciQL dream
Rietbrock: … remove false positives
-- remove isolated errors by direct environment
-- using wave propagation statics
create table neighbors(
head string,
tail string,
delay timestamp,
weight float)
60. Use case, a SciQL dream
Rietbrock: … remove false positives
select A.tick, B.tick
from candidates A, candidates B, neighbors N
where A.station = N.head
and B.station = N.tail
and B.tick = A.tick + N.delay
and B.ratio * N.weight < A.ratio;
61. Use case, a SciQL dream
Rietbrock: … remove false positives
delete from candidates
select A.tick
from candidates A, candidates B, neighbors N
where A.station = N.head
and B.station = N.tail
and B.tick = A.tick + N.delay
and B.ratio * N.weight < A.ratio;
62. Use case, a SciQL dream
Rietbrock: … window-based 3 min cuts
… heuristic tests
select B.station, myfunction(B.data)
from candidates A, mseed B
where A.tick = B.tick
group by distinct B[tick:tick + 3 minutes];
-- using a User Defined Function written in C.
63. Use case
Rietbrock: … interactive response required …
The query over 2TB of seismic data will be handled before he finishes his coffee.
64. Status
• The language definition is 'finished'
• The grammar is included in the SQL parser
• Semantic checks added to the SQL parser
• A test suite is being built
• Runtime support features and software stack
• …
• Exposure to real-life cases and external libraries