Unsupervised learning refers to a branch of algorithms that try to find structure in unlabeled data. Clustering algorithms, for example, try to partition elements of a dataset into related groups. Dimensionality reduction algorithms search for a simpler representation of a dataset. Spark's MLlib module contains implementations of several unsupervised learning algorithms that scale to huge datasets. In this talk, we'll dive into uses and implementations of Spark's K-means clustering and Singular Value Decomposition (SVD).
Bio:
Sandy Ryza is an engineer on the data science team at Cloudera. He is a committer on Apache Hadoop and recently led Cloudera's Apache Spark development.
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell (Databricks)
Nested data types offer Apache Spark users powerful ways to manipulate structured data. In particular, they allow you to put complex objects like arrays, maps and structures inside of columns. This can help you model your data in a more natural way.
While this feature is certainly useful, it can be quite cumbersome to manipulate data inside of complex objects because SQL (and Spark) do not have primitives for working with such data; the existing workarounds are time-consuming, non-performant, and non-trivial. During this talk we will discuss some of the commonly used techniques for working with complex objects, and we will introduce new ones based on higher-order functions. Higher-order functions will be part of Spark 2.4 and are a simple and performant extension to SQL that allows a user to manipulate complex data such as arrays.
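As a rough illustration of what an array-level higher-order function such as SQL's `transform(array, x -> x + 1)` does, here is a plain-Java analogue. This is only a conceptual sketch using java.util streams, not Spark's actual API, and the helper name `transformArray` is invented for illustration:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class HigherOrderSketch {
    // Hypothetical stand-in for SQL's transform(array, lambda):
    // applies a lambda to every element of an array-valued column.
    static <A, B> List<B> transformArray(List<A> column, Function<A, B> fn) {
        return column.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Analogue of: SELECT transform(values, x -> x + 1) FROM t
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(transformArray(values, x -> x + 1)); // prints [2, 3, 4]
    }
}
```

The point of the SQL extension is that this lambda runs inside the engine, without exploding the array into rows and re-aggregating.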
Spark schema for free with David Szakallas (Databricks)
DataFrames are essential for high-performance code, but sadly lag behind in development experience in Scala. When we started migrating our existing Spark application from RDDs to DataFrames at Whitepages, we really had to scratch our heads to come up with a good solution. DataFrames come at the cost of compile-time type safety, and there is limited support for encoding JVM types.
We wanted more descriptive types without the overhead of Dataset operations. The data binding API should be extensible. Schemas for input files should be generated from classes when we don't want inference. UDFs should be more type-safe. Spark does not provide these natively, but with the help of shapeless and type-level programming we found a solution to nearly all of our wishes. We migrated the RDD code without any of the following: changing our domain entities, writing schema descriptions, or breaking binary compatibility with our existing formats. Instead we derived schemas, data binding and UDFs, and tried to sacrifice the least amount of type safety while still enjoying the performance of DataFrames.
Video of the presentation can be seen here: https://www.youtube.com/watch?v=uxuLRiNoDio
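The "schema generated from classes" idea can be illustrated with plain Java reflection. This is only a toy analogue of what the talk achieves with shapeless in Scala; `schemaOf` is an invented helper, not a Spark or shapeless API:

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

public class SchemaSketch {
    // Example domain entity we want a schema for.
    static class Person {
        String name;
        int age;
    }

    // Derive a simple "name: type" schema description from a class's
    // fields, instead of writing the schema out by hand.
    static String schemaOf(Class<?> cls) {
        StringJoiner sj = new StringJoiner(", ", "struct<", ">");
        for (Field f : cls.getDeclaredFields()) {
            sj.add(f.getName() + ": " + f.getType().getSimpleName());
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(schemaOf(Person.class)); // e.g. struct<name: String, age: int>
    }
}
```

The shapeless approach does the same derivation at compile time, so mismatches between the class and the schema become type errors instead of runtime failures.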
The Data Source API in Spark is a convenient feature that enables developers to write libraries to connect to data stored in various sources with Spark. Equipped with the Data Source API, users can load/save data from/to different data formats and systems with minimal setup and configuration. In this talk, we introduce the Data Source API and the unified load/save functions built on top of it. Then, we show examples to demonstrate how to build a data source library.
Stratosphere System Overview, Big Data Beers Berlin, 20.11.2013 (Robert Metzger)
Stratosphere is a next-generation big data processing engine.
These slides introduce the most important features of Stratosphere by comparing it with Apache Hadoop.
Born from university research, it is now a completely open-source, community-driven project with a focus on stability and usability.
For more information, visit stratosphere.eu
AI&BigData Lab. Petr Rudenko. Automation and optimisation of machine learning ... (GeeksLab Odessa)
23.05.15, Odessa, Impact Hub Odessa: AI&BigData Lab conference.
Petr Rudenko (Software Engineer, Datarobot): Automation and optimisation of machine learning pipelines on top of Apache Spark.
At Datarobot we work on automatically building accurate predictive models. Beyond model training itself, data preprocessing (feature selection/normalization/transformation) plays an important role in the overall process. In this talk I will share our experience with the Apache Spark platform, and in particular with the new ml APIs that provide functionality for building pipelines (Pipeline) and for finding optimal model hyperparameter values (Crossvalidation).
More details:
http://geekslab.co/
https://www.facebook.com/GeeksLab.co
https://www.youtube.com/user/GeeksLabVideo
Deep Dive: Spark Data Frames, SQL and Catalyst Optimizer (Sachin Aggarwal)
RDD recap
Spark SQL library
Architecture of Spark SQL
Comparison with Pig and Hive Pipeline
DataFrames
Definition of a DataFrames API
DataFrames Operations
DataFrames features
Data cleansing
Diagram for logical plan container
Plan Optimization & Execution
Catalyst Analyzer
Catalyst Optimizer
Generating Physical Plan
Code Generation
Extensions
• Distributed datasets loaded into named columns (similar to relational DBs or Python DataFrames).
• Can be constructed from existing RDDs or external data sources.
• Can scale from small datasets to TBs/PBs on multi-node Spark clusters.
• APIs available in Python, Java, Scala and R.
• Bytecode generation and optimization using Catalyst Optimizer.
• Simpler DSL to perform complex and data-heavy operations.
• Faster runtime performance than vanilla RDDs.
Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will also learn about the different types of Input and Output formats in the MapReduce framework and their usage.
Stratosphere Intro (Java and Scala Interface), by Robert Metzger
A quick overview of Stratosphere, including our Scala programming interface.
See also bigdataclass.org for two self-paced Stratosphere Big Data exercises.
More information about Stratosphere: stratosphere.eu
Practical Machine Learning Pipelines with MLlib (Databricks)
This talk from 2015 Spark Summit East discusses Pipelines and related concepts introduced in Spark 1.2 which provide a simple API for users to set up complex ML workflows.
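The Pipeline concept the talk describes is essentially a chain of stages, where each stage transforms the output of the previous one. A minimal plain-Java sketch of the idea follows; this is not Spark's actual `Pipeline` API, and the `Stage` type here is invented for illustration:

```java
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {
    // Hypothetical stage: a named transformation over a dataset of doubles.
    record Stage(String name, Function<List<Double>, List<Double>> transform) {}

    // Running a pipeline just threads the data through each stage in order.
    static List<Double> run(List<Stage> stages, List<Double> data) {
        for (Stage s : stages) {
            data = s.transform().apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<Stage> pipeline = List.of(
            new Stage("scale", xs -> xs.stream().map(x -> x / 10.0).toList()),
            new Stage("shift", xs -> xs.stream().map(x -> x + 1.0).toList()));
        System.out.println(run(pipeline, List.of(10.0, 20.0))); // prints [2.0, 3.0]
    }
}
```

Spark's real API adds the estimator/transformer split (stages with parameters that are fit to data before transforming), but the composition idea is the same.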
This is a 3-part series on Java 8 features. Drop me an email for a discussion - singh.marut@gmail.com
https://github.com/singhmarut/java8training
Videos available at my youtube channel https://www.youtube.com/channel/UCBM4yHwfjQ_syW6Lz8kYpmA
Photon Technical Deep Dive: How to Think Vectorized (Databricks)
Photon is a new vectorized execution engine, written from scratch in C++, that powers Databricks. In this deep dive, I will introduce you to the basic building blocks of a vectorized engine by walking you through the evaluation of an example query with code snippets. You will learn about expression evaluation, compute kernels, runtime adaptivity, filter evaluation, and vectorized operations against hash tables.
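To make the "think vectorized" idea concrete, here is a conceptual sketch of vectorized filter evaluation in plain Java. It is not Photon's actual code: the kernel scans a whole batch of values at once and records the indices of passing rows in a selection vector, instead of branching row by row:

```java
public class VectorizedFilterSketch {
    // Vectorized filter kernel: fill a selection vector with the indices
    // of rows in the batch that satisfy the predicate (value > threshold).
    // Downstream operators then iterate only over the selected indices.
    static int filterGreaterThan(long[] batch, long threshold, int[] selection) {
        int selected = 0;
        for (int i = 0; i < batch.length; i++) {
            // Branch-free style: always write the index, advance conditionally.
            selection[selected] = i;
            selected += (batch[i] > threshold) ? 1 : 0;
        }
        return selected; // number of rows that passed the filter
    }

    public static void main(String[] args) {
        long[] batch = {5, 12, 7, 30};
        int[] selection = new int[batch.length];
        int n = filterGreaterThan(batch, 10, selection);
        System.out.println(n); // prints 2 (rows 1 and 3 pass)
    }
}
```

The tight, predictable loop over a primitive array is what lets a real engine exploit SIMD and keep the data in cache, which is the core performance argument of vectorized execution.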
In this session we'll take a whirlwind tour through Clojure and introduce some basic syntax and the philosophy behind it. We'll see how the concepts of functional programming are inherent in the language and how immutability, lazy sequence processing, and a few lines of code can change the way you think.
By Nir Rubinstein from "AppsFlyer" ~45min.
What is SamzaSQL, and what might I use it for? Does this mean that Samza is turning into a database? What is a query optimizer, and what can it do for my streaming queries?
How does Apache Calcite parse, validate and optimize streaming SQL queries? How is relational algebra extended to handle streaming?
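One core piece of extending relational algebra to streams is windowed aggregation. The following toy Java sketch of a tumbling-window COUNT is purely illustrative and is not Calcite's or Samza's implementation:

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    // Toy analogue of a streaming GROUP BY over a tumbling window:
    // each event timestamp is assigned to window floor(ts / size) and we
    // count events per window, roughly what a streaming SQL engine does
    // for SELECT COUNT(*) ... GROUP BY TUMBLE(ts, size).
    static Map<Long, Long> countPerWindow(long[] timestamps, long windowSize) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long window = ts / windowSize;
            counts.merge(window, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] ts = {1, 2, 11, 12, 13, 25};
        System.out.println(countPerWindow(ts, 10)); // prints {0=2, 1=3, 2=1}
    }
}
```

A real engine must additionally decide when a window is complete in the face of out-of-order events (watermarks), which is where much of the query-planning complexity lives.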
The Scala programming language has been gaining momentum recently as an alternative (and some might say successor) to Java on the JVM. This talk will start with an introduction to basic Scala syntax and concepts, then delve into some of Scala's more interesting and unique features. At the end we'll show a brief example of how Scala is used by the Lift web framework to simplify dynamic web apps.
Agenda
Setting up an angular app.
Introduction to tools - Babel, Webpack
Alternative to Gulp, Grunt & Bower.
Writing Controllers, Services, Directives, etc.
Testing Javascript with Jasmine.
Setting up Karma with Webpack.
Let’s understand code coverage.
An alternative: JEST
A short introduction (with many examples) to the Scala programming language, plus an introduction to using the Play! Framework for modern, safe, efficient and reactive web applications.
Putting the F in FaaS: Functional Compositional Patterns in a Serverless World (Lars Trieloff)
Presented at #ServerlessConf 2017 in New York City. Don't go looking for serverless patterns in strange places, take existing functional programming patterns instead.
Data Natives 2015: Predictive Applications are Going to Steal Your Job: this ... (Lars Trieloff)
Fears of robots taking away blue-collar jobs have been coming and going over the last decade. But this time it's different: a new breed of predictive applications, or white-collar robots, is going after knowledge-worker and managerial jobs. Using automated data-driven decisions, they speed up and improve critical business processes and leave employers and employees scratching their heads about what is coming next. Lars Trieloff, who builds predictive applications for a living at Blue Yonder, explains what happens, why it happens and what it means for you (and your boss).
Automated Decision Making with Predictive Applications – Big Data Hamburg (Lars Trieloff)
Most businesses are making most decisions the way Lizards do: based on very simple reflex-response patterns and let cognitive biases taint their decision making. Instead of letting gut feel and biases take over, predictive applications make decisions fast, cheap and fact-based.
Automated Decision Making with Predictive Applications – Big Data Düsseldorf (Lars Trieloff)
Another installment and iteration of my talk on predictive applications, automated decision making and why cognitive biases prevent us from making the best decisions at scale
Automated decision making with predictive applications – Big Data Brussels (Lars Trieloff)
My slides from Dataconomy's Big Data, Brussels event in March 2015. Key topics: what are predictive applications and how can they help companies make better decisions, faster and cheaper.
Automated decision making with predictive applications – Big Data Amsterdam (Lars Trieloff)
My slides from tonight's talk at Impact HUB in Amsterdam on big data, machine learning, cognitive biases and how to overcome them with predictive applications.
Automated decision making using Predictive Applications – Big Data Paris (Lars Trieloff)
Predictive Applications enable automated data-driven decisions using big data, machine learning, artificial intelligence and optimization algorithms. With this, they are able to scale decision making, improve the quality of decisions and circumvent cognitive biases that cloud human decision making.
Big Data Munich – Decision Automation and Big Data (Lars Trieloff)
My presentation from Big Data Munich: How decision automation based on big data and machine learning can help you run a better business and avoid common cognitive biases.
Big Data Berlin – Automating Decisions is the Next Frontier for Big Data (Lars Trieloff)
Just collecting, storing and analyzing data is not enough. In order to benefit from it, you have to overcome organizational and human inertia and establish automated processes that use insights gained from your data.
This presentation has been presented at http://dataconomy.com/28-august-2014-big-data-berlin/
Digital marketing rapidly introduces new channels, concepts and context into marketing. This can lead to confusion and cognitive dissonance between traditional right-brain marketers and digital left-brain marketers. By going beyond the surface of what is visible in terms of vendors and products and concentrating on the fundamental building blocks of marketing, "The DNA of Marketing" offers a new look at marketing and a way to make sense of digital marketing innovation.
Combine Social Media with Social Communities in CQ5 to open additional channels for your marketing campaigns and increase targeting accuracy, maximize conversion and drive profitability.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher overall coverage. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
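The general idea of trimming uninteresting bytes from a seed can be sketched in a few lines of Java. This is a rough, hypothetical illustration of seed minimization, not DIAR's actual algorithm: drop each byte in turn and keep the smaller seed whenever the program's observed behavior signature stays the same:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SeedTrimSketch {
    // Greedy seed trimming: try removing each byte; if the behavior
    // signature (a stand-in for coverage) is unchanged, the byte was
    // uninteresting and can be dropped.
    static List<Byte> trim(List<Byte> seed, Function<List<Byte>, Integer> signature) {
        int original = signature.apply(seed);
        List<Byte> current = new ArrayList<>(seed);
        int i = 0;
        while (i < current.size()) {
            List<Byte> candidate = new ArrayList<>(current);
            candidate.remove(i);
            if (signature.apply(candidate).equals(original)) {
                current = candidate;   // byte was uninteresting, drop it
            } else {
                i++;                   // byte matters, keep it
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy "coverage" signature: only counts bytes equal to 'x'.
        Function<List<Byte>, Integer> sig =
            s -> (int) s.stream().filter(b -> b == (byte) 'x').count();
        List<Byte> seed = List.of((byte) 'a', (byte) 'x', (byte) 'b', (byte) 'x');
        System.out.println(trim(seed, sig)); // prints [120, 120] (two 'x' bytes)
    }
}
```

Each trim attempt here costs one target execution, which is why doing this once up front on the seeds, rather than wasting mutations on bloated inputs, can speed up the whole campaign.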
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize our carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
GridMate - End to end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
3. What is DAX
• DAX means Declarative API for XML
• A way to process XML
• By expressing what parts of a document you want to process
• Based on Java, Javascript and Cocoon
4. DAX History
• Feb 2005: Kim Wolk writes XMLTO, a .NET library that transforms XML into objects
• March 2005: Ryan Cox ports it to Java 5, using Annotations and dom4j's Transformer API
• 2006 to 2007: DAX is used in production at Mindquarry, adapted to Cocoon
13. DAX-Java How to use it
public class ElementCounter extends Transformer {
  Map<String, Integer> elements = new HashMap<String, Integer>();

  public void processElement(Node context) {
    String name = context.getName();
    if (elements.containsKey(name)) {
      elements.put(name, elements.get(name) + 1);
    } else {
      elements.put(name, 1);
    }
  }
}
14. DAX-Java How to use it
public class ElementCounter extends Transformer {
  Map<String, Integer> elements = new HashMap<String, Integer>();

  @Path("*") // select all elements
  public void processElement(Node context) {
    String name = context.getName();
    if (elements.containsKey(name)) {
      elements.put(name, elements.get(name) + 1);
    } else {
      elements.put(name, 1);
    }
  }
}
15. DAX-Java How to use it
public class SourceCounter extends Transformer {
  Map<String, Integer> sources = new HashMap<String, Integer>();

  @Path("img[@src]") // select all img elements with a src attribute
  public void processElement(Node context) {
    String name = this.valueOf("@src");
    if (sources.containsKey(name)) {
      sources.put(name, sources.get(name) + 1);
    } else {
      sources.put(name, 1);
    }
  }
}
16. DAX-Java How it works
• Simple parsing algorithm:
• traverse the DOM of the XML document
• for each node, find an annotated method with a matching XPath
• execute this method
• Just like XSLT's templates
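The dispatch loop described on this slide is easy to re-create. The sketch below is not DAX's actual implementation: the `@Path` annotation, the `transform` method, and the `DaxSketch`/`countElements` names are re-created here for illustration, using only the standard JDK DOM and XPath APIs. It traverses a DOM tree and, for each element, invokes every annotated method whose XPath (evaluated relative to the element's parent) selects that element:

```java
import java.io.ByteArrayInputStream;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DaxSketch {

    // Stand-in for DAX's @Path annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Path { String value(); }

    // Handler in the style of the ElementCounter from the slides.
    public static class ElementCounter {
        public Map<String, Integer> elements = new HashMap<String, Integer>();

        @Path("*") // select all elements
        public void processElement(Node context) {
            String name = context.getNodeName();
            if (elements.containsKey(name)) {
                elements.put(name, elements.get(name) + 1);
            } else {
                elements.put(name, 1);
            }
        }
    }

    // The parsing algorithm from the slide: traverse the DOM, and for each
    // element find annotated methods whose XPath matches it, then execute them.
    static void transform(Object handler, Node node) throws Exception {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Method m : handler.getClass().getMethods()) {
                Path p = m.getAnnotation(Path.class);
                if (p == null) continue;
                // Does this method's XPath, evaluated from the parent,
                // select the current node?
                NodeList hits = (NodeList) xpath.evaluate(
                        p.value(), node.getParentNode(), XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    if (hits.item(i) == node) { m.invoke(handler, node); break; }
                }
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            transform(handler, c);
        }
    }

    public static Map<String, Integer> countElements(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        ElementCounter counter = new ElementCounter();
        transform(counter, doc.getDocumentElement());
        return counter.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<doc><img src='a.png'/><p/><p/></doc>"));
    }
}
```

Note the trade-off the slide hints at: matching every node against every registered XPath is simple but, unlike compiled XSLT templates, not fast for large documents.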
19. DAX-Javascript Why?
• XSLT is fine for transforming XML
• but no side-effects possible
• no access to external data model
(Diagram: Input → XSLT → Output, with no connection to the external Model)
20. DAX-Javascript Background
• Map most important XSLT concepts to Javascript concepts:
XSLT → Javascript
<xsl:stylesheet> → Stylesheet object
<xsl:template> → template function of the Stylesheet object
<xsl:apply-templates/> → applyTemplates function of the Stylesheet object
<xsl:copy/> → copy function of the Stylesheet object (with inlined body function)
21. DAX-Javascript How to use it
<xsl:template match="foo">
  <bar>
    <xsl:comment>example code uses foo</xsl:comment>
    <xsl:apply-templates />
  </bar>
</xsl:template>

Stylesheet.template({match:"foo"}, function(node) {
  this.element("bar", function(node) {
    this.comment("example code uses foo");
    this.applyTemplates();
  });
});
22. DAX-Javascript How to use it
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*" />
  </xsl:copy>
</xsl:template>

Stylesheet.template({match:"node()|@*"}, function(node) {
  this.copy(function(node) {
    this.applyTemplates({select:"node()|@*"});
  });
});
23. DAX-Javascript How it works
• Uses Rhino Javascript Engine
• full access to Java object model
• allows side-effects when transforming XML
• Parses the incoming XML stream
• Finds and fires matching functions
25. DAX-Cocoon How to use it
<map:components>
  <map:transformers>
    <map:transformer name="dax"
        src="dax.cocoon.DAXTransformer" />
  </map:transformers>
</map:components>
26. DAX-Cocoon How to use it
<map:match pattern="/resource/*">
  <map:select type="request-method">
    <map:generate type="stream" />
    <map:when test="PUT">
      <map:transform type="dax" src="dax/res.js">
        <map:parameter name="res" value="{1}" />
      </map:transform>
    </map:when>
  </map:select>
</map:match>
27. DAX-Cocoon How to use it
var resourcemanager =
    cocoon.getComponent("resourcemanager");

Stylesheet.template({match:"del"}, function(node) {
  var that = this;
  this.copy(function(node) {
    if (that.valueOf(".") == cocoon.parameters.res) {
      resourcemanager.delete(that.valueOf("@node"));
    }
    this.applyTemplates({select:"node()|@*"});
  });
});
28. DAX-Cocoon How it works
• Implemented as a Cocoon Transformer
• Pull in the "cocoon" object as Flowscript does
• Usage Scenario: REST Application
• validate using DAX (e.g. by checking the database)
• transform using DAX (e.g. by triggering actions)
• save using DAX (e.g. by changing the model)
29. DAX-Cocoon Open Questions
• Caching
• We do not know if a transformation has non-cacheable side-effects
• Mixing DAX and XSLT
• perhaps E4X is a way to conveniently embed XSLT
• Not all XSLT concepts implemented (sorting)
30. How to go on?
• Read more
• http://www.asciiarmor.com/2005/03/03/introducing-dax-declarative-api-for-xml/
• https://www.mindquarry.org/wiki/dax/
• http://www.codeconsult.ch/bertrand/archives/000802.html
• Download
• http://releases.mindquarry.org/dax/
• (Maven artifacts available)
31. Thank you very much
lars@trieloff.net
For more information, see my weblog at
http://weblogs.goshaky.com/weblog/lars