This document provides an overview of using graphs and hierarchies in SQL databases with OQGRAPH. It discusses how trees and graphs differ, examples of each, and some of the challenges of representing them in relational databases. It then introduces OQGRAPH as a storage engine that can perform graph computations directly in SQL. Key features of OQGRAPH like inserting edges, performing path queries, and joining to other tables are demonstrated. Later versions provide additional optimizations and the ability to use an existing table as the source of edges.
Java heap memory model has wasteful memory usage. References, object headers, internal collection structure, extra fields such as String.hashCode… This talk shows practical ways to reduce memory usage and fit more data into memory: primitive types, specialized java collections, bit packing, reducing number of pointers, replacing String with char[], semi-serialized objects… As bonus we get lower GC overhead by reducing number of references.
This presentation show the main Spark characteristics, like RDD, Transformations and Actions.
I used this presentation for many Spark Intro workshops from Cluj-Napoca Big Data community : http://www.meetup.com/Big-Data-Data-Science-Meetup-Cluj-Napoca/
HDF5 is a powerful and feature-rich creature, and getting the most out of it requires powerful tools. The MathWorks provides a "low-level" interface to the HDF5 library that closely corresponds to the C API and exposes much of its richness. This short tutorial will present ways to use the low-level MATLAB interface to build those tools and tackle such topics as subsetting, chunking, and compression.
Java heap memory model has wasteful memory usage. References, object headers, internal collection structure, extra fields such as String.hashCode… This talk shows practical ways to reduce memory usage and fit more data into memory: primitive types, specialized java collections, bit packing, reducing number of pointers, replacing String with char[], semi-serialized objects… As bonus we get lower GC overhead by reducing number of references.
This presentation show the main Spark characteristics, like RDD, Transformations and Actions.
I used this presentation for many Spark Intro workshops from Cluj-Napoca Big Data community : http://www.meetup.com/Big-Data-Data-Science-Meetup-Cluj-Napoca/
HDF5 is a powerful and feature-rich creature, and getting the most out of it requires powerful tools. The MathWorks provides a "low-level" interface to the HDF5 library that closely corresponds to the C API and exposes much of its richness. This short tutorial will present ways to use the low-level MATLAB interface to build those tools and tackle such topics as subsetting, chunking, and compression.
The MathWorks introduced MATLAB support for HDF5 in 2002 via three high-level functions: HDF5INFO, HDF5READ, and HDF5WRITE. These functions worked well for their purpose-providing simple interfaces to a complicated file format-but MATLAB users requested finer control over their HDF5 files and the HDF5 library. MATLAB 7.3 (R2006b) adds this precise level of support for version 1.6.5 of the HDF5 library via a close mapping of the HDF5 C API to MATLAB function calls.
This presentation will briefly introduce the earlier, high-level HDF5 interface (and its limitations) before showing in detail the low-level HDF5 functions. It will show how to interact with the HDF5 library and files using the thirteen classes of functions in MATLAB, which encapsulate groupings of functionality found in the HDF5 C API. But because MATLAB is itself a higher-level language than C, we will also present MATLAB's extensions and modifications of the HDF5 C API that make it more MATLAB-like, work with defined values, and perform ID and memory management.
Wrapping a library like HDF5 requires a great deal of effort and design, and we will briefly present a general-purpose mechanism for creating close mappings between library interfaces and an application like MATLAB. One of our goals in this presentation is to facilitate communication with The HDF Group about how The MathWorks builds our HDF5 interfaces in order to ease adoption of future versions of the HDF5 library in large, general-purpose applications.
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
A very short set of slides to describe an RDD data structure.
Extracted from my 3-day course: www.sparkInternals.com
There is also a video of this on YouTube: http://youtu.be/odcEg515Ne8
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupHolden Karau
This talk starts with a focus on "How to not make Spark Explode" as a developer, and then shifts to look towards the future of all of the cool nifty things we will be able to do with structured streaming.
How to use Parquet as a basis for ETL and analyticsJulien Le Dem
Parquet is a columnar format designed to be extremely efficient and interoperable across the hadoop ecosystem. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, …) and serialization models (Thrift, Avro, Protocol Buffers, …) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). In this talk, we will describe how one can us Parquet with a wide variety of data analysis tools like Spark, Impala, Pig, Hive, and Cascading to create powerful, efficient data analysis pipelines. Data management is simplified as the format is self describing and handles schema evolution. Support for nested structures enables more natural modeling of data for Hadoop compared to flat representations that create the need for often costly joins.
My talk about Catalyst for QCon Beijing 2015. In this talk, we walk through Catalyst, Spark SQL's query optimizer, by using a simplified version of Catalyst to build an optimizing Brainfuck compiler named Brainsuck in less than 300 lines of code.
This slide deck is used as an introduction to the internals of Apache Spark, as part of the Distributed Systems and Cloud Computing course I hold at Eurecom.
Course website:
http://michiard.github.io/DISC-CLOUD-COURSE/
Sources available here:
https://github.com/michiard/DISC-CLOUD-COURSE
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018Holden Karau
Apache Spark is one of the most popular big data systems, but once the shiny finish starts to wear off you can find yourself wondering if you've accidentally deployed a Ford Pinto into production. This talk will look at the challenges that come with scaling Spark jobs. Also, the talk will explore Spark's new(ish) Dataset/DataFrame API, as well as how it’s evolving in Spark 2.3 with improved Python support.
If you're already a Spark user, come to find out why it’s not all your fault. If you aren't already a Spark user, come to find out how to save yourself from some of the pitfalls once you move beyond the example code.
Check out Holden's newest book, High Performance Spark, for more information!
From https://niketechtalksjan2018.splashthat.com/
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Holden Karau
Slides from: https://www.meetup.com/Sydney-Apache-Spark-User-Group/events/246892684/
Welcome to the first Sydney Spark Meetup in 2018!
We are very glad to have an visiting Apache Spark committer Holden Karau to give a talk on streaming machine learning. Title: Streaming ML w/Spark (and why it's a bit painful today & #workingonit)
Apache Spark is one of the most popular distributed systems, and it has built in libraries for both machine learning and streaming. This talk will cover Spark's two streaming libraries, look at the future, and how to make streaming ML work today (for both serving and prediction). If you aren't familiar with Spark, that's ok! We'll spend the first ~5 minutes covering just enough to get through the rest of the talk, and for those of you already familiar you can spend those ~5 minutes downloading the sample code :)
About Holden:
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
• What to bring
• Important to know
A couple of us will be at the doors of 60 Margaret St to let people in until 6.10pm.
A super fast introduction to Spark and glance at BEAMHolden Karau
Apache Spark is one of the most popular general purpose distributed systems, with built in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more 3rd party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but its in it’s early stages. This talk will introduce the core concepts of Apache Spark, and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark’s method for achieving resiliency. Since it’s a big data talk, we will include the almost required wordcount example, and end the Spark part with follow up pointers on Spark’s new ML APIs. For folks who are interested we’ll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well it’s unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
Summary of the lessons we learned with Docker (Dockerfile, storage, distributed networking) during the first iteration of the AdamCloud project (Fall 2014).
The AdamCloud project (part I) was presented here:
http://www.slideshare.net/davidonlaptop/bdm29-adamcloud-planification
The MathWorks introduced MATLAB support for HDF5 in 2002 via three high-level functions: HDF5INFO, HDF5READ, and HDF5WRITE. These functions worked well for their purpose-providing simple interfaces to a complicated file format-but MATLAB users requested finer control over their HDF5 files and the HDF5 library. MATLAB 7.3 (R2006b) adds this precise level of support for version 1.6.5 of the HDF5 library via a close mapping of the HDF5 C API to MATLAB function calls.
This presentation will briefly introduce the earlier, high-level HDF5 interface (and its limitations) before showing in detail the low-level HDF5 functions. It will show how to interact with the HDF5 library and files using the thirteen classes of functions in MATLAB, which encapsulate groupings of functionality found in the HDF5 C API. But because MATLAB is itself a higher-level language than C, we will also present MATLAB's extensions and modifications of the HDF5 C API that make it more MATLAB-like, work with defined values, and perform ID and memory management.
Wrapping a library like HDF5 requires a great deal of effort and design, and we will briefly present a general-purpose mechanism for creating close mappings between library interfaces and an application like MATLAB. One of our goals in this presentation is to facilitate communication with The HDF Group about how The MathWorks builds our HDF5 interfaces in order to ease adoption of future versions of the HDF5 library in large, general-purpose applications.
PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
A very short set of slides to describe an RDD data structure.
Extracted from my 3-day course: www.sparkInternals.com
There is also a video of this on YouTube: http://youtu.be/odcEg515Ne8
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupHolden Karau
This talk starts with a focus on "How to not make Spark Explode" as a developer, and then shifts to look towards the future of all of the cool nifty things we will be able to do with structured streaming.
How to use Parquet as a basis for ETL and analyticsJulien Le Dem
Parquet is a columnar format designed to be extremely efficient and interoperable across the hadoop ecosystem. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, …) and serialization models (Thrift, Avro, Protocol Buffers, …) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). In this talk, we will describe how one can us Parquet with a wide variety of data analysis tools like Spark, Impala, Pig, Hive, and Cascading to create powerful, efficient data analysis pipelines. Data management is simplified as the format is self describing and handles schema evolution. Support for nested structures enables more natural modeling of data for Hadoop compared to flat representations that create the need for often costly joins.
My talk about Catalyst for QCon Beijing 2015. In this talk, we walk through Catalyst, Spark SQL's query optimizer, by using a simplified version of Catalyst to build an optimizing Brainfuck compiler named Brainsuck in less than 300 lines of code.
This slide deck is used as an introduction to the internals of Apache Spark, as part of the Distributed Systems and Cloud Computing course I hold at Eurecom.
Course website:
http://michiard.github.io/DISC-CLOUD-COURSE/
Sources available here:
https://github.com/michiard/DISC-CLOUD-COURSE
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018Holden Karau
Apache Spark is one of the most popular big data systems, but once the shiny finish starts to wear off you can find yourself wondering if you've accidentally deployed a Ford Pinto into production. This talk will look at the challenges that come with scaling Spark jobs. Also, the talk will explore Spark's new(ish) Dataset/DataFrame API, as well as how it’s evolving in Spark 2.3 with improved Python support.
If you're already a Spark user, come to find out why it’s not all your fault. If you aren't already a Spark user, come to find out how to save yourself from some of the pitfalls once you move beyond the example code.
Check out Holden's newest book, High Performance Spark, for more information!
From https://niketechtalksjan2018.splashthat.com/
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Holden Karau
Slides from: https://www.meetup.com/Sydney-Apache-Spark-User-Group/events/246892684/
Welcome to the first Sydney Spark Meetup in 2018!
We are very glad to have an visiting Apache Spark committer Holden Karau to give a talk on streaming machine learning. Title: Streaming ML w/Spark (and why it's a bit painful today & #workingonit)
Apache Spark is one of the most popular distributed systems, and it has built in libraries for both machine learning and streaming. This talk will cover Spark's two streaming libraries, look at the future, and how to make streaming ML work today (for both serving and prediction). If you aren't familiar with Spark, that's ok! We'll spend the first ~5 minutes covering just enough to get through the rest of the talk, and for those of you already familiar you can spend those ~5 minutes downloading the sample code :)
About Holden:
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
• What to bring
• Important to know
A couple of us will be at the doors of 60 Margaret St to let people in until 6.10pm.
A super fast introduction to Spark and glance at BEAMHolden Karau
Apache Spark is one of the most popular general purpose distributed systems, with built in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more 3rd party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but its in it’s early stages. This talk will introduce the core concepts of Apache Spark, and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark’s method for achieving resiliency. Since it’s a big data talk, we will include the almost required wordcount example, and end the Spark part with follow up pointers on Spark’s new ML APIs. For folks who are interested we’ll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well it’s unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
Summary of the lessons we learned with Docker (Dockerfile, storage, distributed networking) during the first iteration of the AdamCloud project (Fall 2014).
The AdamCloud project (part I) was presented here:
http://www.slideshare.net/davidonlaptop/bdm29-adamcloud-planification
GSoC2014 - Uniritter Presentation May, 2015Fabrízio Mello
This presentation is about the work that I did during the Google Summer of Code 2014 to PostgreSQL. The project is about change an Unlogged Table to Logged and vice-versa. Project wiki page: https://wiki.postgresql.org/wiki/Allow_an_unlogged_table_to_be_changed_to_logged_GSoC_2014
I present this work to Uniritter IT students in Canoas/RS (2015-05-18) and Porto Alegre/RS (FAPA - 2015-05-20).
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix use the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com..
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
21 people attended the July 2014 program meeting hosted by BDPA Cincinnati chapter. The topic was 'Open Source Tools and Resources'. The guest speaker was Greg Greenlee (Blacks In Technology).
'Open source' refers to a computer program in which the source code is available to the general public for use or modification from its original design. Open source code is typically created as a collaborative effort in which programmers improve upon the code and share the changes within the community. Open source sprouted in the technological community as a response to proprietary software owned by corporations. Over 85% of enterprises are using open source software. Managers are quickly realizing the benefit that community-based development can have on their businesses. This month, we put on our geek hats and detective gloves to learn how we can monitor our computers’ environments using open source tools. This meetup covered some of the most popular ‘Free and Open Source Software’ (FOSS) tools used to monitor various aspects of your computer environment.
Level 101 for Presto: What is PrestoDB?Ali LeClerc
Presto is a widely adopted federated SQL engine for federated querying across multiple data sources. With Presto, you can perform ad hoc querying of data in place. For today’s “data hacker”, Presto helps solve challenges around time to discovery and the amount of time it takes to do ad hoc analysis.
In Level 101, you’ll get an overview of Presto, including:
A high level overview of Presto & most common use cases
The problems it solves and why you should use it
A live, hands-on demo on getting Presto running on Docker
Real world example: How Twitter uses Presto at scale
Kernel Recipes 2014 - Performance Does MatterAnne Nicolas
Deploying clouds is in everybody’s mind but how to make an efficient deployment ?
After setting up the hardware, it’s mandatory to make a deep inspection of server’s performance.
In a farm of supposed identical servers, many miss-{installation|configuration} could seriously degrade performance. If you want to discovery such counter-performance before users complains of their VMs, you have to be detect them before installing any software. Another performance metric to know is “how many VMs could I load on top of my servers ?”. By using the same methodology it is possible the compare how a set of VMs performs regarding the bare metal capabilities.
The challenge is here: How do detect automatically servers that under perform ? How to insure that a new server entering a farm will not degrade it ? How to measure the overhead of all the virtualization layers from the VM point of view ?
Erwan Velu – Performance Engineer @eNovance
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
Presentations from the Cloudera Impala meetup on Aug 20 2013:
- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala
Design and Implementation of the Security Graph LanguageAsankhaya Sharma
Today software is built in fundamentally different
ways from how it was a decade ago. It is increasingly common
for applications to be assembled out of open-source components,
resulting in the use of large amounts of third-party code. This
third-party code is a means for vulnerabilities to make their
way downstream into applications. Recent vulnerabilities such
as Heartbleed, FREAK SSL/TLS, GHOST, and the Equifax data
breach (due to a flaw in Apache Struts) were ultimately caused
by third-party components. We argue that an automated way to
audit the open-source ecosystem, catalog existing vulnerabilities,
and discover new flaws is essential to using open-source safely.
To this end, we describe the Security Graph Language (SGL), a
domain-specific language for analysing graph-structured datasets
of open-source code and cataloguing vulnerabilities. SGL allows
users to express complex queries on relations between libraries
and vulnerabilities in the style of a program analysis language.
SGL queries double as an executable representation for vulnerabilities, allowing vulnerabilities to be automatically checked
against a database and deduplicated using a canonical representation. We outline a novel optimisation for SGL queries based on
regular path query containment, improving query performance up to 3 orders of magnitude. We also demonstrate the
effectiveness of SGL in practice to find zero-day vulnerabilities
by identifying sever
In this talk I'll discuss how we can combine the power of PostgreSQL with TensorFlow to perform data analysis. By using the pl/python3 procedural language we can integrate machine learning libraries such as TensorFlow with PostgreSQL, opening the door for powerful data analytics combining SQL with AI. Typical use-cases might involve regression analysis to find relationships in an existing dataset and to predict results based on new inputs, or to analyse time series data and extrapolate future data taking into account general trends and seasonal variability whilst ignoring noise. Python is an ideal language for building custom systems to do this kind of work as it gives us access to a rich ecosystem of libraries such as Pandas and Numpy, in addition to TensorFlow itself.
And introdution to MR and Hadoop and an view on the opportunities to use MR with databases i.e., SQL-MapReduce by Teradata and In-database MR by Oracle.
The presentation was used during a class of Datenbanken Implementierungstechniken in 2013.
Challenges and patterns for semantics at scaleRob Vesse
Discusses some of the challenges around applying semantics at scale (tens of billions of triples and larger). Describes some of the patterns that can be used to meet those challenges.
Similar to OQGraph at MySQL Users Conference 2011 (20)
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.