Presentation used at the Hippo meetup about enterprise search, which took place in Amsterdam. The talk started with a general introduction to search with Lucene, scaling with Solr, and the distributed problems that Elasticsearch successfully addresses.
Talk about adding a proxy user at Spark task execution time, given at Spark Summit East 2017 by Jorge López-Malla and Abel Ricon
full video:
https://www.youtube.com/watch?v=VaU1xC0Rixo&feature=youtu.be
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you understand the basics of RDDs in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
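Topic 12 in the list above ("Can We Use reduce() for Computing Average?") has a classic answer: a plain average is neither associative nor commutative, so Spark's reduce() cannot compute it directly, but reducing over (sum, count) pairs works. A minimal sketch in plain Python (not from the tutorial; ordinary map/reduce standing in for Spark's RDD API):

```python
from functools import reduce

# Spark's reduce() requires an associative, commutative function.
# A plain average is neither, so reduce over (sum, count) pairs instead,
# then divide once at the end -- the same trick works on a real RDD.
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# map each value to a (sum, count) pair, then combine the pairs
pairs = map(lambda x: (x, 1), data)
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs)
average = total / count
print(average)  # 18.0
```

The same (sum, count) accumulator translates directly to `rdd.map(lambda x: (x, 1)).reduce(...)` on a real Spark RDD.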
Most applications will need to communicate with other services or devices at some point, or at least save settings on the host computer. These concepts are covered in this module.
After introducing the generic concept behind devices, short examples show how to use files.
Afterwards, the module covers networking and its representation in Qt. In addition to providing classes for handling low-level sockets, network managers simplify handling web service requests and responses, such as for the HTTP protocol. At the end, a short section explains the basics of different methods of parsing XML in Qt, including DOM trees, SAX, pull parsing and XQuery/XPath.
A section about internationalization demonstrates the process step-by-step, showing all required components to make your application multi-lingual.
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012 - Chris Richardson
The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But these databases have very different and unfamiliar data models and APIs, as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which, by using a radically different architecture, offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great, but unlike with a traditional relational database, you can’t use JDBC and must partition your data.
In this presentation you will learn about popular NoSQL databases – MongoDB and Cassandra – as well as VoltDB. We will compare and contrast each database’s data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
Continuing where module 2 left off, this part of the course explains signals and slots in more detail and tells you how to extend functionality of existing widgets by subclassing them. In real applications, widgets are often used in dialogs or inside the main window, which is a container for widgets and by default supports menus, toolbars and actions. These topics are all demonstrated via small examples.
Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each of which is a pair of a key and a value. Keys and values are variable-length byte sequences; both binary data and character strings can be used. There is no concept of data tables or data types. Records are organized in a hash table, a B+ tree, or a fixed-length array.
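The schema-less model described above can be illustrated with a toy sketch (this uses a plain Python dict as a stand-in for Tokyo Cabinet's on-disk structures, not its actual API):

```python
# A toy in-memory stand-in for Tokyo Cabinet's data model: no tables,
# no types -- just variable-length byte strings as keys and values.
store = {}

# Both binary data and character strings work, as long as they are bytes.
store[b"user:1"] = "Alice".encode("utf-8")
store[b"\x00\x01\x02"] = b"\xff\xfe"   # raw binary key and raw binary value

# Hash-table organization gives O(1) point lookups.
assert store[b"user:1"] == b"Alice"

# B+-tree organization additionally keeps keys ordered, enabling range
# scans -- emulated here by iterating the keys in sorted order.
for key in sorted(store):
    print(key, store[key])
```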
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly known as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest-volume online booking sites in the world. He saw firsthand the consequences of trying to solve real-world scalability problems at the limit of what traditional relational databases are capable of.
Basically everything you need to get started with your ZooKeeper training and to set up Apache Hadoop high availability using QJM with automatic failover.
This module explains several additional important concepts. These include properties of QObjects, data types, QString and various list types.
Special classes in Qt provide even more convenient APIs if you want to save settings in the right way for the target platform.
At the end, a guide walks you through what you need to know about embedding files and resources into your application.
Slides for presentation on ZooKeeper I gave at Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
Know your platform. 7 things every Scala developer should know about the JVM - Pawel Szulc
Your Scala code can be cohesive, beautiful and fully functional. But at the end of the day, it runs on the JVM - a powerful platform which also has its limits.
Is this fact fully transparent to a Scala developer, or can a basic understanding of the platform be beneficial? Can we squeeze the full potential out of our code? Are we aware of the limits of our runtime environment? This talk will try to answer those questions.
we will show how Scala code is transformed to bytecode and what that implies
we will try to answer the question why @tailrec matters
we will look at the organization of the JVM memory
we will understand how different GC algorithms work
we will see if different GC algorithm can change performance of our code
we will look at the tools located in JAVA_HOME/bin/
we will try to scratch the surface of the JIT :)
This is a basic introduction to the topic. We will not show you how the compiler works internally :) But we will try to give you a general overview of the internals of the platform that you use on a daily basis.
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup - Holden Karau
This talk starts with a focus on "How to not make Spark Explode" as a developer, and then shifts to look towards the future of all of the cool nifty things we will be able to do with structured streaming.
Summary of JDK10 and What will come into JDK11 - Naoki Kishida
Newer version is here
https://www.slideshare.net/nowokay/summary-of-jdk10-and-what-will-come-into-jdk11-99363835
Summary of JDK10 and What will come into JDK11 so far
DocValues aka. Column Stride Fields in Lucene 4.0 - By Simon Willnauer - lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011
Lucene 4.0 is on its way to deliver a tremendous amount of new features and improvements. Besides Real-Time Search & Flexible Indexing, DocValues, aka. Column Stride Fields, is one of the “next generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe Document & Value pairs in a column-stride fashion, either entirely memory-resident with random access or disk-resident and iterator-based, without the need to un-invert fields. Their final goal is to provide independently updatable per-document storage for scoring, sorting or even filtering. This talk will introduce the current state of development, implementation details, its features and how DocValues have been integrated into Lucene’s Codec API for full extensibility.
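The column-stride idea behind DocValues can be sketched in a few lines (a simplified illustration in plain Python, not Lucene's actual data structures): one flat array per field, indexed by docID, so sorting or scoring reads each document's value with O(1) random access instead of un-inverting the term index.

```python
# Column-stride ("DocValues"-style) storage: one flat array per field,
# indexed by docID. Sorting or scoring reads one value per doc with O(1)
# random access -- no need to un-invert the inverted index.
docs = [
    {"title": "a", "popularity": 7},
    {"title": "b", "popularity": 3},
    {"title": "c", "popularity": 9},
]

# Build the column: position i holds the value for docID i.
popularity_column = [d["popularity"] for d in docs]

# Rank docIDs by the column, as a search engine does when sorting hits.
ranked = sorted(range(len(docs)),
                key=lambda doc_id: popularity_column[doc_id],
                reverse=True)
print(ranked)  # [2, 0, 1]
```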
This talk is one that I gave to the HPTS workshop in Asilomar in 2009. It describes the ideas behind micro-sharding and outlines how Katta can manage micro-shards.
Some builds and spacing are off because this was exported as PowerPoint from Keynote.
Scaling Through Partitioning and Shard Splitting in Solr 4 - thelabdude
Over the past several months, Solr has reached a critical milestone of being able to elastically scale-out to handle indexes reaching into the hundreds of millions of documents. At Dachis Group, we've scaled our largest Solr 4 index to nearly 900M documents and growing. As our index grows, so does our need to manage this growth.
In practice, it's common for indexes to continue to grow as organizations acquire new data. Over time, even the best designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this Webinar, you'll learn about new features in Solr to help manage large-scale clusters. Specifically, we'll cover data partitioning and shard splitting.
Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We'll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into 2 smaller shards to increase parallelism during query execution.
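The custom-hashing idea above can be sketched as follows (a simplified stand-in, not Solr's implementation; Solr's composite-id router uses a similar "customerId!docId" convention, and the `shard_for` helper here is hypothetical):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard by hashing into a fixed range."""
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

# Route on the customer-ID portion of a composite ID so that all of a
# customer's documents land on the same shard, keeping per-customer
# queries local to one shard.
doc_ids = ["acme!doc1", "acme!doc2", "globex!doc1"]
shards = [shard_for(d.split("!")[0]) for d in doc_ids]
print(shards)  # the two "acme" docs always share a shard
```

Shard splitting then subdivides one such hash range in two, so each half can be served by a smaller shard.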
Attendees will come away from this presentation with a real-world use case that proves Solr 4 is elastically scalable, stable, and is production ready.
A talk given by Ted Dunning in the HPTS workshop in Asilomar in 2009. It describes the ideas behind micro-sharding and outlines how Katta can manage micro-shards.
Some builds and spacing are off because this was exported as PowerPoint from Keynote.
Solr Exchange: Introduction to SolrCloud - thelabdude
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will provide an architectural overview of SolrCloud and highlight its most important features. Specifically, Tim covers topics such as: sharding, replication, ZooKeeper fundamentals, leaders/replicas, and failure/recovery scenarios. Any discussion of a complex distributed system would not be complete without a discussion of the CAP theorem. Mr. Potter will describe why Solr is considered a CP system and how that impacts the design of a search application.
How SolrCloud Changes the User Experience In a Sharded Environment - lucenerevolution
Presented by Erick Erickson, Lucid Imagination - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
The next major release of Solr (4.0) will include "SolrCloud", which provides new distributed capabilities for both in-house and externally-hosted Solr installations. Among the new capabilities are: Automatic Distributed Indexing, High Availability and Failover, Near Real Time searching and Fault Tolerance. This talk will focus, at a high level, on how these new capabilities impact the design of Solr-based search applications primarily from infrastructure and operational perspectives.
Klepsydra is based on lock-free programming. This kind of programming is a high-level wrapper around the atomic operations of the processor, in particular the so-called compare-and-swap, or CAS, operation. It was invented back in the 70s, but it didn’t really become popular until the early 90s, when it was implemented in higher-level languages, and it really took off when Java included it in the early 2010s.
Lock-free programming consists of repeatedly attempting to write data to a small piece of memory until the data is in a consistent state. This is usually depicted as a plane trying to land at a busy airport: if the runway is busy, it flies away and tries again and again until it succeeds.
This technique is substantially lighter than the traditional mutex operation, because it is specific to one small piece of memory, as opposed to a mutex, which blocks a large portion of memory. That is why traditional lock systems are not deterministic, while lock-free systems are more granular and deterministic. It also works remarkably more efficiently.
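The CAS retry loop described above ("circle until the runway is free") can be sketched as follows. Python has no user-level CAS instruction, so this sketch simulates the atomic cell with a tiny internal lock; the point is the retry pattern around it, which is what real lock-free code looks like:

```python
import threading

class AtomicCell:
    """Simulated atomic cell. A tiny lock stands in for the hardware
    compare-and-swap instruction, which Python does not expose."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: write `new` only if the cell still holds `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False  # another thread got there first; caller retries

def lock_free_increment(cell, times):
    for _ in range(times):
        while True:                       # the plane circling the runway
            old = cell.load()
            if cell.compare_and_swap(old, old + 1):
                break                     # landed: write was consistent

cell = AtomicCell()
threads = [threading.Thread(target=lock_free_increment, args=(cell, 1000))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(cell.load())  # 4000: no increment was lost
```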
Our own ring buffer is wrapped in what we call the Klepsydra SDK. It is just another library that is installed in the operating system that receives data from multiple sources. It is essentially a memory sharing system.
https://klepsydra.com/klepsydra-ros-2-executor-a-ring-buffer-to-rule-them-all/
To date, Hadoop usage has focused primarily on offline analysis--making sense of web logs, parsing through loads of unstructured data in HDFS, etc. But what if you want to run map/reduce against your live data set without affecting online performance? Combining Hadoop with Cassandra's multi-datacenter replication capabilities makes this possible. If you're interested in getting value from your data without the hassle and latency of first moving it into Hadoop, this talk is for you. I'll show you how to connect all the parts, enabling you to write map/reduce jobs or run Pig queries against your live data. As a bonus I'll cover writing map/reduce in Scala, which is particularly well-suited for the task.
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra - DataStax Academy
This session covers our experience with using the Spark and Shark frameworks for running real-time queries on top of Cassandra data. We will start by surveying the current Cassandra analytics landscape, including Hadoop and HIVE, and touch on the use of custom input formats to extract data from Cassandra. We will then dive into Spark and Shark, two memory-based cluster computing frameworks, and how they enable often dramatic improvements in query speed and productivity, over the standard solutions today.
Abstract –
Spark 2 is here. While Spark has been the leading cluster computation framework for several years, its second version takes Spark to new heights. In this seminar, we will go over Spark internals and learn the new concepts of Spark 2 to create better scalable big data applications.
Target Audience
Architects, Java/Scala developers, Big Data engineers, team leaders
Prerequisites
Java/Scala knowledge and SQL knowledge
Contents:
- Spark internals
- Architecture
- RDD
- Shuffle explained
- Dataset API
- Spark SQL
- Spark Streaming
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The new frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their applications' supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
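The seed-trimming idea can be sketched as a greedy loop: drop a byte, and if the program's observed behaviour is unchanged, the byte was uninteresting. Note this is a crude illustration of the concept only, not DIAR's actual algorithm; `coverage_of` is a hypothetical oracle standing in for an instrumented target run.

```python
def trim_seed(seed: bytes, coverage_of) -> bytes:
    """Greedily drop bytes whose removal leaves coverage unchanged.
    `coverage_of` is a hypothetical oracle mapping an input to the set
    of behaviours it triggers in the target program."""
    baseline = coverage_of(seed)
    trimmed = bytearray(seed)
    i = 0
    while i < len(trimmed):
        candidate = trimmed[:i] + trimmed[i + 1:]
        if coverage_of(bytes(candidate)) == baseline:
            trimmed = candidate          # byte i was uninteresting
        else:
            i += 1                       # byte i matters; keep it
    return bytes(trimmed)

# Toy oracle: only bytes appearing in "<tag>" affect "coverage".
cov = lambda data: frozenset(b for b in data if b in b"<tag>")
print(trim_seed(b"xx<tag>yy", cov))  # b'<tag>'
```

Mutating the lean seed then spends the fuzzer's budget only on bytes that can actually change program behaviour.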
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Hippo meetup: enterprise search with Solr and elasticsearch
1. 15th January 2013 – Hippo meetup
Luca Cavanna
Software developer & Search consultant at Trifork Amsterdam
luca.cavanna@trifork.nl - @lucacavanna
2. Trifork (aka Jteam/Dutchworks/Orange11)
Focus areas:
– Big data & Search
– Mobile
– Custom solutions
– Knowledge (GOTO Amsterdam)
● Hippo partner
● Hippo related search projects:
– uva.nl
– working on rijksoverheid.nl
3. Agenda
● Search introduction
– Lucene foundation
– Why do we need Solr or elasticsearch?
● Scaling with Solr
● Elasticsearch distributed nature
● Elasticsearch features
4. Apache Lucene
● High-performance, full-featured text search engine library written entirely in Java
● It indexes documents as collections of fields
● A field is a string based key-value pair
● What data structure does it use under the hood?
5. Inverted index
Documents:
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term | freq | posting list
and | 1 | 6
big | 2 | 2 3
dark | 1 | 6
did | 1 | 4
gown | 1 | 2
had | 1 | 3
house | 2 | 2 3
in | 5 | 1 2 3 5 6
keep | 3 | 1 3 5
keeper | 3 | 1 4 5
keeps | 3 | 1 5 6
light | 1 | 6
never | 1 | 4
night | 3 | 1 4 5
old | 4 | 1 2 3 4
sleep | 1 | 4
sleeps | 1 | 6
the | 6 | 1 2 3 4 5 6
town | 2 | 1 3
where | 1 | 4
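The table above can be reproduced with a few lines of plain Java. This is only an illustrative sketch of the data structure, not Lucene's implementation (Lucene stores postings in compressed on-disk segments); the class and method names are made up for the example:

```java
import java.util.*;

public class InvertedIndexSketch {
    // Build term -> posting list (sorted doc ids); doc ids start at 1 as on the slide.
    static Map<String, SortedSet<Integer>> buildIndex(String[] docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 1; docId <= docs.length; docId++) {
            // Very naive text analysis: lowercase, then split on non-letters.
            for (String term : docs[docId - 1].toLowerCase().split("[^a-z]+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {
            "The old night keeper keeps the keep in the town",
            "In the big old house in the big old gown.",
            "The house in the town had the big old keep",
            "Where the old night keeper never did sleep.",
            "The night keeper keeps the keep in the night",
            "And keeps in the dark and sleeps in the light."
        };
        for (Map.Entry<String, SortedSet<Integer>> e : buildIndex(docs).entrySet()) {
            // term, document frequency, posting list
            System.out.println(e.getKey() + " " + e.getValue().size() + " " + e.getValue());
        }
    }
}
```

Note that the "freq" column on the slide is the document frequency: the number of documents containing the term, i.e. the size of its posting list.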
6. Inverted index
● Indexing
– Text analysis
● Tokenization, lowercasing and more
● The inverted index can contain more data
– Term offsets and more
● The inverted index itself doesn't contain the text for displaying the search results
7. Indexing
● Lucene writes indexes as segments
● Segments are not modifiable: Write-Once
● Each segment is a searchable mini index
● Each segment contains
– Inverted index
– Stored fields
– ...and more
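The write-once segment model can be illustrated with a toy sketch in plain Java (the names here are invented for the example; Lucene's real segments are on-disk files with far richer structure): each segment is an immutable mini inverted index, and a search consults every segment and merges the postings.

```java
import java.util.*;

public class SegmentSketch {
    // An immutable mini index: term -> posting list of doc ids.
    static final class Segment {
        private final Map<String, List<Integer>> postings;
        Segment(Map<String, List<Integer>> postings) {
            // Write-Once: the segment is never modified after creation.
            this.postings = Collections.unmodifiableMap(postings);
        }
        List<Integer> postings(String term) {
            return postings.getOrDefault(term, Collections.emptyList());
        }
    }

    // Searching consults each segment and merges the results.
    static List<Integer> search(List<Segment> segments, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Segment s : segments) hits.addAll(s.postings(term));
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        Segment s1 = new Segment(Map.of("keep", List.of(1, 3)));
        Segment s2 = new Segment(Map.of("keep", List.of(5), "town", List.of(4)));
        System.out.println(search(List.of(s1, s2), "keep")); // [1, 3, 5]
    }
}
```

Because segments are immutable, new documents always go into new segments, which is why a background merge process is needed to keep the segment count manageable.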
8. Indexing: the commit operation
● Documents are searchable only after a commit!
● Commit gives also durability
● The most expensive operation in Lucene!!!
9. Near-real-time search (since Lucene 2.9, exposed in Solr 4.0)
● With the Lucene near-real-time API you don't need a commit to make new documents searchable
● Less expensive than commit
● Doesn't guarantee durability though
● Exposed as soft commit in Solr 4.0
10. Lucene code example – indexing data
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40,
new StandardAnalyzer(Version.LUCENE_40));
Directory directory = FSDirectory.open(new File("data"));
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();
FieldType idFieldType = new FieldType();
idFieldType.setIndexed(true);
idFieldType.setStored(true);
idFieldType.setTokenized(false);
document.add(new Field("id","id-1", idFieldType));
FieldType titleFieldType = new FieldType();
titleFieldType.setIndexed(true);
titleFieldType.setStored(true);
document.add(new Field("title","This is the title", titleFieldType));
FieldType descriptionFieldType = new FieldType();
descriptionFieldType.setIndexed(true);
document.add(new Field("description","This is the description", descriptionFieldType));
writer.addDocument(document);
writer.close();
11. Lucene code example – querying and showing results
QueryParser queryParser = new QueryParser(Version.LUCENE_40, "title",
new StandardAnalyzer(Version.LUCENE_40));
Query query = queryParser.parse(queryAsString);
Directory directory = FSDirectory.open(new File("data"));
IndexReader indexReader = DirectoryReader.open(directory);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
TopDocs topDocs = indexSearcher.search(query, 10);
System.out.println("Total hits: " + topDocs.totalHits);
for (ScoreDoc hit : topDocs.scoreDocs) {
Document document = indexSearcher.doc(hit.doc);
for (IndexableField field : document) {
System.out.println(field.name() + ": " + field.stringValue());
}
}
12. What's missing?
● A common way to represent documents
● Interface to send document to (HTTP)
● A way to represent queries
● Interface to send queries to (HTTP)
● Configuration
● Caching
● Distributed infrastructure
● And more....
14. Scaling – why?
‣ The more concurrent searches you run, the slower they get
‣ Indexing and searching on the same machine will substantially harm search performance
‣ Segment merging may be a CPU/IO-intensive operation
‣ Disk cache invalidation
‣ Fail over
16. Solr replication (pull approach)
• Master-slave based solution
• Single machine for indexing data (master)
• Multiple machines for querying (slaves)
• Master is not aware of the slaves
• Slave is aware of the master
• Load balancer responsible for balancing the query requests
• What about real-time search? No way!
17. SolrCloud
• A set of new distributed capabilities in Solr
• Uses Apache ZooKeeper as a system of record for the cluster state, for central configuration, and for leader election
• Whatever server (shard) you send data to, the documents get distributed over the shards
• A shard can be a leader or a replica and contains a subset of the data
• Easily scale up by adding new Solr nodes
18. elasticsearch
● Distributed search engine built on top of Lucene
● Apache 2 license
● Written in Java
● RESTful
● Created and mainly developed by Shay Banon
● A company behind it: elasticsearch.com
● Regular releases
– Latest release 0.20.2
19. elasticsearch
● Schemaless
– Uses defaults and automatic type guessing
– Custom mappings may be defined if needed
● JSON oriented
● Multi tenancy
– Multiple indexes per node, multiple types per index
● Designed to be distributed from the beginning
● Almost everything is available as an API (including configuration)
● Wide range of administration APIs
20. elasticsearch distributed terminology
● Node: a running instance of elasticsearch which belongs to a cluster (usually one node per server)
● Cluster: one or more nodes with the same cluster name
● Shard: a single Lucene instance. A low-level worker unit managed by elasticsearch. An index is split into one or more shards.
● Index: a logical namespace which points to one or more shards
– Your code won't deal directly with a shard, only with an index
– But an index is composed of multiple Lucene indexes (one per shard)
21. elasticsearch distributed terminology
● More shards:
– improve indexing performance
– increase data distribution (depends on # of nodes)
– Watch out: each shard has a cost as well!
● More replicas:
– increase failover
– improve querying performance
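Splitting an index into shards implies a routing rule: each document must be assigned to exactly one shard, consistently. elasticsearch uses a hash-modulo scheme on the document id (or a custom routing value) for this; the sketch below illustrates the idea in plain Java, with an arbitrary hash function rather than elasticsearch's actual one:

```java
public class ShardRouting {
    // Route a document id to one of numShards shards via hash modulo.
    // This is also why a shard count cannot be changed after index creation:
    // a different modulus would re-route existing documents to other shards.
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 5;
        for (String id : new String[]{"id-1", "id-2", "id-3"}) {
            System.out.println(id + " -> shard " + shardFor(id, numShards));
        }
    }
}
```

The same rule is applied at query time for a get-by-id, so a single-document lookup only needs to hit one shard.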
22. Transaction Log
• Indexed docs are fully persistent
• No need for a Lucene IndexWriter#commit
• Managed using a transaction log / WAL
• Full single node durability (kill dash 9)
• Utilized when doing hot relocation of shards
• Periodically “flushed” (calling IW#commit)
• Durability and real time search together!
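The idea behind the transaction log can be sketched in a few lines of plain Java. This illustrates write-ahead logging in general, not elasticsearch's translog format: every operation is appended to a log file before being applied in memory, so after a crash the state can be rebuilt by replaying the log.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class TranslogSketch {
    // Append the operation to the log BEFORE applying it (write-ahead).
    static void index(Path log, Map<String, String> docs, String id, String doc)
            throws IOException {
        Files.write(log, List.of(id + "\t" + doc),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        docs.put(id, doc);
    }

    // After a crash ("kill dash 9"), rebuild the in-memory state from the log.
    static Map<String, String> replay(Path log) throws IOException {
        Map<String, String> docs = new HashMap<>();
        for (String line : Files.readAllLines(log)) {
            String[] parts = line.split("\t", 2);
            docs.put(parts[0], parts[1]);
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("translog", ".log");
        Map<String, String> docs = new HashMap<>();
        index(log, docs, "1", "{\"name\":\"Luca\"}");
        index(log, docs, "2", "{\"name\":\"Hippo\"}");
        // Simulate losing the in-memory state, then recover from the log.
        Map<String, String> recovered = replay(log);
        System.out.println(recovered.equals(docs)); // true
    }
}
```

A periodic "flush" (the expensive Lucene commit) then makes the indexed data durable in the Lucene index itself, after which the replayed portion of the log can be truncated.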
38. Indexing (Push) - ElasticSearch
• Documents added through push requests
• Full JSON Object representation of Documents supported
• Embedded objects
• 1st class Parent / Child and Versioning
• Near Realtime index refreshing available
• Realtime get supported

{
  "name": "Luca Cavanna",
  "location": {
    "city": "Amsterdam",
    "country": "The Netherlands"
  }
}
39. Indexing (Pull) - ElasticSearch
• Data flows from sources using ‘Rivers’
• Continues to add data as it ‘flows’
• Can be added, removed, configured dynamically
• Out-of-the-box support for CouchDB, Twitter (implemented by the es team)
• Community implementations for DBs, other NoSQL and Solr
40. Searching - ElasticSearch
• Search request in Request Body
• Powerful and extensible Query DSL
• Separation of Query and Filters
• Named Filters allowing tracking of which Documents matched which Filters
• By default storing the source of each document (_source field)
• Catch all feature enabled by default (_all field)
• Sorting of results
• Highlighting, Faceting, Boosting...and more
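The query/filter separation matters because filters don't contribute to scoring and can therefore be cached as bitsets and reused across queries. The sketch below shows the underlying idea in plain Java (an illustration only, not elasticsearch's implementation):

```java
import java.util.*;

public class FilterSketch {
    // A scored query result: docId -> relevance score.
    // A filter is a reusable BitSet of matching doc ids: no scoring involved,
    // so it can be computed once and cached.
    static Map<Integer, Float> applyFilter(Map<Integer, Float> queryHits, BitSet filter) {
        Map<Integer, Float> result = new TreeMap<>();
        for (Map.Entry<Integer, Float> hit : queryHits.entrySet()) {
            if (filter.get(hit.getKey())) {   // cheap bit test; scores stay untouched
                result.put(hit.getKey(), hit.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Float> hits = new TreeMap<>(Map.of(1, 2.3f, 4, 1.1f, 6, 0.7f));
        BitSet countryFilter = new BitSet();  // e.g. country == "NL", cached once
        countryFilter.set(1);
        countryFilter.set(6);
        System.out.println(applyFilter(hits, countryFilter).keySet()); // [1, 6]
    }
}
```

Because the filter never touches the scores, applying it is just a bit test per candidate document, which is why filtered queries are typically much cheaper than adding the same condition to the scored query.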
42. Thanks
There would be a lot more to say:
• Query DSL
• Scripting module (pluggable implementation)
• Percolator
• Running it embedded
Check them out yourself if you are interested!
Questions?