Solr is a great tool to have in the data scientist toolbox. In this talk, I walk through several demos of using Solr to data science activities as well as explore various use cases for Solr and data science
In just a few short years, search has quickly evolved from being a small text box in the nether regions of a website to being front and center in our lives. Increasingly, however, search engine technology is also being used for practical, real time recommendations, events processing, complex spatial functionality and time series analysis capable of not only matching user's queries in text, but also driving real time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll will take a look at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.
A 1 hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (See Github prj: http://bit.ly/lws-financial) as well as some basic vocab and search explanations
Join Apache Solr committer and Lucidworks engineer Tim Potter for a webinar to learn how to unlock and understand your big data - and get the most out of your Hadoop investment.
http://sigir2013.ie/industry_track.html#GrantIngersoll
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.
In just a few short years, search has quickly evolved from being a small text box in the nether regions of a website to being front and center in our lives. Increasingly, however, search engine technology is also being used for practical, real time recommendations, events processing, complex spatial functionality and time series analysis capable of not only matching user's queries in text, but also driving real time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll will take a look at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.
A 1 hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (See Github prj: http://bit.ly/lws-financial) as well as some basic vocab and search explanations
Join Apache Solr committer and Lucidworks engineer Tim Potter for a webinar to learn how to unlock and understand your big data - and get the most out of your Hadoop investment.
http://sigir2013.ie/industry_track.html#GrantIngersoll
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.
LucidWorks SiLK is an open source stack that combines Lucene/Solr with best in class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar will explore the features of SiLK, and provide attendees with valuable information on how they can benefit from the following:
- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable
Valentyn Kropov, Big Data Solutions Architect has recently attended "Hadoop World / Strata" – biggest and coolest Big Data conference in a World, and he can't wait to share fresh trends and topics straight from New-York. Come and learn how Hadoop cluster will help NASA to explore Mars, how Netflix build 10PB platform, what are the latest trends in Spark, to learn about newest, just announced storage engine from Cloudera called Kudu and many many more interesting stuff.
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr sponsored by O'Reilly Media!
Caserta Concepts shared one of their innovative DW projects using Solr. See how open source search technology can serve high performance analytic use cases. Presentation and solution walk-through given by Caserta Concepts' Joe Caserta and Elliott Cordo.
For more information, visit www.casertaconcepts.com
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is the story of the Global Patent Search Network, the next generation multilingual search platform for the USPTO. GPSN, http://gpsn.uspto.gov, was the first public application deployed in the cloud, and allowed a very small development team to build a discovery interface across millions of patents.
This case study will cover:
• How we leveraged Amazon Web Services platform for data ingestion, auto scaling, and deployment at a very low price compared to traditional data centers.
• We will cover some of the innovative methods for converting XML formatted data to usable information.
• Parsing through 5 TB of raw TIFF image data and converting them to modern web friendly format.
• Challenges in building a modern Single Page Application that provides a dynamic, rich user experience.
• How we built “data sharing” features into the application to allow third party systems to build additional functionality on top of GPSN.
With Search, developers and data engineers can run more relevant and responsive queries on the data in Hadoop and integrate with external tools to build custom real-time applications.
Chris Bradford & Matt Overstreet review several Cassandra use cases we’ve encountered in state and federal government. C* solves many big data problems when storing, enriching and improving access to data.
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Databricks
The majority of a data scientist’s time is spent cleaning and organizing data before insights can be derived. Frequently, with large datasets, a lack of integration with visualization tools makes it hard to know what’s most interesting in the data and also creates challenges for validating numerical insights from models. Given the vast number of tools available in the ecosystem, it is hard to experiment with different tools to pick the most suitable one, especially given the complexity involved in integrating them with one’s solution.
The speakers will present an easy to use workflow that solves this integration challenge by combining various open source libraries, databases (e.g. Hive, Postgres, MySQL, HBase etc.) and visualization with distributed analytics. Intel developed a highly scalable library built over Apache Spark with novel graph, statistical and machine learning algorithms that also enhances the user experience of Apache Spark via easier to use APIs.
This session will showcase how to address the above mentioned issues for a drug similarity use case. We’ll go from ETL operations on raw drug data to deriving relevant features from the drug’s chemical structure using statistical and graph algorithms, using techniques to identify best model and parameters for this data to derive insights, and then demonstrating the ease of connectivity to different databases and visualization tools.
LucidWorks SiLK is an open source stack that combines Lucene/Solr with best in class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar will explore the features of SiLK, and provide attendees with valuable information on how they can benefit from the following:
- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable
Valentyn Kropov, Big Data Solutions Architect has recently attended "Hadoop World / Strata" – biggest and coolest Big Data conference in a World, and he can't wait to share fresh trends and topics straight from New-York. Come and learn how Hadoop cluster will help NASA to explore Mars, how Netflix build 10PB platform, what are the latest trends in Spark, to learn about newest, just announced storage engine from Cloudera called Kudu and many many more interesting stuff.
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr sponsored by O'Reilly Media!
Caserta Concepts shared one of their innovative DW projects using Solr. See how open source search technology can serve high performance analytic use cases. Presentation and solution walk-through given by Caserta Concepts' Joe Caserta and Elliott Cordo.
For more information, visit www.casertaconcepts.com
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is the story of the Global Patent Search Network, the next generation multilingual search platform for the USPTO. GPSN, http://gpsn.uspto.gov, was the first public application deployed in the cloud, and allowed a very small development team to build a discovery interface across millions of patents.
This case study will cover:
• How we leveraged Amazon Web Services platform for data ingestion, auto scaling, and deployment at a very low price compared to traditional data centers.
• We will cover some of the innovative methods for converting XML formatted data to usable information.
• Parsing through 5 TB of raw TIFF image data and converting them to modern web friendly format.
• Challenges in building a modern Single Page Application that provides a dynamic, rich user experience.
• How we built “data sharing” features into the application to allow third party systems to build additional functionality on top of GPSN.
With Search, developers and data engineers can run more relevant and responsive queries on the data in Hadoop and integrate with external tools to build custom real-time applications.
Chris Bradford & Matt Overstreet review several Cassandra use cases we’ve encountered in state and federal government. C* solves many big data problems when storing, enriching and improving access to data.
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Databricks
The majority of a data scientist’s time is spent cleaning and organizing data before insights can be derived. Frequently, with large datasets, a lack of integration with visualization tools makes it hard to know what’s most interesting in the data and also creates challenges for validating numerical insights from models. Given the vast number of tools available in the ecosystem, it is hard to experiment with different tools to pick the most suitable one, especially given the complexity involved in integrating them with one’s solution.
The speakers will present an easy to use workflow that solves this integration challenge by combining various open source libraries, databases (e.g. Hive, Postgres, MySQL, HBase etc.) and visualization with distributed analytics. Intel developed a highly scalable library built over Apache Spark with novel graph, statistical and machine learning algorithms that also enhances the user experience of Apache Spark via easier to use APIs.
This session will showcase how to address the above mentioned issues for a drug similarity use case. We’ll go from ETL operations on raw drug data to deriving relevant features from the drug’s chemical structure using statistical and graph algorithms, using techniques to identify best model and parameters for this data to derive insights, and then demonstrating the ease of connectivity to different databases and visualization tools.
Presentation from March 18th, 2013 Triangle Java User Group on Taming Text. Presentation covers search, question answering, clustering, classification, named entity recognition, etc. See http://www.manning.com/ingersoll for more.
Fusion 3.1 comes with exciting new features that will make your search more personal and better targeted. Join us for a webinar to learn more about Fusion's features, what's new in this release, and what's around the corner for Fusion.
Analytics in Search
Many companies including Lucidworks have embraced the Kibana open source code to add visualization and analytics to enhance search management. Ravi Krishnamurthy , VP of Professional Services at Lucidworks, will show Silk, Lucid's implementation of Kibana, which provides all the capabilities of the open source code but adds enterprise-critical capabilities like authentication and security to protect restricted content.
Search engines, and Apache Solr in particular, are quickly shifting the focus away from “big data” systems storing massive amounts of raw (but largely unharnessed) content, to “smart data” systems where the most relevant and actionable content is quickly surfaced instead. Apache Solr is the blazing-fast and fault-tolerant distributed search engine leveraged by 90% of Fortune 500 companies. As a community-driven open source project, Solr brings in diverse contributions from many of the top companies in the world, particularly those for whom returning the most relevant results is mission critical.
Out of the box, Solr includes advanced capabilities like learning to rank (machine-learned ranking), graph queries and distributed graph traversals, job scheduling for processing batch and streaming data workloads, the ability to build and deploy machine learning models, and a wide variety of query parsers and functions allowing you to very easily build highly relevant and domain-specific semantic search, recommendations, or personalized search experiences. These days, Solr even enables you to run SQL queries directly against it, mixing and matching the full power of Solr’s free-text, geospatial, and other search capabilities with the a prominent query language already known by most developers (and which many external systems can use to query Solr directly).
Due to the community-oriented nature of Solr, the ecosystem of capabilities also spans well beyond just the core project. In this talk, we’ll also cover several other projects within the larger Apache Lucene/Solr ecosystem that further enhance Solr’s smart data capabilities: bi-directional integration of Apache Spark and Solr’s capabilities, large-scale entity extraction, semantic knowledge graphs for discovering, traversing, and scoring meaningful relationships within your data, auto-generation of domain-specific ontologies, running SPARQL queries against Solr on RDF triples, probabilistic identification of key phrases within a query or document, conceptual search leveraging Word2Vec, and even Lucidworks’ own Fusion project which extends Solr to provide an enterprise-ready smart data platform out of the box.
We’ll dive into how all of these capabilities can fit within your data science toolbox, and you’ll come away with a really good feel for how to build highly relevant “smart data” applications leveraging these key technologies.
Presented on Tuesday, August 7, at the 2018 LRCN (Librarians' Registration Council of Nigeria) National Workshop on Electronic Resource Management Systems in Libraries, held at the University of Nigeria, Nsukka, Enugu State, Nigeria
This presentation was given in one of the DSATL Mettups in March 2018 in partnership with Southern Data Science Conference 2018 (www.southerndatascience.com)
War stories from building the Global Patent Search Network, and why Data folks need to think more about UX and Discovery, and UX folks need to think more about Data.
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
The University Seminar series aim to provide a basic understanding of Open Source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
Webinar: Site Search in an Hour with FusionLucidworks
Using Lucidworks View and Fusion 3, you can easily build and deploy site search in less than one hour. Even with multiple data sources, data transformations, and user interface development, a full enterprise search project can be completed in just an hour compared to the usual 6 months.
* Open source search with Solr/Lucene gives you the power to turn a wide range of information into fast, useful, relevant results!
* LucidWorks for Solr gives you a tested, release-stable certified distribution of open source search with enhanced tools and installation for building search apps quickly and reliably.
http://www.lucidimagination.com/How-We-Can-Help/webinar-from-search-to-found
Building Search & Recommendation EnginesTrey Grainger
In this talk, you'll learn how to build your own search and recommendation engine based on the open source Apache Lucene/Solr project. We'll dive into some of the data science behind how search engines work, covering multi-lingual text analysis, natural language processing, relevancy ranking algorithms, knowledge graphs, reflected intelligence, collaborative filtering, and other machine learning techniques used to drive relevant results for free-text queries. We'll also demonstrate how to build a recommendation engine leveraging the same platform and techniques that power search for most of the world's top companies. You'll walk away from this presentation with the toolbox you need to go and implement your very own search-based product using your own data.
Apache Solr serves search requests at the enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes indexing and searching integration into your applications straightforward.
Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Intro talk for UNC School of Information and Library Science. Covers basics of Lucene and Solr as well as info on Lucene/Solr jobs, opportunities, etc.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
4. Solr in a nutshell
8M+ total
downloads
Solr is both established & growing
250,000+
monthly downloads
Largest community of developers.
2500+open Solr jobs.
Solr most widely used search
solution on the planet.
Lucidworks
Unmatched Solr expertise.
1/3
of the active
committers
70%
of the open source
code is committed
Lucene/Solr Revolution
world’s largest open source user
conference dedicated to Lucene/Solr.
Solr has tens of thousands
of applications in production.
You use
Solr everyday.
5. Solr’s Key Features
• Full text search (Info Retr.)
• Facets/Guided Nav galore!
• Lots of data types
• Spelling, auto-complete,
highlighting
• Cursors
• More Like This
• De-duplication
• Apache Lucene
• Grouping and Joins
• Stats, expressions,
transformations and more
• Lang. Detection
• Extensible
• Massive Scale/Fault tolerance
6.
7. It is increasingly important to know
what is important!
Corollary: The faster you know what is important, the better
11. • Feature Selection
• Analyzers for all types
• Easily get weights for terms
• Term Vectors
• Data Reduction
• Filters
• Analyzers
• Data quality tools
Feature Selection and Data Reduction
12. • Quick and dirty:
• kNN, others
• Carrot^2 integration for search result
clustering
• Integration with Mahout
• Lucene provides Bayesian classifiers
built on index
• Easily build training and test sets via
filter queries
Classification and Clustering
13. • Built in expressions, stats, function
queries make custom ranking a snap!
• Search is essentially vector * matrix
• Lucene index is a ranking optimized
matrix
• More coming!
Math
14. Clicks, tweets, ratings, locations and much more can all
be leveraged to provide high quality recommendations
to users and deeper insight for data scientists
!
Signals power relevance
Query Modification
Increase the findability of
documents and records with
automatic creation of tags, fields
and meta-data
Curate the user experience in
your application with artificial
result ranking, document
injections and obfuscation
Result ManipulationIndex Time Enrichment
Perform real time decision
making and routing in order to
map a users intention or
enterprise policy
16. • Data ingest:
• JSON, CSV, XML, Rich types (PDF, etc.), custom
• Clients for Python, R, Java, .NET and more
• http://cran.r-project.org/web/packages/solr/index.html, amongst
others
• Output formats: JSON, CSV, XML, custom
Solr and Your Tools
17. • Vector Space or Probabilistic, it’s your choice!
• Killer FST
• Wicked fast
• Pluggable compression, queries, indexing and
more
• Advanced Similarity Models
• Lang. Modeling, Divergence from Random,
more
• Easy to plug-in ranking
for Data Science