The document discusses various approaches for performing joins in Lucene/Solr, including query-time joins using JoinUtil and index-time joins using block joins. It proposes a new join index approach that stores join mappings in docvalues to enable faster querying compared to JoinUtil while also allowing incremental updates unlike block joins. The approach aims to address issues like slow querying in JoinUtil due to term enumeration and inability to reorder docs in block joins.
Presented by Martijn van Groningen, SearchWorkings - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
In the real world data isn’t flat. Data is often modelled into complex models. Lucene is document oriented and doesn’t support relations natively. The only way you could index this data is by de-normalizing the relations in a document with many fields and execute subsequent queries. Subsequent queries can be expensive and data gets duplicated. This isn’t always ideal. Recently Solr and Lucene provide features that allow you to join and group. You can join and group on fields across documents and still have the power of Lucene’s awesome free text search. In this presentation, we’ll look at these new alternatives, the advantages and disadvantages and how these features can be utilized. how these new capabilities impact the design of Solr-based search applications primarily from infrastructure and operational perspectives.
Approaching Join Index - Lucene/Solr Revolution 2014Grid Dynamics
This presentation was provided at the Lucene/Solr Revolution conference in Washington, DC in 2014 by Mikhail Khludnev and discusses how to work with Joins and opportunities for improving query-time joins.
Solr search engine with multiple table relationJay Bharat
Here you can learn how to use solr search engine and implement in your application like in PHP/MYSQL.
I am introducing how to handle multiple table data handling in SOLR.
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
Presented by Martijn van Groningen, SearchWorkings - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
In the real world data isn’t flat. Data is often modelled into complex models. Lucene is document oriented and doesn’t support relations natively. The only way you could index this data is by de-normalizing the relations in a document with many fields and execute subsequent queries. Subsequent queries can be expensive and data gets duplicated. This isn’t always ideal. Recently Solr and Lucene provide features that allow you to join and group. You can join and group on fields across documents and still have the power of Lucene’s awesome free text search. In this presentation, we’ll look at these new alternatives, the advantages and disadvantages and how these features can be utilized. how these new capabilities impact the design of Solr-based search applications primarily from infrastructure and operational perspectives.
Approaching Join Index - Lucene/Solr Revolution 2014Grid Dynamics
This presentation was provided at the Lucene/Solr Revolution conference in Washington, DC in 2014 by Mikhail Khludnev and discusses how to work with Joins and opportunities for improving query-time joins.
Solr search engine with multiple table relationJay Bharat
Here you can learn how to use solr search engine and implement in your application like in PHP/MYSQL.
I am introducing how to handle multiple table data handling in SOLR.
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
OrientDB: Unlock the Value of Document Data RelationshipsFabrizio Fortino
a) A general introduction of graph databases and OrientDB,
b) Why connected data has more value than just data,
c)How to "have fun" with OrientDB combining documents with graphs via SQL,
d) A use case on how OrientDB has helped to raise standards in Irish Public Office.
On OrientDB: NOSQL document databases provide an elegant way to deal with data in different shapes enabling developers to create better and faster products quickly. The main goal of these systems is to find the most efficient solution to manage data itself. With the Big Data Explosion we need to deal with a myriad of highly interconnected information. The challenge now is not only on how to store data but on how to manage, analyse, traverse and use your data within the context of relationships. Graph databases shine at maintaining highly connected data and is the fastest growing category in database management systems: 2014 registered an increase of 250% in terms of adoption and Forrester Research predicts that more than a quarter of enterprises will be using graphs by 2017. OrientDB combines more than one NOSQL model offering the unique flexibility of modelling data in the form of either documents, or graphs, while incorporating object oriented programming as a way of encapsulating relationships.
In this presentation, we describe the underlying principles of the Semantic Web along with the core concepts and technologies, how they fit in with the Grails Framework and any existing tools, API\'s and Implementations.
JEEConf 2019 | Let’s build a Java backend designed for a high loadAlex Moskvin
We’ll consider a process of building specific Java backend application that must survive a high load.
During the talk we’ll analyse requirements, pick up a technical stack and then while discussing specific functionality go through typical major painful points that specifically Java developers are not taking into account that eventually result in inability to run such application at large scale, breach response times, crash via OOM and some others. Following the discussion about potential architectural flaws, we’ll also discuss recommended approaches, techniques and solutions for tackling those.
Iterative Methodology for Personalization Models OptimizationSonya Liberman
Describing the iterative process of optimizing personalized content recommendation models, starting from user preferences optimization to a spark-based large-scale modular machine learning framework.
OrientDB: Unlock the Value of Document Data RelationshipsFabrizio Fortino
a) A general introduction of graph databases and OrientDB,
b) Why connected data has more value than just data,
c)How to "have fun" with OrientDB combining documents with graphs via SQL,
d) A use case on how OrientDB has helped to raise standards in Irish Public Office.
On OrientDB: NOSQL document databases provide an elegant way to deal with data in different shapes enabling developers to create better and faster products quickly. The main goal of these systems is to find the most efficient solution to manage data itself. With the Big Data Explosion we need to deal with a myriad of highly interconnected information. The challenge now is not only on how to store data but on how to manage, analyse, traverse and use your data within the context of relationships. Graph databases shine at maintaining highly connected data and is the fastest growing category in database management systems: 2014 registered an increase of 250% in terms of adoption and Forrester Research predicts that more than a quarter of enterprises will be using graphs by 2017. OrientDB combines more than one NOSQL model offering the unique flexibility of modelling data in the form of either documents, or graphs, while incorporating object oriented programming as a way of encapsulating relationships.
In this presentation, we describe the underlying principles of the Semantic Web along with the core concepts and technologies, how they fit in with the Grails Framework and any existing tools, API\'s and Implementations.
JEEConf 2019 | Let’s build a Java backend designed for a high loadAlex Moskvin
We’ll consider a process of building specific Java backend application that must survive a high load.
During the talk we’ll analyse requirements, pick up a technical stack and then while discussing specific functionality go through typical major painful points that specifically Java developers are not taking into account that eventually result in inability to run such application at large scale, breach response times, crash via OOM and some others. Following the discussion about potential architectural flaws, we’ll also discuss recommended approaches, techniques and solutions for tackling those.
Iterative Methodology for Personalization Models OptimizationSonya Liberman
Describing the iterative process of optimizing personalized content recommendation models, starting from user preferences optimization to a spark-based large-scale modular machine learning framework.
This talk moves beyond the standard introduction into Elasticsearch and focuses on how Elasticsearch tries to fulfill its near-realtime contract. Specifically, I’ll show how Elasticsearch manages to be incredibly fast while handling huge amounts of data. After a quick introduction, we will walk through several search features and how the user can get the most out of the Elasticsearch. This talk will go under the hood exploring features like search, aggregations, highlighting, (non-)use of probabilistic data structures and more.
Building Bridges with Taxonomy: Enabling Semantic IntegrationDesign for Context
Taxonomies should be designed with enough flexibility and transition points to be a bridge to other taxonomies and datasets. Enabling your taxonomy to fit into the larger universe of partner companies, industry standards, federal requirements and complementary term sets gives it a solid foundation for future growth. We explore which vocabulary sets are available for reuse by the enterprise information architect and demonstrate how thinking about semantic integration from the beginning of the design process helps build a taxonomy that endures.
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Mikael Svenson
The query framework of SharePoint 2013 is a vast one, and it takes time to learn and master. In this session, you will get an overview of the latent capabilities with query rules and learn how you can maximize the use of query rules when building search driven pages using the Content by Search web part. The session is built around my blog series “SharePoint Search Queries Explained”
http://techmikael.blogspot.com/2014/03/sharepoint-search-queries-explained.html
Just the Job: Employing Solr for Recruitment Search -Charlie Hull lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Using a case study on a major European executive recruitment company, we will show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure their IT provision for both local and national offices.
Druid is a high performance, column-oriented distributed data store that is widely used at Oath for big data analysis. Druid has a JSON schema as its query language, making it difficult for new users unfamiliar with the schema to start querying Druid quickly. The JSON schema is designed to work with the data ingestion methods of Druid, so it can provide high performance features such as data aggregations in JSON, but many are unable to utilize such features, because they not familiar with the specifics of how to optimize Druid queries. However, most new Druid users at Yahoo are already very familiar with SQL, and the queries they want to write for Druid can be converted to concise SQL.
We found that our data analysts wanted an easy way to issue ad-hoc Druid queries and view the results in a BI tool in a way that's presentable to nontechnical stakeholders. In order to achieve this, we had to bridge the gap between Druid, SQL, and our BI tools such as Apache Superset. In this talk, we will explore different ways to query a Druid datasource in SQL and discuss which methods were most appropriate for our use cases. We will also discuss our open source contributions so others can utilize our work. GURUGANESH KOTTA, Software Dev Eng, Oath and JUNXIAN WU, Software Engineer, Oath Inc.
20240518 - VixulCon 2024 - The Rise of PostgreSQL_ Historic Trends and Modern...Umair Shahid
Discover the journey of PostgreSQL from its origins at UC Berkeley to its status as the world’s most favored open-source database. This presentation will explore significant historical shifts towards open-source software in enterprise settings and PostgreSQL’s continuous evolution to meet changing technological standards. We will highlight its extensive feature set, accommodating a variety of use cases such as GIS, full text search, JSON, graph, time series, business intelligence, and vector storage. Additionally, we will examine why PostgreSQL’s versatility is highly valued by developers, offering ease of development along with improved performance and scalability.
A presentation on my early work on the Mastro system. Some of this research is now part of the ontop system, some evolved into more optimised forms (also in ontop).
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudAlluxio, Inc.
Alluxio Tech Talk
Mar 12, 2019
Speaker:
Bin Fan, Alluxio
Matt Fuller, Starburst
As data analytic needs have increased with the explosion of data, the importance of the speed of analytics and the interactivity of queries has increased dramatically
In this tech talk, we will introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency analytics platform. This stack provides a strong solution to run fast SQL across multiple storage systems including HDFS, S3, and others in public cloud, hybrid cloud, and multi-cloud environments.
You’ll learn about:
- The architecture of Presto, an open source distributed SQL engine, as well as innovations by Starburst like as it’s cost-based optimizer
- How Presto can query data from cloud object storage like S3 at high performance and cost-effectively with Alluxio
- How to achieve data locality and cross-job caching with Alluxio no matter where the data is persisted and reduce egress costs
In addition, we’ll present some real world architectures & use cases from internet companies like JD.com and NetEase.com running the Presto and Alluxio stack at the scale of hundreds of nodes.
qCube: Efficient integration of range query operators over a high dimension d...Rodrigo Rocha Silva
Present a new cube approach, designed for high dimension range queries. Our cube approach, named Query Cube (qCube), implements Equal, Not Equal, Greater or Less than, Some, Between and Similar range query operators and Distinct, Sub-cube and Top-k Similar inquire query operators
Google BigQuery for Everyday DeveloperMárton Kodok
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
A. Every scientist who needs big data analytics to save millions of lives should have that power
Legacy systems don’t provide the power.
B. The simple fact is that you are brilliant but your brilliant ideas require complex analytics.
Traditional solutions are not applicable.
The Plan: have oversight over developments as they happen.
Goal: Store everything accessible by SQL immediately.
What is BigQuery?
Analytics-as-a-Service - Data Warehouse in the Cloud
Fully-Managed by Google (US or EU zone)
Scales into Petabytes
Ridiculously fast
Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
100.000 rows / sec Streaming API
Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
Familiar DB Structure (table, views, record, nested, JSON)
Convenience of SQL + Javascript UDF (User Defined Functions)
Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
Client libraries available in YFL (your favorite languages)
Our benefits
no provisioning/deploy
no running out of resources
no more focus on large scale execution plan
no need to re-implement tricky concepts
(time windows / join streams)
pay only the columns we have in your queries
run raw ad-hoc queries (either by analysts/sales or Devs)
no more throwing away-, expiring-, aggregating old data.
Similar to Mikhail khludnev: approaching-join index for lucene (20)
Industry-leading brands have adapted to the ever-changing business environment and consumer expectations significantly faster than the competition. The key driving factor behind this speed-to-market is the adoption of intelligent cloud-native digital commerce architectures. In our talk, we will describe the best practices and case studies of achieving business agility that is imperative to excel in the post-pandemic era.
"Implementing data quality automation with open source stack" - Max Martynov,...Grid Dynamics
The quality of business decisions, machine learning insights, and executive reports depend on the quality and integrity of the underlying data. There are many ways that data can get corrupted in an analytical data platform from de-synchronization with the system-of-record to defects in data pipelines. We will show how to detect and prevent data corruption with automation, open-source tools, and machine learning.
"How to build cool & useful voice commerce applications (such as devices like...Grid Dynamics
In this talk, we describe how to build conversational e-commerce applications for the growing market of voice-powered AI devices using Dialog Flow. This talk demonstrates the capabilities of "Flower Genie," a teaching-oriented chatbot that can recommend a bouquet for any occasion, then take an order and deliver the flower arrangement via Alexa. We present the overall architecture of a voice application developed with Dialog Flow, including dialog management and NLU, and then discuss the finer points of testing and publishing a voice application for Alexa.
"Challenges for AI in Healthcare" - Peter Graven Ph.DGrid Dynamics
Dynamic Talks Portland: The use of AI in many industries has revolutionized operations and efficiency. In healthcare, the progress is just beginning. Despite the promise of AI, why has the development lagged other industries? What issues are unique to healthcare that create challenges for common approaches? How can data scientists overcome these challenges and deliver on the promise of using data to reach multiple goals of improved quality, decreased cost, and greater patient satisfaction?
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...Grid Dynamics
The global HIV pandemic continues, particularly in Sub-Saharan Africa. By 2025, 40 million people will be living with HIV. The global cost of the pandemic is in the hundreds of billions of dollars. Better treatment means more people are living longer and costs will increase. The use of existing and emerging technologies is rare. Research institutions don’t share data. Data that drive US HIV policy in 2019 are from 2017 because of the time it takes for the CDC and NIH to combine data. The are many opportunities for big data, ML and AI to have a broader and continued impact on the HIV crisis. The use of these technologies can identify new avenues of research and help prioritize and focus efforts. We are starting to see these technologies used more and more. Several case studies will be presented. For example, advances in HIV vaccine research by Dr. David Heckerman; research at UCLA and Georgetown University looking at how social media can be used for tracking and predicting the spread of the epidemic; and work by researchers to improve care utilization in South Carolina. The opportunities for commercial and non-profit ventures to apply existing and emerging technologies like big data, ML and AI are countless. Tech4HIV is an organization working to drive these efforts into the tech sector and provides opportunities and resources for engagement.
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...Grid Dynamics
Retail industry has gone through a significant transformation in the last decade; caused by the change in consumer behaviour and expectations which have been driven by the convenience and low cost offered by Amazon. There are clear signs of a similar disruption in the financial services industry now, driven by new entrants who are changing consumer expectations. In this talk, Rahul looks at the evolution of retailers and draws conclusions about the path players in the financial services have to take to compete and provide customer experiences that will be necessary to compete.
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...Grid Dynamics
Organizations need to tap into the huge potential of their vast volumes of data, but a use case tactical approach is not going to work. Instead, they need to work in the definition of a data strategy linked to the most relevant goals for the enterprise.
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...Grid Dynamics
Apache Spark is a general-purpose big data execution engine. You can work with different data sources with the same set of API in both batch and streaming mode. Such flexibility is great if you are experienced Spark developer solving a complicated data engineering problem, which might include ML or streaming. In Airbnb, 95% of all data pipelines are daily batch jobs, which read from Hive tables and write to Hive tables. For such jobs, you would like to trade some flexibility for more extensive functionality around writing to Hive or multiple days processing orchestration. Another advantage of reducing flexibility is creating "best practices", which can be followed by less experienced data engineers. In Airbnb, we've created a framework called "Sputnik," which tries to address these issues. In this talk, I'll show the typical boilerplate code, which Sputnik tries to reduce and concepts it introduces to simplify pipeline development.
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019Grid Dynamics
For businesses large and small, cloud-based software platforms are breaking down data siloes between teams, adding new capabilities, and boosting efficiency. While not known as early adopters, there's a compelling case for public safety to do the same. In this session, we'll explore how a comprehensive records and evidence platform can simplify operations and streamline data management and access, securely and responsibly. We'll also highlight the role of artificial intelligence, and how it can help agencies respond more quickly and improve outcomes.
Dynamic Talks: "Implementing data quality automation with open source stack" ...Grid Dynamics
The quality of business decisions, machine learning insights, and executive reports depend on the quality and integrity of the underlying data. There are many ways that data can get corrupted in an analytical data platform from de-synchronization with the system-of-record to defects in data pipelines. We will show how to detect and prevent data corruption with automation, open source tools, and machine learning.
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...Grid Dynamics
Dynamic talks Seattle: Artificial Intelligence (AI) and data are foundational to ideation of new business models that bring growth and efficiencies. This is in action in pharmaceutical supply chain for reducing costs, increasing volumes, and optimizing the contracts with suppliers. The implementation involves process and tools that make data usable, and overcome the challenges of culture, ethics, data scalability, compute and engineering, and. Learn about data collection, data management, and metadata management tools implementation and modern data architecture to support them. Discuss machine learning algorithms for growth and efficiency scenarios.
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...Grid Dynamics
Dynamic Talks Seattle: Patient No-shows and Late Cancelations is estimated to cost around $150B across the nation. Providence St. Joseph Health, Healthcare Intelligence Group has been working on this pain point for the past couple of years and has developed a software solution leveraging a predictive engine to identify high risk patients at risk of No-show and Late Cancellation on a daily basis. More than 25 clinics have been using this solution for about 2 years now to optimize their reminder call strategy and to have a meaningful reduction of No-shows and Late Cancellations. In addition, a call center pilot was designed recently to standardize this process end-end. Pilot results will be evaluated to prove the impact at scale.
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...Grid Dynamics
In this talk, we will discuss automatic decision-making and AI techniques for customer relationship management. First, we will present a methodology that helps to develop highly automated promotion and loyalty management systems. Next, we will walk through practical examples of how advanced customer and content signals can be generated using predictive models, and how optimization and reinforcement learning techniques can be used for targeting, budgeting, and pricing decisions. This talk is for Data Scientists, Product Owners, and Software Engineers involved in marketing operations or development of marketing automation software and interested in ML-based decision automation techniques.
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...Grid Dynamics
Dynamic talks SF: So you have heard all the hype around how Machine Learning is going to change the world. But within your business context, where do you start? How do you get leadership buy-in for investment? And how and when you start scaling your ML Services?
In this session, you will walk away with an actionable framework to bootstrap and scale a machine learning services team. You will see the framework in action through an actual 0 to 1 product journey involving deep learning where we delivered value in record speed in-spite of not having a dataset when we started. You will get practical tips on how to make decisions about when and how to scale your capability to scale ML Services and platform. Some of the tips are pretty counterintuitive and revealed themselves with our experience of productizing ML services over the last 5+ years. (Using a diverse range of technologies - Vision, Language, Graph, Anomaly Detection, Search Relevance, Personalization)
Realtime Contextual Product Recommendations…that scale and generate revenue -...Grid Dynamics
Dynamic Talks SF: Recommendation systems are all around us. E-commerce companies like Amazon recommend products we are likely to buy based on our browsing behavior. Netflix suggests what shows we should watch based on our binging habits. Spotify builds a personalized playlist we would enjoy listening to, based on their understanding of what musical genre we are into.
In this talk, we will explore recent advances in the area of product recommendations in both research and practice. We will see how machine learning, design thinking and solid data engineering principles are combined to create an engaging customer experience that positively impacts the bottom line.
We will look at how we use various deep learning architectures to obtain image and text embeddings that supplement user and product based features to generate product recommendations that align closely with a consumer’s aesthetic preferences.
The talk would be of interest to data scientists, data engineers, product managers, UX designers and anyone interested in machine learning.
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...Grid Dynamics
In this talk, we will discuss automatic decision-making and AI techniques for customer relationship management. First, we will present a methodology that helps to develop highly automated promotion and loyalty management systems. Next, we will walk through practical examples of how predictive models can be used to characterize customer intent, and how optimization and reinforcement learning techniques can be used to build next best action models that incorporate targeting, budgeting, and pricing decisions.
Best practices for enterprise-grade microservices implementations with Google...Grid Dynamics
When migrating to a cloud and microservices architecture, companies need to invest in foundational capabilities, such as a microservices platform, continuous delivery, and an immutable infrastructure. In this talk, we will discuss our experience implementing these capabilities on the enterprise-scale with Google Cloud, Kubernetes, Istio, Envoy, Spinnaker, and Hashicorp stack. We will also discuss best practices of onboarding the cloud to facilitate DevOps, SRE without sacrificing quality or control.
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...Grid Dynamics
Abstract: In this talk, Scott will go over the basics of Attribution Modelling, its advantages and disadvantages, and strategic ways to implement in the cloud leveraging big data.
Building an algorithmic price management system using ML: Dynamic talks Seatt...Grid Dynamics
In this talk, we will discuss how predictive modeling and reinforcement learning can be used to build advanced price management systems that unlock the potential of dynamic and personalized pricing. We will present price optimization methods for a number of use cases including introductory pricing, promotion calendars, replenishable and seasonal products, targeted offers, and flash sales. We will also review case studies that demonstrate how these methods were applied in practice and how algorithmic price management components were fitted into pricing strategies.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
2. PRIVILEGED AND CONFIDENTIAL
• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation
commerce technology solutions
• Record of outperformance with Tier 1 retail clients
• Fortune 1000 client relationships
3. About Me
● principal engineer at Grid Dynamics
● spoke at few last LuceneRevolutions
● contributed BlockJoin query parser for Solr - {!parent}
● blogged about it at http://blog.griddynamics.com/
● tried to fix threads at DataImportHandler
http://google.com/+MikhailKhludnev
4. You are expected to know
● how Lucene searches/filters
● how it counts facets
● that there are segments
● what is DocValues
● why to join
● RDBMS joins: nested loop join, sort-merge join and hash join.
5. I’m expected to know
● query-time join
● index-time join
● yet another one join
47. Benchmarking 2.9 M docs
https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark-5-1
Latency, ms
the bigger the worse
7
28
14
BlockJoin
(i-time)
Join Index
JoinUtil
(q-time)
GloblOrdinals
50. Further Plans 2.0
● incremental join-index update
● perhaps just calculate and cache it
● or put to dedicated index
● join in both directions
● calculate optimal execution plan of segments enumeration
● edge case for benchmark
51. Summary
● Joins in General
● JoinUtil vs Block-join vs GlobalOrdinals
● updatable DocValues
● opportunities for improving query-time joins:
○ eliminate term enum
○ choose lower cardinality side for enumeration
○ GlobalOrdinalsJoin
52. References
● Searching relational content with Lucene's BlockJoinQuery
http://blog.mikemccandless.com/
● Solr Experience: search parent-child relations. Part I
Solr block-join support
http://blog.griddynamics.com/
● https://wiki.apache.org/solr/Join
● http://www.slideshare.net/martijnvg/document-relations
● SOLR-6234 - {!scorejoin }
● LUCENE-6352
● Updatable DocValues Under the Hood
http://shaierera.blogspot.com/
● Subject: How to openIfChanged the most recent merge?
at: java-dev@lucene.apache.org
58. Two phase update problem
Subject: How to openIfChanged the most recent merge?
at: java-dev@lucene.apache.org
59. JoinUtil
● query-time
● indexing is fast
● searching is slow, why?
○ expensive term enum
○ single enumeration order
BlockJoin
● index-time
● reindexing whole block is as expensive as mandatory
● searching is darn fast, however
○ can’t reorder child docs
60. store ref = segment#, doc#
put ref to previous and current segment in DV
when add new segment, join IDs with previous segments
- for parent, just ref to all children docnums
- for children, add plain field refsToSeg:seg#
-
when score parents on some segment
- buffer them with the link refs, then
- intersect buffered link refs with children query on previous segments
- search all segments for refsToSeg:seg#, intersect with children query, obtain
perent ref from DV intersect with buffered