SharePoint 2013 introduced the search schema, which allows for more granular permissions than the managed property model in SharePoint 2010. The search schema can be managed by site administrators, reducing the load on search administrators. It also enables more granular configuration for query, retrieval, refinement, sorting, and other functions. Remote result sources can now be crawled locally and queried from other farms, improving geo-distributed search capabilities. Individual items can also be re-crawled more easily. Automatic URL balancing in crawl databases further improves scalability for large archive repositories. Upcoming changes to scalability limits will affect farm design for large archive content repositories.
MongoDB and Hadoop: Driving Business Insights (MongoDB)
MongoDB and Hadoop can work together to solve big data problems facing today's enterprises. We will take an in-depth look at how the two technologies complement and enrich each other with complex analyses and greater intelligence. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights with MapReduce, Pig, and Hive, and demo a Spark application to drive product recommendations.
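The Spark demo itself isn't included in this abstract. As a rough illustration of the kind of product-recommendation logic such a job might distribute across a cluster, here is a minimal pure-Python co-occurrence sketch; the function name and sample data are invented for illustration, and a real job would express the same counting as Spark transformations over data read through the MongoDB Connector for Hadoop.

```python
from collections import Counter

def co_occurrence_recommendations(orders, product, top_n=3):
    """Count how often other products appear in the same order as
    `product`, and return the most frequent co-purchases."""
    counts = Counter()
    for order in orders:
        if product in order:
            for other in order:
                if other != product:
                    counts[other] += 1
    return [p for p, _ in counts.most_common(top_n)]

# Invented sample order data.
orders = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["mouse", "pad"],
    ["laptop", "bag"],
]
print(co_occurrence_recommendations(orders, "laptop"))  # ['mouse', 'bag']
```

The same shape of computation (filter, flat-map to pairs, reduce by key, top-N) maps directly onto Spark's RDD or DataFrame API.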
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod... (DataWorks Summit)
There have been many voices discussing how to architect streaming applications on Hadoop, but until now there have been very few worked examples in open source. Apache Metron (Incubating) is a streaming advanced-analytics cybersecurity application that uses the components of the Hadoop stack as its platform.
We will go beyond theoretical discussions of Kappa vs. Lambda architectures and describe the nuts and bolts of a streaming architecture that enables advanced analytics in Hadoop. We will discuss the componentry we had to build and what we could reuse, why we made the architectural decisions we made, and how they knit together into a coherent application on top of many different Hadoop ecosystem projects.
We will also discuss the domain-specific language we created out of necessity to provide a pluggable layer for user-defined enrichments, and how this helped make Metron less rigid and easier to use. We will also candidly discuss mistakes that we made early on.
Boosting Documents in Solr by Recency, Popularity, and User Preferences (Lucidworks, Archived)
Presentation on how to boost and/or filter documents by recency, popularity, and personal preferences, with access to source code. My solution improves upon the common "recip"-based approach to boosting by document age.
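The common recency boost referenced here is Solr's function query `recip(ms(NOW, date_field), 3.16e-11, 1, 1)`, where `recip(x, m, a, b) = a / (m*x + b)` and 3.16e-11 is roughly the reciprocal of one year in milliseconds. A quick pure-Python sketch of what that function computes (the field name and sample ages are assumptions for illustration):

```python
def recip(x, m, a, b):
    """Solr's recip() function query: a / (m*x + b)."""
    return a / (m * x + b)

MS_PER_YEAR = 365 * 24 * 3600 * 1000  # ~3.15e10 ms

for years_old in (0, 1, 5):
    age_ms = years_old * MS_PER_YEAR
    boost = recip(age_ms, 3.16e-11, 1, 1)
    print(f"{years_old} year(s) old -> boost {boost:.3f}")
# 0 year(s) old -> boost 1.000
# 1 year(s) old -> boost 0.501
# 5 year(s) old -> boost 0.167
```

A brand-new document gets a boost near 1.0, which roughly halves after a year; the talk's point is that pure age decay like this ignores popularity and user preference signals.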
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation (BIOVIA)
Accelrys Catalog is a powerful new technology for creating an index of the protocols and components within your organization. You will learn about strategies for indexing and how search capabilities can be deployed to professional client and Web Port end users. You will also learn how to use this technology to find out about system usage to aid with system upgrades, server consolidations, and general system maintenance. The protocol validation capability in the admin portal allows administrators to create standard reports on server usage characteristics. You will learn how to report on violations of IT policies (e.g. around security), bad protocol authoring practices, or missing or incomplete protocol documentation. Developers will also learn how to extend and customize the rules used to create these reports.
Integrate Solr with real-time stream processing applications (thelabdude)
Storm is a real-time distributed computation system used to process massive streams of data. Many organizations are turning to technologies like Storm to complement batch-oriented big data technologies, such as Hadoop, to deliver time-sensitive analytics at scale. This talk introduces an emerging architectural pattern of integrating Solr and Storm to process big data in real time. There are a number of natural integration points between Solr and Storm, such as populating a Solr index or supplying data to Storm using Solr’s real-time get support. In this session, Timothy will cover the basic concepts of Storm, such as spouts and bolts. He’ll then provide examples of how to integrate Solr into Storm to perform large-scale indexing in near real time. In addition, we'll see how to embed Solr in a Storm bolt to match incoming tuples against pre-configured queries, a pattern commonly known as the percolator. Attendees will come away from this presentation with a good introduction to stream processing technologies and several real-world use cases of how to integrate Solr with Storm.
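The "percolator" pattern mentioned above (registering queries up front and matching each incoming document against them, rather than indexing documents and running queries) can be sketched independently of Storm or Solr. A toy keyword-based version follows; the class and query names are invented, and a real bolt would use an embedded Solr core instead of set containment:

```python
class Percolator:
    """Toy percolator: register queries up front, then match each
    incoming document (a tuple's text payload) against all of them."""

    def __init__(self):
        self.queries = {}  # query_id -> set of required terms

    def register(self, query_id, terms):
        self.queries[query_id] = set(terms)

    def percolate(self, text):
        # Return the ids of all registered queries the text satisfies.
        tokens = set(text.lower().split())
        return [qid for qid, terms in self.queries.items()
                if terms <= tokens]

p = Percolator()
p.register("outage-alert", ["service", "down"])
p.register("security", ["breach"])
print(p.percolate("Payment service is down in eu-west"))  # ['outage-alert']
```

In a Storm topology, `percolate` would run inside a bolt's `execute` method, emitting an alert tuple for each matching query id.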
Sumo Logic's first How-To webinar focused on optimizing our users' search experience. The webinar covers the following:
- Developing good search habits
- Setting the proper expectations around search performance
- The factors related to search speed
- Creating field extraction rules
- Defining a partitioning strategy
- Configuring scheduled views
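Field extraction rules like those in the list above parse named fields out of raw log lines at ingest time so later searches can filter on them cheaply. A minimal regex-based sketch of the idea (the log format, rule, and field names are invented for illustration, not Sumo Logic's actual rule syntax):

```python
import re

# Hypothetical access-log line; the "rule" names the pieces we want
# as regex capture groups.
RULE = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) .* "(?P<method>\w+) (?P<path>\S+)'
    r'.*" (?P<status>\d{3})'
)

def extract_fields(line):
    """Apply the extraction rule; return a dict of named fields."""
    m = RULE.search(line)
    return m.groupdict() if m else {}

line = '10.0.0.7 - - [29/May/2024] "GET /checkout HTTP/1.1" 500 1042'
print(extract_fields(line))
# {'ip': '10.0.0.7', 'method': 'GET', 'path': '/checkout', 'status': '500'}
```

Extracting fields once at ingest, rather than re-parsing raw text in every query, is exactly the kind of habit the webinar's search-performance advice is about.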
With Search, developers and data engineers can run more relevant and responsive queries on the data in Hadoop and integrate with external tools to build custom real-time applications.
Python Awareness for Exploration and Production Students and ProfessionalsYohanes Nuwara
This was presented in a series of webinars organized virtually by SPE Asia Pacific University and SPE Northern Emirates in Malaysia. In this webinar, I gave a motivational presentation for students and young professionals on programming education for the petroleum engineering and geoscience domains, and showcased some of my open-source work written in Python.
10 Things I Like in SharePoint 2013 Search (SPC Adriatics)
Speaker: Agnes Molnar
Based on my SharePoint and FAST Search experience, I’ll demonstrate my “Research Path” through SharePoint 2013 Search: what’s new, what improvements we can find there, and how to apply our existing Search knowledge and experience to SharePoint 2013 Search.
You will learn:
Config options in SharePoint 2013 Search – Central Admin vs. PowerShell
Crawled and Managed Properties across Content Sources
Ranking and Relevancy
I2 - SharePoint Hybrid Search Start to Finish - Thomas Vochten (SPS Paris)
One of the most compelling additions to a SharePoint practitioner’s toolbox is hybrid search. Although hybrid search capabilities were already around for a few years, with the introduction of the “Cloud Search Service Application” things got a lot more interesting. This demo-heavy session will focus on the technical implementation details and their prerequisites, as well as the typical hurdles that you’ll face in your first hybrid search project.
This presentation explains the details of all search components, how to properly configure your search topology, and your options for extending your search farm in a hybrid “cloud/on-prem” scenario. You will learn what you need to consider when designing your search to meet your organization's needs. We will dive into scripting a high-availability search topology, keeping it healthy, and managing your day-to-day search operations.
Learn how to optimize your search for best performance and relevancy, to support reliable search applications. Together, we will review where Search lives in the farm and the crawl components needed to implement a scalable farm.
SharePoint Search Topology and Optimization (Mike Maadarani)
This presentation covers the architecture of the SharePoint Search topology, how to extend search, and how to optimize your search farm for better results. It describes how you can build your search topology with PowerShell commands and explains how to use Query Rules and the Query Builder for great search results.
SharePoint 2013 Search Topology and Optimization (Mike Maadarani)
In this presentation, I explain the details of all search components, how to properly configure the search topology, and the options for extending the search farm in a hybrid “cloud/on-premises” scenario. This presentation will explain what you need to consider when designing your search to meet your organization's needs. We will dive into scripting a high-availability search topology, keeping it healthy, and managing your day-to-day search operations.
Learn about how to optimize your search for best performance and search relevancy, to support reliable search applications.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
My and Rik Marselis's slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to part 4 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
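When JMeter's Backend Listener ships metrics to InfluxDB, each data point travels as InfluxDB line protocol (`measurement,tags fields timestamp`). A minimal sketch of formatting such a point; the measurement, tag, and field names here are invented for illustration, and real line protocol also requires escaping special characters and quoting string field values:

```python
def influx_line(measurement, tags, fields, ts_ns):
    """Format one point in InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... timestamp_ns
    (No escaping/quoting handled; numeric fields only.)"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

point = influx_line(
    "jmeter",                                         # measurement
    {"application": "shop", "transaction": "login"},  # tags
    {"count": 42, "pct95.0": 317.0},                  # fields
    1717000000000000000,                              # ns timestamp
)
print(point)
# jmeter,application=shop,transaction=login count=42,pct95.0=317.0 1717000000000000000
```

Grafana then queries these series from InfluxDB to draw the real-time dashboards shown in the webinar.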
SharePoint 2013 Search Architecture with Russ Houberg
5. SharePoint 2010 vs. SharePoint 2013 terminology:
   SharePoint 2010                 SharePoint 2013
   Managed Property                (Multiple) Search Schemas
   Best Bets                       Promoted Results (Query Rule)
   Scope and Federated Location    Result Source
   Content By Query                Content By Search
   Incremental Crawl               Continuous Crawl
   MCM                             MCSM
8. Continuous Crawl Benefits:
   • No more waiting for index merge
   • Does not wait for other crawls to complete
   • Can have multiple continuous crawls running simultaneously
   • Continuous crawls ignore errors
   Continuous Crawl Facts:
   • Runs every 15 minutes by default
   • Default interval can be changed with PowerShell
   • Should be used instead of incremental crawls for SharePoint content sources
14. [Diagram: search architecture. Content sources (HTTP, file share, user profile, SharePoint, other) feed the crawl component, which uses the crawl database(s) and hands items to the content processing component. The content processing component writes to the link database and feeds the index component's index partition(s). End-user or content-process-initiated queries go to the query processing component, which queries the index component. The analytics processing component reads the link database and event store and writes to the analytics database.]
15. Crawl Component
    What it Does:
    • Crawls content sources to populate the index
    • Delivers crawl items (binary) and metadata to the content processor
    • Invokes connectors or protocol handlers to interact with content sources to retrieve data
    • Uses one or more crawl databases to store info about crawl items and crawl history
    Important Facts:
    • We can have multiple crawl components
    • MS Recommends: 2 Crawl Components per Search Service Application
    • MS Recommends: 8(4vm) CPU / 8GB RAM per Crawl Component
16. Content Processing Component
    What it Does:
    • Processes crawl items and feeds them to the index component
    • Transforms crawl items into artifacts that can be included in the search index (performs document parsing and property mapping)
    • Writes information about links and URLs to the link database (these are analyzed by analytics to calculate relevance and currency; results are written back to the search index by the content processing component)
    • Generates phonetic name variations to improve people search
    Important Facts:
    • We must only have one (1) content processing component per server; more will hurt, not help, crawl performance
    • Max of 2 per search service application
    • Feeding sessions are scaled based on CPU cores using a default coefficient of 3: 8 cores × 3 = 24 feeding sessions; 4 cores × 3 = 12 feeding sessions
    • MS Recommends: 8(4vm) CPU / 8GB RAM per Content Processing Component
    • Feeding sessions require RAM; more RAM is necessary when more cores are present (monitoring required)
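The feeding-session arithmetic on this slide (CPU cores times a default coefficient of 3) is simple enough to sketch; the function name is invented for illustration:

```python
def feeding_sessions(cpu_cores, coefficient=3):
    """Content processing feeding sessions scale with CPU cores,
    using a default coefficient of 3 (per the slide)."""
    return cpu_cores * coefficient

print(feeding_sessions(8))  # 24
print(feeding_sessions(4))  # 12
```

Since each feeding session consumes RAM, more cores imply more sessions and therefore more memory, which is why the slide calls for monitoring.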
17. Analytics Processing Component
    What it Does:
    • Runs analytics jobs that analyze crawl items and user interaction with search results to perform both search analytics and usage analytics
    • Analyzes link & anchor text, click distance, search clicks, deep links, social tags, social distance, search reports, recommendations, usage counts, and activity ranking
    • Improves search relevance and creates search results
    • Output is included in the search index by the content processor
    Important Facts:
    • Maximum of 6 per search service application
    • Add more Analytics Processing Components to improve analytics performance
    • MS Recommends: 8(4vm) CPU / 8GB RAM / 300GB disk space per Analytics Processing Component
    • Interacts with Analytics Reporting to store statistical information
    • Interacts with the Link database to store information about searches and crawled documents
18. Index Component
    What it Does:
    • Receives processed items from the content processing component and writes the items to the index file
    • Receives queries from the query processing component and returns result sets
    • Redistributes content among index partitions when the index architecture is changed by the Search Administration Component
    Important Facts:
    • Maximum of 60 index components (20 index partitions × 3 index replicas) per search service application
    • Must provision one Index Component for each index replica
    • MS Recommends: 8(4vm) CPU / 16GB RAM / 500GB disk space per Index Component
19. Index Architecture
    • An index partition is a logical portion of the entire search index (same as before)
    • An index partition is served by one or more index components
    • Index components can be the primary "replica" or a secondary "replica"
    • The primary replica is contacted by the content processing component to write new data into the index
    • A secondary replica is a read-only copy that gets updated with the data
    • Adding replicas improves query performance under load
    • Add partitions to handle an increased content corpus
    • A partition can't be removed after it has been added
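Partitioning an index means routing each document to exactly one partition, with replicas of that partition serving reads. A generic hash-routing sketch of the idea follows; this is an illustration of the technique, not SharePoint's actual (internal, undocumented) routing:

```python
import hashlib

def partition_for(doc_id, num_partitions):
    """Route a document to one index partition by hashing its id.
    A generic scheme for illustration; SharePoint's routing differs."""
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_partitions

# Hash routing spreads a corpus roughly evenly across partitions.
docs = ["doc-%d" % i for i in range(1000)]
counts = [0] * 4
for d in docs:
    counts[partition_for(d, 4)] += 1
print(counts)  # roughly even split across the 4 partitions
```

Note how the slide's warning falls out of this scheme: changing `num_partitions` changes the routing of nearly every document, which is why adding (and especially removing) partitions forces a redistribution of the index.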
20. Query Processing Component
    What it Does:
    • Analyzes and processes queries and results
    • After receiving a query, analyzes and processes it to optimize precision, recall, and relevance
    • Submits processed queries to the index component
    • Processes the result set returned by the index component before returning it to the querying entity
    Important Facts:
    • Maximum of 1 per server
    • MS Recommends: 8(4vm) CPU / 8GB RAM per Query Processing Component
24. [Diagram: sample fault-tolerant search farm across six hosts. Hosts 1, 2, 5, and 6 run web servers, Office Web Apps, and application servers; hosts 3 and 4 run application servers with query processing, the two replicas of index partition 0, and the crawl, admin, analytics, and content processing components. All SharePoint databases (search admin db, crawl db, link db, analytics db, config db, and all other SharePoint databases) are kept redundant using SQL clustering, mirroring, or SQL Server 2012 AlwaysOn.]
25. [Diagram: medium search farm across hosts A-H. Application servers on hosts A-F run query processing, admin, crawl, analytics, and content processing components, with replicas of index partitions 0-3 spread across host pairs. Hosts G and H hold the SharePoint databases (search admin db, crawl db, link db, analytics db), kept redundant using SQL clustering, mirroring, or SQL Server 2012 AlwaysOn.]
26. [Diagram: large search farm across hosts A-R. Replicas of index partitions 0-9 are spread across application servers on hosts A-N, alongside query processing, analytics, content processing, crawl, and admin components. Hosts O-R hold the SharePoint databases (search admin db, link db, analytics db, and multiple crawl dbs), kept redundant using SQL clustering, mirroring, or SQL Server 2012 AlwaysOn.]
28. • The schema can be managed by site admins, reducing the load on the search administrator
    • The schema can be configured to allow more granularity (query, retrieve, refine, sort, etc.); affects content index size
    • Remote result sources can be crawled locally and then queried by remote farms. Huge impact on geo-distributed search… KL may be able to help!
    • Individual items can be re-crawled easily
    • Automatic URL balancing in crawl databases minimizes host name restrictions for large archive repositories
    • Scalability limit changes will have a big impact on farm design for large archive content repositories in the near future