Neo4j Bloom is a breakthrough graph communication and visualization product that allows graph novices and experts the ability to communicate and share their work, thoughts, and plans with peers, managers, and executives. Its illustrative, codeless search to storyboard design makes it the ideal interface for non-technical project participants to share in the innovative work of their graph analytics and development teams.
This talk was given during Lucene Revolution 2017.
They say optimize is bad for you, they say you shouldn't do it, they say it will invalidate operating system caches and make your system suffer. This is all true, but is it true in all cases?
In this presentation we will look closer on what optimize or better called force merge does to your Solr search engine. You will learn what segments are, how they are built and how they are used by Lucene and Solr for searching. We will discuss real-life performance implications regarding Solr collections that have many segments on a single node and compare that to the Solr where the number of segments is moderate and low. We will see what we can do to tune the merging process to trade off indexing performance for better query performance and what pitfalls are there waiting for us. Finally, at the end of the talk we will discuss possibilities of running force merge to avoid system disruption and still benefit from query performance boost that single segment index provides.
The More Like This search functionality is a key feature in Apache Lucene that allows to find similar documents to an input one (text or document). Being widely used but rarely explored, this presentation will start introducing how the MLT works internally. The focus of the talk is to improve the general understanding of MLT and the way you could benefit from it. Building on the introduction the focus will be on the BM25 text similarity function and how this has been (tentatively) included in the MLT through a conspicious refactor and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
With the advent of deep learning and algorithms like word2vec and doc2vec, vectors-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, and not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means tree, and compare their performance in terms of speed and relevancy. Finally, I will describe how each technique can be implemented efficiently in a lucene-based search engine such as Solr or Elastic Search.
Neo4j Bloom is a breakthrough graph communication and visualization product that allows graph novices and experts the ability to communicate and share their work, thoughts, and plans with peers, managers, and executives. Its illustrative, codeless search to storyboard design makes it the ideal interface for non-technical project participants to share in the innovative work of their graph analytics and development teams.
This talk was given during Lucene Revolution 2017.
They say optimize is bad for you, they say you shouldn't do it, they say it will invalidate operating system caches and make your system suffer. This is all true, but is it true in all cases?
In this presentation we will look closer on what optimize or better called force merge does to your Solr search engine. You will learn what segments are, how they are built and how they are used by Lucene and Solr for searching. We will discuss real-life performance implications regarding Solr collections that have many segments on a single node and compare that to the Solr where the number of segments is moderate and low. We will see what we can do to tune the merging process to trade off indexing performance for better query performance and what pitfalls are there waiting for us. Finally, at the end of the talk we will discuss possibilities of running force merge to avoid system disruption and still benefit from query performance boost that single segment index provides.
The More Like This search functionality is a key feature in Apache Lucene that allows to find similar documents to an input one (text or document). Being widely used but rarely explored, this presentation will start introducing how the MLT works internally. The focus of the talk is to improve the general understanding of MLT and the way you could benefit from it. Building on the introduction the focus will be on the BM25 text similarity function and how this has been (tentatively) included in the MLT through a conspicious refactor and testing process, to improve the identification of the most interesting terms from the input that can drive the similarity search. The presentation will include real world usage examples, proposed patches, pending contributions and future developments such as improved query building through positional phrase queries.
With the advent of deep learning and algorithms like word2vec and doc2vec, vectors-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, and not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means tree, and compare their performance in terms of speed and relevancy. Finally, I will describe how each technique can be implemented efficiently in a lucene-based search engine such as Solr or Elastic Search.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
Spark SQL is a highly scalable and efficient relational processing engine with ease-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze the data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of SparkSQL spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and understand how to tune Spark SQL performance.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Vectorized Query Execution in Apache Spark at FacebookDatabricks
A standard query execution system processes one row at a time. Vectorized query execution batches multiples rows together in a columnar format, and each operator uses simple loops to iterate over data within a batch. This feature greatly reduces the CPU usage for reading, writing and query operations like scanning, filtering. In this talk, we will take a deep dive into Facebook's ORC-based vectorized reader and writer implementation, discuss how vectorization affects performance of various data types in Hive/Spark, and quantify the improvements vectorization brings to the Facebook Warehouse.
Speaker: Chen Yang
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and Nifi. There are also many new tools that are built on top of ORC, such as Hive’s ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to only see the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and what the relevant tradeoffs are. In particular, it will discuss how to format your data and the options to use to maximize your read performance. In particular, we’ll discuss when and how to use ORC’s schema evolution, bloom filters, and predicate push down. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file including the type in the file and min, max, and count for each column.
ELK Stack workshop covers real-world use cases and works with the participants to - implement them. This includes Elastic overview, Logstash configuration, creation of dashboards in Kibana, guidelines and tips on processing custom log formats, designing a system to scale, choosing hardware, and managing the lifecycle of your logs.
[2012 CodeEngn Conference 06] beist - Everyone has his or her own fuzzerGangSeok Lee
2012 CodeEngn Conference 06
퍼징 기술의 입문자, 중급자에 초점을 맞추었고 퍼징 기술에 대한 백그라운드 설명부터 시작하여 최근의 퍼징 기술에는 어떠한 것들이 있는지 살펴보고 퍼징 기술이 버그 찾기에 어떻게 쓰일 수 있는지 부터 시작하여 효율적인 퍼징 테크닉은 어떤것들이 있는지 설명한다. 또한, 각 퍼징 기법들의 한계점과 단점에 대해서 지적하며 왜 자신만의 퍼징 노하우와 방법이 필요한지 설명하면서 퍼징 기술의 복잡성과 다양성, 해커들은 어떤 기법을 선호하는지 소개한다. 발표자가 사용하고 있는 기법이나 노하우에 대해서도 공유 할 예정이다.
http://codeengn.com/conference/06
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
"Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard about the principle ""It runs best on Powerpoint"", so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for the Data Practitioners:
-Recap the OSS foundation of the Lakehouse architecture and understand its appeal
- Understand the benefits of leveraging a lakehouse for streaming and what's there beyond Spark Structured Streaming.
- Meat of the talk: The Serverless Lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- Live, hands-on, interactive demo to explore serverless data engineering data end-to-end. For each step we have a critical look and I explain what it means, e.g for you saving costs and removing operational overhead."
Benchmarking is hard. Benchmarking databases, harder. Benchmarking databases that follow different approaches (relational vs document) is even harder.
But the market demands these kinds of benchmarks. Despite the different data models that MongoDB and PostgreSQL expose, many organizations face the challenge of picking either technology. And performance is arguably the main deciding factor.
Join this talk to discover the numbers! After $30K spent on public cloud and months of testing, there are many different scenarios to analyze. Benchmarks on three distinct categories have been performed: OLTP, OLAP and comparing MongoDB 4.0 transaction performance with PostgreSQL's.
What would be faster, MongoDB or PostgreSQL?
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeDatabricks
Data shuffling is a costly operation. At Facebook, single job shuffles can reach the scale of over 300TB compressed using (relatively cheap) large spinning disks. However, shuffle reads issue large amounts of inefficient, small, random I/O requests to disks and can be a large source of job latency as well as waste of reserved system resources. In order to boost shuffle performance and improve resource efficiency, we have developed Spark-optimized Shuffle (SOS). This shuffle technique effectively converts a large number of small shuffle read requests into fewer large, sequential I/O requests.
In this session, we present SOS’s multi-stage shuffle architecture and implementation. We will also share our production results and future optimizations.
Laravel has a lot of features and an extremely simple architecture. It sometimes leads the programmer to make some mistakes, when it comes time to get the most out of it. Through practical and simple examples we will enter the world of Laravel, starting with the basics and ending with the architecture that Laravel uses.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
Spark SQL is a highly scalable and efficient relational processing engine with ease-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze the data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk will dive into the technical details of SparkSQL spanning the entire lifecycle of a query execution. The audience will get a deeper understanding of Spark SQL and understand how to tune Spark SQL performance.
Kafka on ZFS: Better Living Through Filesystems confluent
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Vectorized Query Execution in Apache Spark at FacebookDatabricks
A standard query execution system processes one row at a time. Vectorized query execution batches multiples rows together in a columnar format, and each operator uses simple loops to iterate over data within a batch. This feature greatly reduces the CPU usage for reading, writing and query operations like scanning, filtering. In this talk, we will take a deep dive into Facebook's ORC-based vectorized reader and writer implementation, discuss how vectorization affects performance of various data types in Hive/Spark, and quantify the improvements vectorization brings to the Facebook Warehouse.
Speaker: Chen Yang
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and Nifi. There are also many new tools that are built on top of ORC, such as Hive’s ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to only see the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and what the relevant tradeoffs are. In particular, it will discuss how to format your data and the options to use to maximize your read performance. In particular, we’ll discuss when and how to use ORC’s schema evolution, bloom filters, and predicate push down. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file including the type in the file and min, max, and count for each column.
ELK Stack workshop covers real-world use cases and works with the participants to - implement them. This includes Elastic overview, Logstash configuration, creation of dashboards in Kibana, guidelines and tips on processing custom log formats, designing a system to scale, choosing hardware, and managing the lifecycle of your logs.
[2012 CodeEngn Conference 06] beist - Everyone has his or her own fuzzerGangSeok Lee
2012 CodeEngn Conference 06
퍼징 기술의 입문자, 중급자에 초점을 맞추었고 퍼징 기술에 대한 백그라운드 설명부터 시작하여 최근의 퍼징 기술에는 어떠한 것들이 있는지 살펴보고 퍼징 기술이 버그 찾기에 어떻게 쓰일 수 있는지 부터 시작하여 효율적인 퍼징 테크닉은 어떤것들이 있는지 설명한다. 또한, 각 퍼징 기법들의 한계점과 단점에 대해서 지적하며 왜 자신만의 퍼징 노하우와 방법이 필요한지 설명하면서 퍼징 기술의 복잡성과 다양성, 해커들은 어떤 기법을 선호하는지 소개한다. 발표자가 사용하고 있는 기법이나 노하우에 대해서도 공유 할 예정이다.
http://codeengn.com/conference/06
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
"Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard about the principle ""It runs best on Powerpoint"", so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for the Data Practitioners:
-Recap the OSS foundation of the Lakehouse architecture and understand its appeal
- Understand the benefits of leveraging a lakehouse for streaming and what's there beyond Spark Structured Streaming.
- Meat of the talk: The Serverless Lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- Live, hands-on, interactive demo to explore serverless data engineering data end-to-end. For each step we have a critical look and I explain what it means, e.g for you saving costs and removing operational overhead."
Benchmarking is hard. Benchmarking databases, harder. Benchmarking databases that follow different approaches (relational vs document) is even harder.
But the market demands these kinds of benchmarks. Despite the different data models that MongoDB and PostgreSQL expose, many organizations face the challenge of picking either technology. And performance is arguably the main deciding factor.
Join this talk to discover the numbers! After $30K spent on public cloud and months of testing, there are many different scenarios to analyze. Benchmarks on three distinct categories have been performed: OLTP, OLAP and comparing MongoDB 4.0 transaction performance with PostgreSQL's.
What would be faster, MongoDB or PostgreSQL?
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeDatabricks
Data shuffling is a costly operation. At Facebook, single job shuffles can reach the scale of over 300TB compressed using (relatively cheap) large spinning disks. However, shuffle reads issue large amounts of inefficient, small, random I/O requests to disks and can be a large source of job latency as well as waste of reserved system resources. In order to boost shuffle performance and improve resource efficiency, we have developed Spark-optimized Shuffle (SOS). This shuffle technique effectively converts a large number of small shuffle read requests into fewer large, sequential I/O requests.
In this session, we present SOS’s multi-stage shuffle architecture and implementation. We will also share our production results and future optimizations.
Laravel has a lot of features and an extremely simple architecture. It sometimes leads the programmer to make some mistakes, when it comes time to get the most out of it. Through practical and simple examples we will enter the world of Laravel, starting with the basics and ending with the architecture that Laravel uses.
Testing and TDD - Laravel and Express ExamplesDragos Strugar
A brief introduction to testing backend code. Examples given in two frameworks in PHP and JavaScript. Written in Serbian language and will be presented in Banja Luka Developers Meetup, March 4 2017.
Presentation of the paper Cataloguing for a Billion Word Library of Greek and Latin by Gregory Crane, Bridget Almas, Alison Babeu, Lisa Cerrato, Anna Krohn, Frederik Baumgardt, Monica Berti, Greta Franzini and Simona Stoyanova in DATeCH 2014. #digidays
Notes and power-point slides of a paper on the use and disuse of Library of Congress subject headings in fiction cataloguing at the National Library of Australia which was presented at the BSANZ 2010 Annual Conference, on censorship in literature.
Library Carpentry: software skills training for library professionals, Chart...James Baker
Notes for a keynote I gave at the Chartered Institute for Library and Information Professionals Cataloguing and Indexing Group biennial conference, University of Swansea, 31 August - 2 September 2016.
Notes at https://gist.github.com/drjwbaker/96a32b70da2e03035272b6e5656696ad
NADA originally developed to support the establishment of national survey data archives.NADA is a web-based cataloguing system that serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey information. It was originally developed to support the establishment of national survey data archives. Promotes equal access and broad use of microdata, to foster diversity and quality of research work The application is used by a diverse and growing number of national, regional, and international organizations. NADA, as with other IHSN tools, uses the Data Documentation Initiative (DDI), XML-based international metadata standard.
Taller de catalogación Linked Open Data y RDA: posibilidades y desafíos. Prim...DIGIBIS
Catalogación Linked Open Data y RDA: posibilidades y desafíos. Primera parte, de Xavier Agenjo Bullón, director de Proyectos de la Fundación Ignacio Larramendi, y Francisca Hernández Carrascal, consultora de DIGIBÍS.
Può lo sviluppo di REST API con PHP può diventare un'esperienza davvero gradevole?
Cos'è Laravel, la filosofia che il progetto porta avanti e come costruire API REST complete con uno dei framework più usati negli ultimi anni.
http://tinyurl.com/5rphu4
Alfresco offers document management using familiar interfaces to get rapid user adoption built on a repository that offers transparent, out-of-sight services for full ECM.
Anyone who has tried integrating search in their application knows how good and powerful Solr is but always wished it was simpler to get started and simpler to take it to production.
I will talk about the recent features added to Solr making it easier for users and some of the changes we plan on adding soon to make the experience even better.
Site reliability in the serverless age - Serverless Boston MeetupErik Peterson
Just what is this serverless thing anyway and what does it mean for building reliable systems? To answer this, lets explore SRE & DevOps principals and map them to their serverless counterparts and along the way make a few predictions about our serverless future
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized. and how Solr is configured and used in production. Recommended best practices will be profiled alomng with eBay Kleinanzeigen plans for future deployment of Solr.
Web Performance tuning presentation given at http://www.chippewavalleycodecamp.com/
Covers basic http flow, measuring performance, common changes to improve performance now, and several tools and techniques you can use now.
AWS Summit 2014 Melbourne - Breakout 5
Cloud computing gives you a number of advantages, such as being able to scale your application on demand. As a new business looking to use the cloud, you inevitably ask yourself, "Where do I start?" Join us in this session to understand best practices for scaling your resources from zero to millions of users. We will show you how to best combine different AWS services, make smarter decisions for architecting your application, and best practices for scaling your infrastructure in the cloud.
Presenter: Craig Dickson, Solutions Architect, Amazon Web Services
1. What is Solr?
2. When should I use Solr vs. Azure Search?
3. Why is Solr great (and its downside)?
4. How does Solr compare to Azure Search?
5. Why SearchStax? (Solr is complex; SearchStax makes it as easy as Azure Search)
Reuven Lerner's presentation from Open Ruby Day in Herzliya, Israel on June 27th, 2010. I covered a few tools that are not part of Rails, but which help you with deployment,
In this talk, I go over some of the concerns people initially have when adding GraphQL to their existing frontends and backends, and cover some of the tools that can be used to address them.
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr sponsored by O'Reilly Media!
Caserta Concepts shared one of their innovative DW projects using Solr. See how open source search technology can serve high performance analytic use cases. Presentation and solution walk-through given by Caserta Concepts' Joe Caserta and Elliott Cordo.
For more information, visit www.casertaconcepts.com
How to Build a Big Data Application: Serverless Editionecobold
Come learn how to build, launch, and scale a Big Data application in a serverless context. This is going to be an information packed meetup around Big Data processing, Lambda functions, Lambda Step functions, and everything that ties them together.
Big Data is something we're very passionate about. As the cost of servers have come down and the cost of software has become free, using data to drive your business has become much more obtainable to a larger group of companies. The serverless methodology has recently come in the scene, and it's proving to be just as transformational as cloud has been to the Big Data analytics space. We will be sharing some of our learnings and experiences over the last two years of working with Big Data in a serverless context. We will cover one or two examples of eventful Big Data processing, and the impact it can have on your business in terms of speed of analytics and cost savings to the bottom line.
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerationsSameer Maggon
Akshay Sura, a leader in the Sitecore community and Sameer Maggon, a Solr guru, will take the audience through what it takes to design, and build Solr environments tuned for and worthy of a great Sitecore implementation. They will also share Devops considerations and best practices that are critical after Sitecore goes live. In addition to their experience-based comments, they will illustrate a number of these best practices with a live demo of SearchStax, a service that delivers Solr in PaaS and that Sitecore itself uses for its Managed Cloud environment.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Utilocate offers a comprehensive solution for locate ticket management by automating and streamlining the entire process. By integrating with Geospatial Information Systems (GIS), it provides accurate mapping and visualization of utility locations, enhancing decision-making and reducing the risk of errors. The system's advanced data analytics tools help identify trends, predict potential issues, and optimize resource allocation, making the locate ticket management process smarter and more efficient. Additionally, automated ticket management ensures consistency and reduces human error, while real-time notifications keep all relevant personnel informed and ready to respond promptly.
The system's ability to streamline workflows and automate ticket routing significantly reduces the time taken to process each ticket, making the process faster and more efficient. Mobile access allows field technicians to update ticket information on the go, ensuring that the latest information is always available and accelerating the locate process. Overall, Utilocate not only enhances the efficiency and accuracy of locate ticket management but also improves safety by minimizing the risk of utility damage through precise and timely locates.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Launch Your Streaming Platforms in MinutesRoshan Dwivedi
The claim of launching a streaming platform in minutes might be a bit of an exaggeration, but there are services that can significantly streamline the process. Here's a breakdown:
Pros of Speedy Streaming Platform Launch Services:
No coding required: These services often use drag-and-drop interfaces or pre-built templates, eliminating the need for programming knowledge.
Faster setup: Compared to building from scratch, these platforms can get you up and running much quicker.
All-in-one solutions: Many services offer features like content management systems (CMS), video players, and monetization tools, reducing the need for multiple integrations.
Things to Consider:
Limited customization: These platforms may offer less flexibility in design and functionality compared to custom-built solutions.
Scalability: As your audience grows, you might need to upgrade to a more robust platform or encounter limitations with the "quick launch" option.
Features: Carefully evaluate which features are included and if they meet your specific needs (e.g., live streaming, subscription options).
Examples of Services for Launching Streaming Platforms:
Muvi [muvi com]
Uscreen [usencreen tv]
Alternatives to Consider:
Existing Streaming platforms: Platforms like YouTube or Twitch might be suitable for basic streaming needs, though monetization options might be limited.
Custom Development: While more time-consuming, custom development offers the most control and flexibility for your platform.
Overall, launching a streaming platform in minutes might not be entirely realistic, but these services can significantly speed up the process compared to building from scratch. Carefully consider your needs and budget when choosing the best option for you.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
3. What I am about to cover
• The meaning of the term powerful
• MySQL Drawbacks
• Why SOLR
• Usage of tools
• Setting up / starting with SOLR Server
• Starting with SOLR
• Integrate SOLR with Laravel
• Laravel and SOLR Searching
4. What I do NOT cover
• Front-End implementation
• No Angular, VueJS or React integration (recommend VueJS though)
• Authentication
• Internals of SOLR / Elasticsearch
• Talk of Elze Kool in the summer
https://bitbucket.org/elzekool/talk-solr-elasticsearch-internals/src
5. What do I mean by “powerful search”?
• Fast
Search in milliseconds, not in seconds.
• Relevant
If you search for a Hotel in California, don’t return Hotels in
Groningen. Or a black Audi, you don’t want to see a white Peugeot
107.
• Scalable
If your data grows, then you need more resources. It could mean by
vertical scaling or (better) horizontal scaling of capacity.
6. MySQL Drawbacks for searching through text
• LIKE %%
• Fast? No, try searching for millions of records with multiple joins to combine
all the information needed.
• Relevant? No, Try search for “PHP programmeur” or “PHP developer”, or
“Audi A8 zwart” or “zwarte Audi A8“.
• Scalable? Could be, but with much hassle in configurations in production and
backing up.
• Counts – How many Audi’s? How many black cars? Takes multiple
query’s for the calculation of those counts.
NB: FULL TEXT not noted here, gave a bit better result but not great.
7. Why I started with SOLR
• Providers send vehicle updates daily between 06:00 and 22:00
more than 25.000 vehicles are changed throughout the day.
• Encountered MySQL limitations
• Speed – With JOINS for additional data.
• Table LOCK during write actions.
• Relevancy was not great.
• A temporary solution was creating a flat table, that contains all the
data to be shown at the searchoverview pages.
9. Thats why I started with SOLR
After changing the search from MySQL to SOLR
• Search time decreased from seconds to milliseconds.
• Relevancy went up since the search internals work differently.
• Costs went down, less resources needed.
• Organic traffic went up since the improved loading times.
• Visitors viewed more paged since better performance.
Statistics: 17.k daily visitors, 160k vehicles, 25k-30k changes
10. Next up: coverage of the following tools
• MySQL (for main storage)
• SOLR
• Docker (Optionaly)
• Laravel
12. Setting up / Starting MySQL on OSX
• Default credentials
User: root
Password: <empty>
Port: 3301
13. Setting up / starting SOLR on OSX
https://github.com/petericebear/solr-configsets
http://brew.sh
14. Setting up / starting SOLR with docker
https://github.com/petericebear/solr-configsets
15. Where to begin with SOLR - schema.xml
• Fields
1. Type
What kind of field is it?
2. Indexed
Do you want to search in it?
3. Stored
Return the data in results?
4. Multivalued
Can it contain more than 1 value?
• Fieldtypes
• String
• Integer
• Float
• Boolean
• Date
• Location (Latitude/Longitude)
• ..
16. Where to begin with SOLR - additional
Copy contents of a field to another field
Add analyzer to a fieldtype – remove stop words “de, het, een, op, in, naar” etc.
17.
18.
19.
20.
21. Installing laravel
• We use composer for the installation
• If you don’t have it, get it from here: https://getcomposer.org
Run the following in your terminal:
22. Communicating with SOLR
• https://github.com/solariumphp/solarium
• Over 1.2million downloads
Run the following in your terminal:
24. Add ServiceProvider
When the application needs the SolariumClient
class, it will inject the created configuration to
its Client. This way you will never have to touch
the code when something changes to the server
or core setting.
Notice the $defer setting. When true this
ServiceProvider only will autoload when the
Client is needed and NOT with evry request.
25. Activate ServiceProvider
To make use of the new ServiceProvider in your application you must
add it in config/app.php and add it to the list of providers.
34. Questions?
Twitter / Github: @petericebear
Email: psteenbergen@gmail.com
COUPONCODE: SOLRMEETING
50% OFF - FIRST MONTH
info@virtio.nl
Editor's Notes
Inleiding - Wat ga ik behandelen in deze presentatie. Globale overview..
Inleiding - Wat ga ik behandelen in deze presentatie. Globale overview..
Dit zorgde voor een grote wachtrij – net als bij BlackFriday voor de winkels probeerden vele bezoekers op piek momenten door 1 deur te gaan. En dan krijg je het volgende effect.
Build a docker instance with additional config sets. PR’s welcome
docker-machine start
eval $(docker-machine env)