Ronil Merchant and Apoorva Gaurav presented on how Bounce, a mobility startup providing dockless scooter sharing, implemented real-time geo-searching at scale using RediSearch. They faced challenges with high throughput location updates on PostgreSQL and evaluated Elasticsearch and RediSearch. RediSearch indexed documents by geohash for location and performed better than Elasticsearch or scaling PostgreSQL. Testing showed tagging fields and omitting latitude/longitude indexing improved performance. They now use AWS DMS and Kinesis to sync data from PostgreSQL to RediSearch for low latency bike discovery.
4. What is Bounce?
We are a mobility startup providing dock-less scooter sharing solutions to consumers. It is basically a service that enables you to pick up a scooter from anywhere and drop it anywhere.
https://bounceshare.com/
6. Agenda
7. Problem Statement
Bike listing lies at the core of the user experience at Bounce, since it marks the beginning of the user booking flow. To give our users the best experience we need to list the bikes that are nearest to them and best match their preferences. Hence bike discovery needs to be highly accurate as well as have minimal latency.
9. Challenges with the current implementation
The rapidly changing nature of our dataset (e.g. we receive around 4k location updates per second) is coupled with the high throughput required at low latency ⚡ (we are currently at 1000 listings/second).
It is difficult to handle this amount of scale on Postgres once we hit 10x scale (which has been the trend over the last 10 months), as it will impact all db operations and become a serious headache for the Dev-ops and platform teams 😭😤!
Hence the use case essentially boiled down to having a db that performs extremely well at low-latency geo searching and document filtering in high-write scenarios.
11. Elasticsearch
● Ran a geo distance filter query along with some other filters (term queries).
● Were able to achieve ~1300 ops/s with set and get operations on a single r4.4xlarge instance, with CPU hovering around 50% and close to 1 lakh (~100k) keys.
● Load tests were performed on a single-node r4.4xlarge. Achieved response times of ~14 ms for both reads and writes at 1200 requests per second, with more than 80k docs overall and a document structure along the lines of the sketch below.
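The document structure and query from the original slide are not reproduced here; the following is a minimal sketch of what they could look like, assuming hypothetical field names (bike_id, status, a geo_point field called location) and the standard Elasticsearch geo_distance / term query DSL sent over the REST API.

```python
# Minimal sketch (not the original slide's query): a hypothetical bike document
# and a geo_distance + term filter query sent to the _search endpoint.
import requests

ES = "http://localhost:9200"   # assumed local cluster
INDEX = "bikes"                # hypothetical index name

# Hypothetical document structure; "location" is assumed to be a geo_point field.
bike_doc = {
    "bike_id": "BLR-0042",
    "status": "AVAILABLE",
    "location": {"lat": 12.9716, "lon": 77.5946},
}
requests.put(f"{ES}/{INDEX}/_doc/{bike_doc['bike_id']}", json=bike_doc)

# Available bikes within 2 km of the user: geo_distance filter + term filter.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "AVAILABLE"}},
                {"geo_distance": {
                    "distance": "2km",
                    "location": {"lat": 12.9720, "lon": 77.5950},
                }},
            ]
        }
    }
}
hits = requests.post(f"{ES}/{INDEX}/_search", json=query).json()
```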
12. Why not go ahead with Postgres scaling?
It would have been a stop-gap solution, as at a certain scale we would eventually have faced the same problems again. We are already on a db.m4.10xlarge. We also wanted to move towards a document-based structure.
13. Motivation behind choosing Redis
Redis is known to thrive under huge loads with excellent read and write performance. But bike discovery needs more than mere key-value fetching; it needs a certain amount of document querying capability as well.
Enter RediSearch to the rescue 🚑!
14. What is RediSearch?
As the name suggests, it is a Redis-powered search engine. It has full-text search, filtering and geo-filtering capabilities.
https://oss.redislabs.com/redisearch/index.html
15. Approach taken
There were 2 options to go ahead with:
1. Use the RediSearch geo index
https://oss.redislabs.com/redisearch/Overview.html#geo_index
2. Use geohashes, index them, and query them via the search index
https://en.wikipedia.org/wiki/Geohash
We ran some test evaluations of both approaches and decided to go ahead with the 2nd approach, as it gave better performance for our use case. The test overview will be shared later in the presentation.
16. What is a Geo Index?
As per the documentation, geo indexes utilize Redis' own geo-indexing capabilities. At query time, the geographical part of the query (a radius filter) is sent to Redis, which returns only the ids of documents that are within that radius.
An example geo query looks like the sketch below.
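The example query image from the slide is not reproduced here; the following is a minimal sketch of the geo-index approach, assuming hypothetical index and field names and RediSearch 2.x-style hash indexing (the version used in the talk may differ), issued through redis-py's generic execute_command.

```python
# Minimal sketch of the GEO-index approach (hypothetical index and field names).
import redis

r = redis.Redis(host="localhost", port=6379)

# Index with a GEO field for location and a plain TEXT field for status.
r.execute_command(
    "FT.CREATE", "bikes_geo", "ON", "HASH", "PREFIX", "1", "bike:",
    "SCHEMA", "location", "GEO", "status", "TEXT",
)

# One bike document; GEO fields are stored as "lon,lat".
r.hset("bike:BLR-0042", mapping={"location": "77.5946,12.9716", "status": "AVAILABLE"})

# The radius filter is evaluated by Redis itself: bikes within 2 km of the user.
results = r.execute_command(
    "FT.SEARCH", "bikes_geo", "@status:AVAILABLE @location:[77.5950 12.9720 2 km]"
)
```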
17. What is a GeoHash? 🤔
Geohash is a public-domain geocode system which encodes a geographic location into a short string of letters and digits. It is a hierarchical spatial data structure which subdivides space into buckets of grid shape.
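To make the bucketing concrete, here is a small self-contained encoder (illustrative, not from the slides): bits alternate between longitude and latitude, every 5 bits become one base32 character, so a longer hash means a smaller cell and a shared prefix means "nearby".

```python
# Illustrative geohash encoder: interleave lon/lat bisection bits, emit base32.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, ch, bit = [], 0, 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        even, bit = not even, bit + 1
        if bit == 5:  # 5 bits -> one base32 character
            chars.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(chars)

# Precision 6 gives roughly a 1.2 km x 0.6 km cell, a reasonable bike-search bucket.
# e.g. geohash_encode(12.9716, 77.5946, 6) starts with "tdr" (the Bengaluru region).
```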
18. As geohash searches are basically text-based searches, they should be faster than the geo index, which has to compute whether the given coordinates fall within the search radius.
We ran some test evaluations to back this up; the test overview will be shared later in the presentation.
19. ● Basically, a document with the relevant bike data for bike discovery was indexed into Redis, and filtering was done based on the geohash and other additional filters.
● Geohashes and most of the other string-based filters (status, availability, bike_type, etc.) were indexed as the TAG data type (a sketch of such a schema follows below).
● The reason being that we didn't need to leverage the full-text searching capabilities of RediSearch, and hence did not index the fields as FULLTEXT.
● Significant performance gains were also observed using TAG over FULLTEXT, due to having a limited key set.
● Geohash also has querying advantages and can be used to aggregate based on location for various use cases like demand pricing, etc.
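A minimal sketch of what such a schema could look like (hypothetical field names, RediSearch 2.x-style hash indexing; the exact syntax in the deck may differ): the geohash and the string filters as TAG fields, lat/lon stored but kept out of the inverted index.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Geohash and string filters as TAG; lat/lon declared NOINDEX (per the slides) so
# they are stored with the document but excluded from the inverted index.
r.execute_command(
    "FT.CREATE", "bikes", "ON", "HASH", "PREFIX", "1", "bike:",
    "SCHEMA",
    "geohash", "TAG",
    "status", "TAG",
    "availability", "TAG",
    "bike_type", "TAG",
    "lat", "NUMERIC", "NOINDEX",
    "lon", "NUMERIC", "NOINDEX",
)

# Indexing one bike: the geohash is derived from lat/lon (see the encoder above).
r.hset("bike:BLR-0042", mapping={
    "geohash": "tdr1wn",
    "status": "ACTIVE",
    "availability": "AVAILABLE",
    "bike_type": "SCOOTER",
    "lat": 12.9716,
    "lon": 77.5946,
})

# TAG filters use exact matches inside {}.
results = r.execute_command(
    "FT.SEARCH", "bikes", "@geohash:{tdr1wn} @availability:{AVAILABLE}"
)
```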
20. Tests run
Stress tests were run with several types of schemas:
1. With a GEO field for location, and other filters as FULLTEXT.
2. Indexing filters as FULLTEXT and running exact-match queries (lat/lon being indexed), using a geohash for location representation.
3. Indexing filters as TAG (lat/lon being indexed).
4. Indexing filters as FULLTEXT and running wildcard search queries (lat/lon as NOINDEX).
5. Lat/lon fields as NOINDEX and all fields to be filtered upon as TAG.
21. Results
Case 1: Geo query, with exact search for the other filters.
Reads: 85/s, latency: 120 ms
Writes: 85/s, latency: 125 ms
Case 2: All filter fields defined as FULLTEXT, using exact search.
Reads: 320/s, latency: 30 ms
Writes: 380/s, latency: 26 ms
Case 3: Filter fields defined as TAG fields; lat and lon fields also indexed.
Reads: 400/s, latency: 18.42 ms
Writes: 1.01K/s, latency: 19.19 ms
22. A notably high response time is seen, attributed to a key explosion in the inverted index due to updates with randomly changing lat/lon values for documents.
23. Case 4: Filter fields defined as FULLTEXT, using wildcard search; lat/lon fields marked as NOINDEX.
Reads: 1.4k/s, latency: 6.5 ms
Writes: 1.75k/s, latency: 5.5 ms
24. Case 5: Filter fields defined as TAG fields; lat/lon fields marked as NOINDEX.
For 1.4k reads/s and 4.5k writes/s:
Read latency: 3.5 ms 💪🙌
Write latency: 4 ms 👏😍
25. As we can see, significant performance gains were observed when the lat/lon fields were marked as NOINDEX. The reason can be attributed to the following excerpt from the documentation:
https://oss.redislabs.com/redisearch/Overview.html#index_garbage_collection
In a nutshell, a reduced keyspace in the inverted index resulted in a commendable boost in performance.
26. Final Implementation Overview
● Currently using an AWS DMS + Kinesis infrastructure to live-sync data from Postgres to Redis.
● A DMS task does CDC (Change Data Capture) for certain tables and loads the changes into Kinesis. A consumer application reads from Kinesis and updates the data in RediSearch (a sketch of such a consumer follows below).
● Eventually, certain high-frequency updates will be moved completely to Redis.
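The production consumer is not shown in the deck; the following is a minimal single-shard sketch of what the Kinesis side could look like, assuming boto3, a hypothetical stream name, a DMS-style JSON payload shape, and the hash layout sketched earlier. A real consumer would enumerate shards, checkpoint, and handle retries.

```python
import json
import time

import boto3
import redis

kinesis = boto3.client("kinesis", region_name="ap-south-1")  # assumed region
r = redis.Redis(host="localhost", port=6379)

STREAM = "bike-location-cdc"  # hypothetical stream fed by the DMS task

# Single-shard sketch only.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=500)
    for record in out["Records"]:
        change = json.loads(record["Data"])   # CDC payload; exact shape is assumed
        bike = change["data"]                 # hypothetical key carrying the row data
        r.hset(f"bike:{bike['bike_id']}", mapping={
            "geohash": bike["geohash"],
            "status": bike["status"],
            "availability": bike["availability"],
            "bike_type": bike["bike_type"],
            "lat": bike["lat"],
            "lon": bike["lon"],
        })
    iterator = out["NextShardIterator"]
    time.sleep(0.2)  # stay under per-shard read limits
```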
28. Final Implementation Overview
One of the caveats of using a geohash is that a user can be at the edge of a geohash grid cell. To make the behaviour closer to a radial search, we also consider the neighbouring grid cells in our search query.
The query looks something along the lines of the sketch below.
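The query from the slide is not reproduced; a minimal sketch of the idea, using a hypothetical geohash_neighbors() helper (available in common geohash libraries or implementable with a small lookup-table algorithm) and OR-ing the cells inside a single TAG filter:

```python
# Sketch: search the user's geohash cell plus its 8 neighbours.
import redis

r = redis.Redis(host="localhost", port=6379)

def geohash_neighbors(cell: str) -> list:
    """Hypothetical helper: return the 8 cells surrounding `cell`.
    In practice this comes from a geohash library."""
    raise NotImplementedError

user_cell = "tdr1wn"  # geohash of the user's location
cells = [user_cell] + geohash_neighbors(user_cell)

# TAG fields accept alternatives separated by | inside the braces.
query = "@geohash:{%s} @availability:{AVAILABLE}" % "|".join(cells)
results = r.execute_command("FT.SEARCH", "bikes", query)
```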
29. Conclusion
● Moving towards a document-based RediSearch structure allowed extremely fast querying on multiple filters.
● It will also help us reduce quite a huge chunk of writes on our primary Postgres database.
● Moving to a document-based structure also helped us move away from expensive joins.
● Additional use cases, like filtering based on fuel or condition parameters, can be easily implemented as shown below.
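The slide's example is not reproduced; a minimal sketch of how such a filter could be added, assuming a hypothetical numeric fuel_level field appended to the schema sketched earlier:

```python
# Sketch: extend the earlier index with a numeric fuel level and filter on it.
import redis

r = redis.Redis(host="localhost", port=6379)

# Hypothetical extra field; name and threshold are illustrative.
r.execute_command("FT.ALTER", "bikes", "SCHEMA", "ADD", "fuel_level", "NUMERIC")

# Available bikes in the cell with at least 30% fuel.
results = r.execute_command(
    "FT.SEARCH", "bikes",
    "@geohash:{tdr1wn} @availability:{AVAILABLE} @fuel_level:[30 +inf]",
)
```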
30. A few drawbacks: sorting based on distance was not possible with this approach, hence we had to implement it at the application level.