Held at Apache Lucene Eurocon in Barcelona in October 2011

Refactoring a Solr based API application

  1. Architectural lessons learned from refactoring a Solr-based API application. Torsten Bøgh Köster (Shopping24), Apache Lucene Eurocon, 19.10.2011
  2. Contents: Shopping24 and its API; technical scaling solutions (sharding, caching, Solr cores, "elastic" infrastructure); business requirements as the key factor
  3. @tboeghk: software and systems architect, 2 years of experience with Solr, 3 years with Lucene, team of 7 Java developers, currently at Shopping24
  4. shopping24 internet group
  5. 1 portal became n portals
  6. 30 partner shops became 700
  7. 500k to 7m documents
  8. Index facts at the time: 16 GB of data, single-core layout, up to 17s response time, limited by machine size, stalled at Solr version 1.4, API designed for small tools
  9. Scaling goal: 15-50m documents
  10. Ask the nerds: "Shard!" That'll be fun! "Use spare compute cores at Amazon?" Breathe load into the cloud. "Reduce that index size." "Get rid of those long-running queries!"
  11. Data sharding ...
  12. ... is highly effective. [chart: response time vs. 1-20 concurrent requests, dropping from ~500ms to ~125ms as the index is split across 1, 2, 3, 4, 6, and 8 shards]
  13. Sharding: size matters. The bigger your index, the more complex your queries, and the more concurrent requests you serve, the more sharding you need.
  14. But wait ...
  15. Why do we have such a big index?
  16. 7m documents vs. 2m active products
  17. Fashion product lifecycle meets SEO (Bastografie / photocase.com)
  18. Separation of duties! Remove unsearchable data from your index.
  19. Why do we have complex queries?
  20. A Solr index designed for 1 portal ...
  21. ... grown into a multi-portal index
  22. Let "sharding" follow your data ...
  23. ... and build separate cores for every client.
  24. Duplicate data as long as access is fast. (andybahn / photocase.com)
  25. Streamline your index provisioning process.
  26. A thousand splendid cores at your fingertips.
  27. Throwing hardware at problems. Automated.
  28. Evil traps: latency, $$
  29. Mirror your complete system to solve load balancer problems. (froodmat / photocase.com)
  30. I said faster!
  31. Use a cache layer like Varnish.
  32. What about those complex queries? Why do we have them? And how do we get rid of them?
  33. Lost in encapsulation: the Solr API exposed to the world.
  34. What's the key factor?
  35. Look at your business requirements.
  36. Decrease complexity.
  37. Questions? Comments? Ideas? Twitter: @tboeghk, GitHub: @tboeghk, Email: torsten.koester@s24.com, Web: http://www.s24.com. Images: sxc.hu (unless noted otherwise)
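Slides 22-23 argue that "sharding" should follow the data, so that each client's queries hit only that client's documents. A minimal sketch of that routing idea, assuming a hypothetical `ShardRouter` helper with a fixed shard count (the class, portal names, and shard count are illustrative, not from the talk):

```java
// Hypothetical sketch: pin each client portal to one shard by hashing
// its id, so indexing and querying for a portal touch a single core.
public class ShardRouter {

    // numShards is assumed fixed up front; changing it means re-indexing.
    public static int shardFor(String clientId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(clientId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String portal : new String[] {"portal-a", "portal-b", "portal-c"}) {
            System.out.println(portal + " -> shard " + shardFor(portal, 4));
        }
    }
}
```

The point of the slide is exactly this determinism: because a client always maps to the same shard, cross-shard fan-out queries disappear for per-client traffic.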
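Slide 23's "separate cores for every client" maps onto Solr's multi-core configuration. A sketch of a legacy `solr.xml` declaring two per-portal cores, in the pre-SolrCloud format that matched the Solr 1.4-3.x era of the talk (core names and paths are placeholders):

```
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per client portal; each carries its own config and schema -->
    <core name="portal-a" instanceDir="cores/portal-a" />
    <core name="portal-b" instanceDir="cores/portal-b" />
  </cores>
</solr>
```

With `adminPath` enabled, cores can also be created and swapped at runtime via the CoreAdmin API, which is what makes the "thousand splendid cores" provisioning pipeline of slides 25-26 automatable.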
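Slide 31 recommends putting a cache layer such as Varnish in front of the API. A minimal VCL sketch in the Varnish 2/3 syntax current at the time of the talk; the five-minute TTL is an illustrative choice, not a value from the talk:

```
sub vcl_fetch {
  # cache every backend response for five minutes,
  # overriding whatever Cache-Control headers the API sends
  set beresp.ttl = 300s;
  return (deliver);
}
```

Because search traffic is dominated by repeated queries (category pages, SEO landing pages), even a short TTL can absorb a large share of load before it ever reaches Solr.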
