Cassandra in e-commerce

This deck describes migrating an e-commerce platform's online product catalog from Oracle Coherence to Cassandra. The goals of the migration were to minimize system restart time, keep at least two copies of the data in different data centers, and make backups quick and simple. Performance testing showed Cassandra could meet the requirements: thousands of transactions per second, a full data reload every 24 hours, and millions of products with tens of millions of related entities. Cluster configuration (swap off, separate disks for data and commit log), async multi-gets with token-aware routing, parent-keyed primary keys, and local caches on the app servers improved throughput and latency.

CASSANDRA
IN E-COMMERCE

Alexander Solovyev
solovyov.a.g@gmail.com

A PILOT PROJECT: ONLINE PRODUCT CATALOG
FOR AN E-COMMERCE PLATFORM, MIGRATION TO
CASSANDRA
The previous version was based on the Oracle Coherence in-memory data
grid. All data from the primary storage (a relational database)
was cached in the data grid.
Goals of the migration:
•  minimize the time required for a system restart
•  at least two copies of the data in different data centers
•  quick and simple backup

ARCHITECTURE IN A NUTSHELL
•  Application server: all business logic + web-services
•  stateless
•  with local caches
•  Data storage
•  Oracle Coherence, then Cassandra via DataStax Java Driver
•  Batch data loading based on Spring Batch
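
The "stateless, with local caches" part of this architecture can be sketched as a small in-process cache placed in front of the storage. This is a minimal illustration, not the project's code: the `NearCache` class, the `load_from_storage` function, and the size and TTL values are all hypothetical, and the deck itself names EHCache as the real implementation.

```python
import time
from collections import OrderedDict


class NearCache:
    """Tiny in-process LRU cache with a time-to-live (illustrative only)."""

    def __init__(self, loader, max_entries=10_000, ttl_seconds=60.0,
                 clock=time.monotonic):
        self._loader = loader            # called on a miss (hypothetical storage read)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()    # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from storage and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


storage_reads = []


def load_from_storage(key):
    # Hypothetical stand-in for a read against the data storage.
    storage_reads.append(key)
    return {"id": key, "name": f"product-{key}"}


cache = NearCache(load_from_storage, max_entries=2, ttl_seconds=30.0)
cache.get("p1")
cache.get("p1")   # served locally, no second storage read
cache.get("p2")
cache.get("p3")   # evicts p1, the least recently used entry
```

Because the cache lives inside each app server process, the servers stay stateless from the cluster's point of view: any server can be restarted and simply refills its cache from the storage.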

WHAT SHOULD A PRODUCT LOOK LIKE TO MEET
THE REQUIREMENTS?
Some hypotheses:
•  Data is on disk – available immediately after restart
•  OS disk cache brings all the data to memory
•  Key-value storage to simplify migration of the codebase
Nice to have:
•  Simple deployment configuration
•  Java-based solution

BASIC REQUIREMENTS / USE-CASES
•  reads: ~5K TPS
•  transactions can include more than one round-trip to the
storage, as well as more than one key in a query (“multi-gets”)
•  ~50K TPS on the storage side
•  full data reload (once per 24 hours)
•  partial update of values (e.g. of product attributes)
•  availability 24x7
•  millions of products
•  tens of millions of related entities (product attributes etc.)
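
The two throughput figures above imply a fan-out factor between application transactions and storage operations. The short calculation below makes it explicit. The per-node figure is only an assumption: it spreads the load evenly over a 4-node cluster and ignores replication and consistency level.

```python
def storage_ops_per_transaction(app_tps, storage_tps):
    """Average number of storage operations behind one application transaction."""
    return storage_tps / app_tps


def per_node_load(storage_tps, node_count):
    """Storage operations per second per node, assuming a perfectly even spread."""
    return storage_tps / node_count


APP_TPS = 5_000        # ~5K read transactions per second
STORAGE_TPS = 50_000   # ~50K operations per second on the storage side
NODES = 4              # assumption: a 4-node cluster, replication ignored

fan_out = storage_ops_per_transaction(APP_TPS, STORAGE_TPS)
per_node = per_node_load(STORAGE_TPS, NODES)
print(f"{fan_out:.0f} storage operations per transaction")
print(f"{per_node:.0f} operations/s per node")
```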

CANDIDATES
•  MongoDB
•  HBase
•  Oracle Coherence + data persistence (a la Riak)
•  Cassandra

PERFORMANCE TESTING ENVIRONMENT
•  Production-ready implementation
•  4 boxes (16 CPU, 24 GB) x 1 Cassandra instance
•  2 boxes x 2 app servers
•  100 GB of test data - fits in memory
•  Main test is read queries:
•  one hour
•  up to 500 users
•  even distribution of requested keys

WHAT DID HELP
•  Configure your Cassandra cluster:
•  turn OS swap off
•  use different physical disks for different file sets, e.g. data vs. commit log
•  choose the right (“private”) network interface
•  Async queries for multi-gets + token-aware routing on the app server side:
+15% TPS and latency
•  Use the latest Cassandra version
•  a good example: 1.2.6 => 1.2.8 gave 15% TPS, latency 2x better
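
The combination of async multi-gets and token-aware routing can be illustrated with a toy model. This is a sketch under stated assumptions, not the DataStax Java Driver's implementation: `TokenRing`, `FakeNode`, and `multi_get` are hypothetical names, MD5 stands in for the cluster's real partitioner, and the 10 ms sleep simulates a network round-trip.

```python
import asyncio
import bisect
import hashlib


class TokenRing:
    """Maps a key to the node owning its token (simplified stand-in for driver routing)."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; each node owns the range ending at its token.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def token_for(key):
        # MD5 is only an illustrative hash here, not Cassandra's partitioner.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, key):
        index = bisect.bisect_left(self._tokens, self.token_for(key))
        return self._ring[index % len(self._ring)][1]   # wrap around the ring


class FakeNode:
    """In-memory node that counts the requests it receives."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.requests = 0

    async def get(self, key):
        self.requests += 1
        await asyncio.sleep(0.01)   # simulated network round-trip
        return self.data.get(key)


async def multi_get(ring, nodes, keys):
    """Issue one async read per key, each sent straight to the owning node."""
    reads = [nodes[ring.owner(key)].get(key) for key in keys]
    values = await asyncio.gather(*reads)   # all reads are in flight concurrently
    return dict(zip(keys, values))


STEP = 2**64 // 4
ring = TokenRing({f"node-{i}": (i + 1) * STEP - 1 for i in range(4)})
nodes = {f"node-{i}": FakeNode(f"node-{i}") for i in range(4)}

keys = [f"product-{n}" for n in range(20)]
for key in keys:
    nodes[ring.owner(key)].data[key] = f"value-of-{key}"   # store each key on its owner

result = asyncio.run(multi_get(ring, nodes, keys))
```

Two effects are modeled here: the reads overlap instead of queuing behind each other, and each read goes directly to a node that holds the key, which avoids an extra hop through a coordinator that does not own it.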

WHAT DID HELP
•  Use the key of a parent entity as the first component of the children's keys:
PRIMARY KEY (parent-ID, child-ID)
•  to minimize the number of queries / disk seeks
•  +15% TPS, latency 2x better
•  Use local (“near”) caches on the app server side: +15% TPS
•  local EHCache
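
The effect of PRIMARY KEY (parent-ID, child-ID) can be shown with a toy model. In CQL the first component of the primary key is the partition key and the remaining components are clustering columns, so all children of one parent live together and come back in a single read. The `PartitionedTable` class and the sample product attributes below are hypothetical and only mimic that layout in memory.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table with PRIMARY KEY (parent_id, child_id)."""

    def __init__(self):
        self._partitions = defaultdict(dict)   # parent_id -> {child_id: row}
        self.partition_reads = 0

    def insert(self, parent_id, child_id, row):
        self._partitions[parent_id][child_id] = row

    def children_of(self, parent_id):
        # One partition read returns every child, ordered by child_id.
        self.partition_reads += 1
        partition = self._partitions.get(parent_id, {})
        return [partition[child_id] for child_id in sorted(partition)]


attributes = PartitionedTable()
attributes.insert(42, "color", {"name": "color", "value": "red"})
attributes.insert(42, "brand", {"name": "brand", "value": "acme"})
attributes.insert(42, "size", {"name": "size", "value": "M"})
attributes.insert(7, "color", {"name": "color", "value": "blue"})

rows = attributes.children_of(42)   # all attributes of product 42 in one read
```

With independent child keys, the same lookup would need one query per attribute, which is the extra work the deck refers to as queries / disk seeks.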

WHAT DID NOT HELP
•  Java GC monitoring on Cassandra boxes
•  with the recommended settings, GC takes at most 7% of the overall test time
•  caching == ALL
•  all data is already in the OS disk cache

INTERESTING EXPERIMENTS
•  Another implementation of token-aware query routing
•  JSON or any other data format, if partial updates are not needed – a pure
key-value model
•  avoids the tombstones that updates create when values contain Cassandra collections
•  another option is tuning tombstone GC
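
The pure key-value variant can be sketched as follows: the whole attribute map is serialized into one opaque value, so the storage only ever sees full overwrites of a single column. The function names and the sample product are hypothetical. The deck does not say which serialization library was used, so Python's standard `json` module stands in for it here.

```python
import json


def encode_product(attributes):
    """Serialize the whole attribute map into one opaque value."""
    return json.dumps(attributes, sort_keys=True, separators=(",", ":"))


def decode_product(blob):
    return json.loads(blob)


def update_attribute(blob, name, value):
    # No partial update: read the blob, change it in memory, write it all back.
    attributes = decode_product(blob)
    attributes[name] = value
    return encode_product(attributes)


blob = encode_product({"color": "red", "sizes": ["S", "M"]})
updated = update_attribute(blob, "color", "blue")
print(updated)
```

The trade-off is the one named on the slide: this only works if partial updates are not needed, because every change rewrites the entire value.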

SUMMARY
•  Cassandra is a stable and sufficiently mature product
•  It can compete with in-memory caches and data grids, at least if the dataset is
small enough to fit in memory
•  It is actively developed and has a large community, with good commercial support from
DataStax

Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale. In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.

Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021

StreamNative

You may be familiar with the Presto plugin used to run fast interactive queries over Pulsar using ANSI SQL and can be joined with other data sources. This plugin will soon get a rename to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as, the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!

Introduction to RedisDvir Volk

Percona XtraBackup is a free, open source, complete online backup solution for all versions of Percona Server, MySQL® and MariaDB®. Percona XtraBackup provides: * Fast and reliable backups * Uninterrupted transaction processing during backups * Savings on disk space and network bandwidth with better compression * Automatic backup verification * Higher uptime due to faster restore time This talk will discuss the various different features of Percona XtraBackup, including: * Full & Incremental Backups * Compression, Streaming & Encryption of Backups * Backing Up To The Cloud (Swift). * Percona XtraDB Cluster / Galera Cluster. * Percona Server Specific features

The InnoDB Storage Engine for MySQLMorgan Tocker

Mux loves Clickhouse. By Adam Brown, Mux founder

Altinity Ltd

OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...

Altinity Ltd

OSA Con 2022: Using ClickHouse Database to Power Analytics and Customer Engagement Platform Prafulla Gupta - Times Internet This talk covers how we empowered Product Managers and Editors at Times Internet by developing an in-house product, GrowthRx, using Clickhouse Open Source Database to track and analyze user behavior to increase user retention and customer engagement. Times Internet is India's largest digital news publisher, which manages leading brands like Times of India, Economic Times, Navbharat Times, etc, where we are tracking more than 10 billion events per month in the ClickHouse Database.

Indexes in postgres

Louise Grandjonc

Data Science Across Data Sources with Apache Arrow

Databricks

Mongo DBEdureka!

PostgreSQL Extensions: A deeper look

Jignesh Shah

MongoDB at Scale

MongoDB

RocksDB detail

MIJIN AN

Your first ClickHouse data warehouse

Altinity Ltd

Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook

The Hive

This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.

Deep Dive on Amazon Aurora

Amazon Web Services

Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is disruptive technology in the database space, bringing a new architectural model and distributed systems techniques to provide far higher performance, availability and durability than previously available using conventional monolithic database techniques. In this session, we will do a deep-dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share early customer experience from the field.

Transposition cipher

Antony Alex

ClickHouse Monitoring 101: What to monitor and how

Altinity Ltd

Webinar. Presented by Robert Hodges and Ned McClain, April 1, 2020 You are about to deploy ClickHouse into production. Congratulations! But what about monitoring? In this webinar we will introduce how to track the health of individual ClickHouse nodes as well as clusters. We'll describe available monitoring data, how to collect and store measurements, and graphical display using Grafana. We'll demo techniques and share sample Grafana dashboards that you can use for your own clusters.

Solving PostgreSQL wicked problems

Alexander Korotkov

腾讯大讲堂06 qq邮箱性能优化areyouok

Apache Cassandra at Macys

DataStax Academy

This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax. We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra. We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks. One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations. And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.

Apache Cassandra Data Modeling with Travis Price

DataStax Academy

What's hot

[Meetup] a successful migration from elastic search to clickhouse

Vianney FOUCAULT

Elk

Caleb Wang

Data pipelines observability: OpenLineage & Marquez

Julien Le Dem

Online MySQL Backups with Percona XtraBackup

Kenny Gryp

The InnoDB Storage Engine for MySQLMorgan Tocker

Mux loves Clickhouse. By Adam Brown, Mux founder

Altinity Ltd

OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...

Altinity Ltd

Indexes in postgres

Louise Grandjonc

Data Science Across Data Sources with Apache Arrow

Databricks

Mongo DBEdureka!

PostgreSQL Extensions: A deeper look

Jignesh Shah

MongoDB at Scale

MongoDB

RocksDB detail

MIJIN AN

Your first ClickHouse data warehouse

Altinity Ltd

Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook

The Hive

Deep Dive on Amazon Aurora

Amazon Web Services

Transposition cipher

Antony Alex

ClickHouse Monitoring 101: What to monitor and how

Altinity Ltd

Solving PostgreSQL wicked problems

Alexander Korotkov

腾讯大讲堂06 qq邮箱性能优化areyouok

What's hot (20)

[Meetup] a successful migration from elastic search to clickhouse

Elk

Data pipelines observability: OpenLineage & Marquez

Online MySQL Backups with Percona XtraBackup

The InnoDB Storage Engine for MySQL

Mux loves Clickhouse. By Adam Brown, Mux founder

OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...

Indexes in postgres

Data Science Across Data Sources with Apache Arrow

Mongo DB

PostgreSQL Extensions: A deeper look

MongoDB at Scale

RocksDB detail

Your first ClickHouse data warehouse

Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook

Deep Dive on Amazon Aurora

Transposition cipher

ClickHouse Monitoring 101: What to monitor and how

Solving PostgreSQL wicked problems

腾讯大讲堂06 qq邮箱性能优化

Viewers also liked

Apache Cassandra at Macys

DataStax Academy

Apache Cassandra Data Modeling with Travis Price

DataStax Academy

Cassandra and Riak at BestBuy.com

joelcrabb

Datastax Expedia

Eddie Satterly

Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...

DataStax

According to a recent Harvard Business Review study, there’s only a 43% chance that customers who have a poor experience will stick with you for the next 12 months. Contrast that to the 74% that will remain your customer if they have a great experience. Learn how Macy’s, a leading American department store chain founded in 1858 with over 750 stores in North America, is transforming their customer experience with DataStax Enterprise. Webinar recording: https://youtu.be/CiUVxh6Ov_E View current and past DataStax webinars: http://www.datastax.com/resources/webinars

Macy's: Changing Engines in Mid-Flight

DataStax Academy

This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax. One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations. This session will cover: 1) The process that led to our decision to use Cassandra 2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment 3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs. 4) Our lessons learned and next steps

Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014

dhiguero

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...

DataStax

Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development. About the Speaker Aaron Ploetz Lead Technical Architect, Target Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.

Performance Monitoring: Understanding Your Scylla Cluster

ScyllaDB

The Upstream Game, 2hr version

Sean Roberts

The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...

DataStax

Why do data breaches occur even though we protect data at rest and in flight? What if the Cassandra admin's credentials get compromised? Why is it so hard to make encryption work for real world applications? If I encrypt my customer's data, do I have to turn it over when the authorities come calling? The answer may be in keeping data encrypted. . .always, let the customer own the keys and make data breaches irrelevant. In this talk, Ameesh Divatia, Co-founder at Baffle.io, will talk about a way to encrypt individual fields in a Cassandra database while continuing to let them be available for CQL access. From deterministic to random algorithms, key management and integration into DataStax drivers, this talk will introduce attendees to the steps to follow in order to protect an existing Cassandra database with field-level granularity ensuring protection against data breaches. About the Speaker Ameesh Divatia President & CEO, Baffle, Inc. Ameesh is a serial entrepreneur with over 25 years of operating experience in storage, security and networking infrastructure. He specializes in conceiving and implementing startup business plans that create new product categories by leveraging innovation in existing markets to its adjacencies. He co-founded Baffle in May 2015 to address the challenge of preventing data breaches in cloud infrastructure.

Web Store with Catalog and Product Management

Mike Taylor

Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...

DataStax

We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra. About the Speaker Randy Fradin Vice President, BlackRock Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.

2016 August POWER Up Your Insights - IBM System Summit Mumbai

Anand Haridass

2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...

Anand Haridass

An unprecedented increase in the use of digital devices is causing an explosion in the amount of data generated & captured by businesses. The need to extract economic value from all this "Big Data", that has the potential to transform businesses completely, is immense and drives a whole slew of new workloads. Organizations need to continuously align strategy, business processes and infrastructure investments to derive these insights. This session will talk to how solutions based on POWER deliver this in a cost-effective, open, scalable, high performing and reliable manner.

A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise

Patrick McFadin

Wait! Back away from the Cassandra 2ndary index. It’s ok for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadins videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With it’s easy Solr API and Spark integration so you can search and analyze data stored in your Cassandra database until your heart’s content. Take our hand. WE will show you how.

Apache Cassandra in the Real World

Jeremy Hanna

Target: Escaping Disco-Era Data Modeling

DataStax Academy

Building high-performing Cassandra data models requires a query-based approach. However most of us were taught to build relational, normalized data models, which do not work well with Cassandra. Poor performing data models are often built with the idea of storing data efficiently, and then showered with secondary indexes to serve the required queries. Isn't it time that we learn how to build 21st century data models, without using 1970's techniques?

When to Use MongoDB...and When You Should Not...

MongoDB

Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...

DataStax

At Choice Hotels International, we are in the midst of a multi-year effort to replace our 25 year old monolithic reservation system with a cloud-based, microservice-style architecture using Cassandra. Since processing the first live reservation on the new system in December 2015, we've been shifting an increasing amount of shopping and booking traffic to the new system, with retirement of the old system scheduled for early 2017. After a quick review of our problem space, architecture, schema design, and Cassandra deployment, we'll take a closer look several challenges we faced and discuss how they impacted our data modeling, development and deployment: * Managing data with varying consistency requirements * Maintaining data integrity across microservice boundaries * Performing complex queries involving overlapping time ranges * Relying on time-to-live (TTL) for data cleanup * Balancing denormalization, performance and cost About the Speakers Andrew Baker Senior Software Engineer, Choice Hotels International Andrew is the technical lead of the service development team responsible for storage and maintenance of rates and reservations for thousands of hotels around the world. Jeffrey Carpenter Systems Architect, Choice Hotels International Jeff Carpenter is a software and systems architect with experience in the hospitality and defense industries, it. Jeff is currently working on a cloud-based hotel reservation system using Cassandra and is the author of the new O'Reilly book "Cassandra: The Definitive Guide, 2nd edition".

Viewers also liked (20)

Apache Cassandra at Macys

Apache Cassandra Data Modeling with Travis Price

Cassandra and Riak at BestBuy.com

Datastax Expedia

Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...

Macy's: Changing Engines in Mid-Flight

Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...

Performance Monitoring: Understanding Your Scylla Cluster

The Upstream Game, 2hr version

The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...

Web Store with Catalog and Product Management

Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...

2016 August POWER Up Your Insights - IBM System Summit Mumbai

2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...

A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise

Apache Cassandra in the Real World

Target: Escaping Disco-Era Data Modeling

When to Use MongoDB...and When You Should Not...

Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...

Similar to Cassandra in e-commerce

Oracle 12c New Features_RAC_slidesSaiful

Scylla Summit 2016: Compose on Containing the Database

ScyllaDB

VMworld 2013: Virtualizing Databases: Doing IT Right

VMworld

TechBeats #2

applausepoland

Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...

Data Con LA

Database as a Service on the Oracle Database Appliance Platform

Maris Elsins

Speaker: Marc Fielding, Co-speaker: Maris Elsins. Oracle Database Appliance provides a robust, highly-available, cost-effective, and surprisingly scalable platform for database as a service environment. By leveraging Oracle Enterprise Manager's self-service features, databases can be provisioned on a self-service basis to a cluster of Oracle Database Appliance machines. Discover how multiple ODA devices can be managed together to provide both high availability and incremental, cost-effective scalability. Hear real-world lessons learned from successful database consolidation implementations.

Running Oracle EBS in the cloud (DOAG TECH17 edition)

Andrejs Prokopjevs

This presentation is based on a real life experience migrating Oracle E-Business Suite production to AWS. We will talk about: - Certification basics. Overview on supported configurations. - How to build. Recommendations based on migration and 2 year production runtime experience. - Advanced configurations. - R12.2. - Microsoft Azure and Oracle Cloud review. Quick comparison outline of main alternative platforms. How ready is Oracle's own cloud service. - Scaling. This is a very client demanding topic. Many are looking into cloud migration options and how they can optimize the cost compared to the on-premise hosting, and many misunderstand the complexity of Oracle EBS stack being capable for cloud deployment.

NGENSTOR_ODA_P2V_V5UniFabric

The Design, Implementation and Open Source Way of Apache Pegasus

acelyc1112009

Owning time series with team apache Strata San Jose 2015

Patrick McFadin

Break out your laptops for this hands-on tutorial is geared around understanding the basics of how Apache Cassandra stores and access time series data. We’ll start with an overview of how Cassandra works and how that can be a perfect fit for time series. Then we will add in Apache Spark as a perfect analytics companion. There will be coding as a part of the hands on tutorial. The goal will be to take a example application and code through the different aspects of working with this unique data pattern. The final section will cover the building of an end-to-end data pipeline to ingest, process and store high speed, time series data.

Oracle GoldenGate Architecture PerformanceEnkitec

Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)

Bobby Curtis

OGG Architecture PerformanceEnkitec

Intro to Apache Kudu (short) - Big Data Application Meetup

Mike Percy

Introducing Apache Kudu (Incubating) - Montreal HUG May 2016

Mladen Kovacevic

Taking Splunk to the Next Level - Architecture Breakout Session

Splunk

Introduction to Apache Kudu

Jeff Holoman

Snowflake Datawarehouse Architecturing

Ishan Bhawantha Hewanayake

PostgreSQL High Availability in a Containerized World

Jignesh Shah

Azure Synapse Analytics Overview (r2)

James Serra

Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.

Similar to Cassandra in e-commerce (20)

Oracle 12c New Features_RAC_slides

Scylla Summit 2016: Compose on Containing the Database

VMworld 2013: Virtualizing Databases: Doing IT Right

TechBeats #2

Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...

Database as a Service on the Oracle Database Appliance Platform

Running Oracle EBS in the cloud (DOAG TECH17 edition)

NGENSTOR_ODA_P2V_V5

The Design, Implementation and Open Source Way of Apache Pegasus

Owning time series with team apache Strata San Jose 2015

Oracle GoldenGate Architecture Performance

Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)

OGG Architecture Performance

Intro to Apache Kudu (short) - Big Data Application Meetup

Introducing Apache Kudu (Incubating) - Montreal HUG May 2016

Taking Splunk to the Next Level - Architecture Breakout Session

Introduction to Apache Kudu

Snowflake Datawarehouse Architecturing

PostgreSQL High Availability in a Containerized World

Azure Synapse Analytics Overview (r2)

Recently uploaded

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf

FIDO Alliance

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

Product School

UiPath Test Automation using UiPath Test Suite series, part 3

DianaGray10

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...

James Anderson

Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management. The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM). Speakers: Bob Boule Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle. Gopinath Rebala Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. 
Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.

FIDO Alliance Osaka Seminar: Overview.pdf

FIDO Alliance

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

FIDO Alliance

Neuro-symbolic is not enough, we need neuro-*semantic*

Frank van Harmelen

Cassandra in e-commerce

1. CASSANDRA IN E-COMMERCE
Alexander Solovyev
solovyov.a.g@gmail.com

2. A PILOT PROJECT: ONLINE PRODUCT CATALOG FOR AN E-COMMERCE PLATFORM, MIGRATION TO CASSANDRA
The previous version was based on the in-memory data grid Oracle Coherence. All data from the primary storage (a relational database) was cached in the data grid.
Goals of the migration:
•  minimize the time required for a system restart
•  keep at least two copies of the data in different data centers
•  quick and simple backup

3. ARCHITECTURE IN A NUTSHELL
•  Application server: all business logic + web-services
•  stateless
•  with local caches
•  Data storage
•  Oracle Coherence, then Cassandra via DataStax Java Driver
•  Batch data loading based on Spring Batch

4. WHAT SHOULD A PRODUCT LOOK LIKE TO MEET THE REQUIREMENTS?
Some hypotheses:
•  Data is on disk – available immediately after restart
•  OS disk cache brings all the data to memory
•  Key-value storage to simplify migration of the codebase
Nice to have:
•  Simple deployment configuration
•  Java-based solution

5. BASIC REQUIREMENTS / USE-CASES
•  reads: ~5K TPS
•  transactions can include more than one round-trip to the storage, as well as more than one key in a query (“multi-gets”)
•  ~50K TPS on the storage side
•  full data reload (once per 24 hours)
•  partial update of values (e.g. of product attributes)
•  availability 24x7
•  millions of products
•  tens of millions of related entities (product attributes etc.)
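Read together, the two throughput figures suggest how many storage operations a single application transaction fans out into. The slide does not state this ratio explicitly, so the short Python calculation below is only an inference from the ~5K and ~50K numbers, with variable names chosen for illustration.

```python
# Figures taken from the slide; the derived ratio is an inference, not a stated fact.
app_tps = 5_000        # read transactions per second at the application level
storage_tps = 50_000   # operations per second expected on the storage side

# Average number of storage operations implied per application transaction.
ops_per_transaction = storage_tps / app_tps
print(f"~{ops_per_transaction:.0f} storage operations per transaction")
```

Under this reading, each transaction touches the storage about ten times on average, which is why round-trip latency and multi-get efficiency matter so much in the later slides.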

6. CANDIDATES
•  MongoDB
•  HBase
•  Oracle Coherence + data persistence (a la Riak)
•  Cassandra

7. PERFORMANCE TESTING ENVIRONMENT
•  Production-ready implementation
•  4 boxes (16 CPU, 24 GB) x 1 Cassandra instance
•  2 boxes x 2 app servers
•  100 GB of test data - fits in memory
•  Main test is read queries:
•  one hour
•  up to 500 users
•  even distribution of requested keys

8. WHAT DID HELP
•  configure your Cassandra cluster
•  “OS swap off”
•  different physical disks for different file-sets - e.g. data vs. commit log
•  choose the right (“private”) network interface
•  async queries for multi-gets + token-aware routing on the app server side: +15% TPS and latency
•  use the latest Cassandra version
•  a good example: 1.2.6 => 1.2.8 – 15% TPS, latency 2x better
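The combination of async multi-gets and token-aware routing can be sketched in a few lines. This is a toy Python model, not the project's code (which used the DataStax Java Driver): `Node`, `Ring`, `token`, `load` and `multi_get` are hypothetical names, the nodes are in-memory dictionaries, and the MD5-based token function is only a stand-in for a real partitioner.

```python
import asyncio
import bisect
import hashlib


class Node:
    """Hypothetical in-memory stand-in for one storage node."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    async def get(self, key):
        await asyncio.sleep(0.01)  # simulates one network round-trip
        return self.rows.get(key)


def token(key):
    # Illustrative token function only; real partitioners differ.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Maps each key to the node that owns its token range."""

    def __init__(self, nodes):
        self._entries = sorted(((token(n.name), n) for n in nodes),
                               key=lambda entry: entry[0])
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._entries)
        return self._entries[i][1]


def load(ring, rows):
    # Place every row on the node that owns its key.
    for key, value in rows.items():
        ring.owner(key).rows[key] = value


async def multi_get(ring, keys):
    # Token-aware: each request goes straight to the owning node.
    # Async: all requests are in flight at once instead of one after another.
    values = await asyncio.gather(*(ring.owner(k).get(k) for k in keys))
    return dict(zip(keys, values))
```

A call such as `asyncio.run(multi_get(ring, keys))` then costs roughly one simulated round-trip in total rather than one per key, and no request has to be forwarded by a node that does not own the data.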

9. WHAT DID HELP
•  Use the key of a parent entity as the first component of the child keys: PRIMARY KEY (parent-ID, child-ID)
•  to minimize the number of queries / disk seeks
•  +15% TPS, latency 2x better
•  use local (“near”) caches on the app server side: +15% TPS
•  local EHCache
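Both ideas on this slide can be illustrated with a small in-memory model. The Python sketch below is an assumption-laden toy: `PartitionedStore` imitates a table keyed by (parent-ID, child-ID), `NearCache` is a plain LRU cache standing in for a local cache such as EHCache, and the `reads` counter is a hypothetical way to make storage round-trips visible.

```python
from collections import OrderedDict


class PartitionedStore:
    """Toy model of a table with PRIMARY KEY (parent_id, child_id)."""

    def __init__(self):
        self._partitions = {}
        self.reads = 0  # counts simulated storage round-trips

    def put(self, parent_id, child_id, value):
        self._partitions.setdefault(parent_id, {})[child_id] = value

    def children(self, parent_id):
        # A single read returns every child of the parent, ordered by child_id.
        self.reads += 1
        return dict(sorted(self._partitions.get(parent_id, {}).items()))


class NearCache:
    """Small LRU cache held in the application server's own memory."""

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()

    def children(self, parent_id):
        if parent_id in self._entries:
            self._entries.move_to_end(parent_id)  # mark as recently used
            return self._entries[parent_id]
        value = self._store.children(parent_id)
        self._entries[parent_id] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return value
```

Because all children share the parent's key as their first key component, one read replaces one query per child. The cache then removes even that read for hot parents. Invalidation after updates is deliberately left out of this sketch.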

10. WHAT DID NOT HELP
•  Java GC monitoring on Cassandra boxes
•  with the recommended settings, GC takes at most 7% of the overall test time
•  caching == ALL
•  all data in OS disk cache

11. INTERESTING EXPERIMENTS
•  another implementation of the token-aware query routing
•  JSON or any other data format, if partial updates are not needed – a pure key-value model
•  avoids creating tombstones on updates when values contain Cassandra collections
•  another option is tuning of tombstone GC
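The pure key-value variant can be pictured as storing each entity as one serialized value that is always overwritten as a whole. The Python sketch below assumes JSON as the format and uses a plain dictionary named `store` as a stand-in for a table with a single text value column; `encode`, `decode`, `save` and `update` are hypothetical helper names.

```python
import json

# Stand-in for a table with one key column and one text value column.
store = {}


def encode(attributes):
    # Compact, deterministic JSON so equal values serialize identically.
    return json.dumps(attributes, sort_keys=True, separators=(",", ":"))


def decode(blob):
    return json.loads(blob)


def save(key, attributes):
    # The whole value is written as one opaque blob.
    store[key] = encode(attributes)


def update(key, changes):
    # No partial update: read the blob, merge in the application, rewrite it whole.
    merged = {**decode(store[key]), **changes}
    store[key] = encode(merged)
    return merged
```

The trade-off is the one the slide names: the storage layer no longer sees individual attributes, so any change means rewriting the full value from the application side.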

12. SUMMARY
•  Cassandra is a stable and sufficiently mature product
•  Can compete with in-memory caches and data grids, at least if the dataset is small enough to fit in memory
•  Actively developed. Has a large community. Good commercial support from DataStax

13. THANK YOU
…and your questions :)
