How ReversingLabs
Serves File Reputation
Service for 20B Files
Goran Cvijanovic, Software Architect
Presenter
Goran Cvijanovic, Software Architect
Open source database integration and optimization; Oracle and
Microsoft database specialist. 20+ years of experience in
information technology, including 15+ years in database
systems integration, migration, and tuning.
Services
Overview
File Reputation
TitaniumCloud
Reputation Services
Powerful threat intelligence
solutions with up-to-date threat
classification and rich context on
over 20 billion goodware and
malware files
High Availability Database
It’s all about metadata and SLA
Pre-Built Threat Connectors & APIs
■ Provides 50+ APIs and feeds
■ Preempts emerging threats by monitoring malware “in-the-wild” using threat-specific feeds including ransomware, APT, CVE, financial, and retail information sources
■ Supports advanced search and targeted queries on large sample datasets
In-House
Database
In-House DB distributed key-value store
To insert, update, and read large amounts of data with low
latency and stable response times, we developed our own distributed
key-value store based on an LSM-tree architecture.
■ 3B reads / 1B writes per day
■ Latency in DB system < 2ms
● write < 2ms
● read < 1ms
■ Latency for API < 60ms
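To put the daily volumes above in per-second terms, here is a small arithmetic sketch. It assumes load is spread evenly over the day, which real traffic rarely is, so these are averages and not peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily operation count to an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Daily volumes taken from the slide: 3B reads and 1B writes per day.
reads_per_sec = per_second(3e9)
writes_per_sec = per_second(1e9)

print(f"average reads/sec:  {reads_per_sec:,.0f}")
print(f"average writes/sec: {writes_per_sec:,.0f}")
```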
Requirements for Database Independence
To be database independent, we developed a database connection library
that can connect transparently to different databases.
Features we needed from the database system:
■ K-V native protobuf format
■ LZ4 compression for reduced storage size
■ Latency < 2 ms
■ Support for record sizes from 1K to 500M
■ High Availability with replication
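One way a backend-independent connection layer like this could be shaped is sketched below. The slides do not show the actual library, so every name here (KeyValueStore, InMemoryStore, lookup_reputation) is hypothetical, and the in-memory backend is only a stand-in so the sketch runs without a database.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical backend-neutral interface (not the real library's API)."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend; a real one would wrap a database driver."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


def lookup_reputation(store: KeyValueStore, file_hash: bytes) -> Optional[bytes]:
    """Application code depends only on the interface, not on a backend."""
    return store.get(file_hash)


store = InMemoryStore()
store.put(b"hash-1", b"serialized-record")  # value would be a protobuf blob
result = lookup_reputation(store, b"hash-1")
```

Swapping databases then means adding another KeyValueStore subclass, while callers such as lookup_reputation stay unchanged.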
ScyllaDB
Scylla setup
ScyllaDB v3 with the mc storage format, tested with different configurations.
After the POC, we successfully deployed a Scylla cluster with 8 nodes for
our 2 large-volume APIs.
■ Concerns
● Binary K-V structure (sharding, how to query)
● Chunk size (compression, storage size)
● Repairs (take a long time to finish)
Scylla implementation
All concerns were successfully resolved.
Parameters were tuned and tested on a cluster under production load.
■ Concerns resolved
● Binary K-V structure (blobs working fine)
● Chunk size (achieved a 49% improvement in compression)
■ 4K for small records
■ 64K for large records
● Repairs (eliminate repairs)
■ Insert/Update with consistency level all
■ Read with consistency level quorum
■ Delete with consistency level all
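The reasoning behind these consistency levels can be checked with the usual overlap rule: a read is guaranteed to reach at least one replica that holds the latest successful write when read replicas plus write replicas exceed the replication factor. The sketch below assumes a replication factor of 3, which the slides do not state.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge at a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when any read set must intersect any write set (R + W > RF)."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


RF = 3  # assumption: replication factor is not given in the slides

overlap_all_quorum = read_sees_latest_write("ALL", "QUORUM", RF)
overlap_one_one = read_sees_latest_write("ONE", "ONE", RF)
print(overlap_all_quorum, overlap_one_one)
```

A write at ALL succeeds only once every replica has acknowledged it, so successful writes leave no replica behind for a repair to fix. The trade-off is that such writes fail while any replica is unavailable.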
File Reputation
Statistics
Serving threat classification and
rich context on over 20 billion
goodware and malware files from
ScyllaDB cluster
Graph: number of requests handled per hour (axes: requests, hours)
File Reputation
API latency
Performance test on API
includes complete workflow
300 req/sec 32 workers in parallel
■ User request
■ User auth
■ Request validation
■ Query database
■ Format response
■ Sending response
Test duration: 0:05:20
Samples count: 81645, 0.00 % failures
Average times: total 0.120, latency 0.120
Percentiles:
┌───────────────┬───────────────┐
│ Percentile, % │ Resp. Time, s │
├───────────────┼───────────────┤
│ 0.0 │ 0.080 │
│ 50.0 │ 0.118 │
│ 90.0 │ 0.135 │
│ 95.0 │ 0.142 │
│ 99.0 │ 0.166 │
│ 99.9 │ 0.232 │
│ 100.0 │ 0.331 │
└───────────────┴───────────────┘
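As a quick cross-check of the test figures above, the average throughput follows from the sample count and duration. The second number assumes a closed-loop test, where each of the 32 workers sends its next request only after the previous response arrives. That is an assumption about the test tool, and the slides do not say how the 300 req/sec setting relates to these results.

```python
# Figures copied from the test report above.
duration_s = 5 * 60 + 20   # "0:05:20" is 320 seconds
samples = 81_645
workers = 32
mean_response_s = 0.120

# Average completed requests per second over the whole run.
measured_throughput = samples / duration_s

# Upper bound for a closed loop: each worker completes 1 / mean_response_s
# requests per second (assumed model, not confirmed by the source).
closed_loop_ceiling = workers / mean_response_s

print(f"measured throughput: {measured_throughput:.1f} req/s")
print(f"closed-loop ceiling: {closed_loop_ceiling:.1f} req/s")
```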
File Reputation
ScyllaDB latency
Average Latency
■ write < 6ms
■ read < 1 ms
99th Percentile Latency
■ write < 12 ms
■ read < 7 ms
Lessons Learned
■ How to save some precious NVMe disk space
● Know your data and test compression with different chunk_size values
● Can save about half of the storage used; in our case, 49% less
■ Repairs can impact cluster performance
● Insert/Update with consistency level quorum
● Delete with consistency level all
■ Latency on steroids
● Use NVMe disks in RAID-0 configuration
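The chunk-size lesson can be explored offline before touching a cluster. The sketch below compresses the same data in independent fixed-size chunks and compares the totals. It uses zlib from the Python standard library as a stand-in for LZ4, and the records are synthetic, so the ratios only illustrate the trend and say nothing about the 49% figure.

```python
import zlib


def compressed_size(data: bytes, chunk_size: int) -> int:
    """Compress each fixed-size chunk independently and sum the sizes."""
    return sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


# Synthetic, repetitive records (hypothetical layout, for illustration only).
record = b'{"sha1": "%040d", "classification": "goodware", "source": "feed"}\n'
data = b"".join(record % i for i in range(20_000))

for kb in (4, 16, 64):
    size = compressed_size(data, kb * 1024)
    print(f"chunk {kb:>2} KB: {size / len(data):.1%} of original size")
```

Larger chunks give the compressor more context and usually compress better, but reading one small record then means decompressing a larger chunk, which is consistent with the slides choosing 4K for small records and 64K for large ones.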
Stay in touch
Goran Cvijanovic
goran.cvijanovic@reversinglabs.com
@gorancv

Recently uploaded

OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 

Recently uploaded (20)

OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 

How ReversingLabs Serves File Reputation Service for 20B Files

  • 1. How ReversingLabs Serves File Reputation Service for 20B Files Goran Cvijanovic, Software Architect
  • 2. Presenter: Goran Cvijanovic, Software Architect. Open source database integration and optimization; Oracle and Microsoft databases specialist; 20+ years of experience in information technology, including more than 15 years in database systems integration, migration, and tuning.
  • 4. File Reputation. TitaniumCloud Reputation Services: powerful threat intelligence solutions with up-to-date threat classification and rich context on over 20 billion goodware and malware files. High Availability Database: it’s all about metadata and SLA.
  • 5. Pre-Built Threat Connectors & APIs
      ■ Provides 50+ APIs and feeds
      ■ Preempts emerging threats by monitoring malware “in-the-wild” using threat-specific feeds including Ransomware, APT, CVE, financial, and retail information sources
      ■ Supports advanced search and targeted queries on large sample datasets
  • 7. In-House DB distributed key-value store
      To insert, update, and read large amounts of data with low latency and stable response time, we developed our own distributed key-value store based on the LSM tree architecture.
      ■ 3B reads / 1B writes per day
      ■ Latency in DB system < 2 ms
        ● write < 2 ms
        ● read < 1 ms
      ■ Latency for API < 60 ms
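For a rough sense of scale, the daily volumes on this slide can be converted to average per-second rates. The sketch below assumes the load is spread evenly over a 24-hour day, which the slides do not claim; real traffic will have peaks above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average rate per second, assuming evenly spread load (an assumption)."""
    return events_per_day / SECONDS_PER_DAY


reads_per_sec = per_second(3e9)   # 3B reads per day, from the slide
writes_per_sec = per_second(1e9)  # 1B writes per day, from the slide

print(f"reads:  ~{reads_per_sec:,.0f}/s")
print(f"writes: ~{writes_per_sec:,.0f}/s")
```

Under that assumption the store handles roughly 35,000 reads and 11,600 writes per second on average.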
  • 8. Requirements for Database Independence
      To be database independent, we developed a database connection library that can connect transparently to different databases.
      Features we needed from the database system:
      ■ K-V native protobuf format
      ■ LZ4 compression for reduced storage size
      ■ Latency < 2 ms
      ■ Support for record sizes from 1K to 500M
      ■ High availability with replication
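The slides do not show the connection library itself. As a minimal sketch of the general idea, application code can depend on a small key-value interface while the concrete backend is swapped underneath. Every name here (`KeyValueStore`, `InMemoryStore`, `lookup_reputation`) is hypothetical and is not the ReversingLabs API; the in-memory backend exists only to make the example runnable.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical backend-neutral interface (not the actual library)."""

    def put(self, key: bytes, value: bytes) -> None: ...
    def get(self, key: bytes) -> Optional[bytes]: ...
    def delete(self, key: bytes) -> None: ...


class InMemoryStore:
    """Stand-in backend used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


def lookup_reputation(store: KeyValueStore, file_hash: bytes) -> Optional[bytes]:
    """Application code sees only the interface, not a specific database."""
    return store.get(file_hash)


store = InMemoryStore()
# The value is treated as an opaque blob, e.g. serialized protobuf bytes.
store.put(b"example-hash", b"serialized-protobuf-bytes")
result = lookup_reputation(store, b"example-hash")
print(result)
```

A ScyllaDB-backed or in-house-backed class with the same three methods could then replace `InMemoryStore` without changes to `lookup_reputation`.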
  • 10. Scylla setup
      Scylla DB v3 with the mc storage format, tested with different configs. After a POC we successfully implemented a Scylla cluster with 8 nodes for our 2 large-volume APIs.
      ■ Concerns
        ● Binary K-V structure (sharding, how to query)
        ● Chunk size (compression, storage size)
        ● Repairs (take a long time to finish)
  • 11. Scylla implementation
      All concerns successfully resolved. Parameters were tweaked and tested on a cluster with production load.
      ■ Concerns resolved
        ● Binary K-V structure (blobs working fine)
        ● Chunk size (achieved a 49% improvement in compression)
          ■ 4K for small records
          ■ 64K for large records
        ● Repairs (eliminate repairs)
          ■ Insert/Update with consistency level all
          ■ Read with consistency level quorum
          ■ Delete with consistency level all
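One way to read the consistency-level choice: a write acknowledged at ALL has reached every replica, so replicas do not diverge on successful writes and there is little left for a repair to reconcile. The trade-off is that such a write fails whenever any replica is unavailable. The sketch below checks the standard overlap rule (write replicas + read replicas > replication factor). The replication factor of 3 is an assumption; the slides do not state it.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return write_acks + read_acks > rf


RF = 3  # assumed replication factor; not given in the slides

write_all = RF            # consistency level ALL: every replica acknowledges
read_quorum = quorum(RF)  # consistency level QUORUM: 2 of 3

print(read_sees_latest_write(RF, write_all, read_quorum))  # True
print(read_sees_latest_write(RF, 1, 1))                    # False
```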
  • 12. File Reputation Statistics
      Serving threat classification and rich context on over 20 billion goodware and malware files from a ScyllaDB cluster.
      [Graph: number of requests handled per hour; axes: req, hours]
  • 13. File Reputation API latency
      Performance test on the API includes the complete workflow, at 300 req/sec with 32 workers in parallel:
      ■ User request
      ■ User auth
      ■ Request validation
      ■ Query database
      ■ Format response
      ■ Sending response
      Test duration: 0:05:20
      Samples count: 81645, 0.00 % failures
      Average times: total 0.120, latency 0.120
      Percentiles:
      ┌───────────────┬───────────────┐
      │ Percentile, % │ Resp. Time, s │
      ├───────────────┼───────────────┤
      │           0.0 │         0.080 │
      │          50.0 │         0.118 │
      │          90.0 │         0.135 │
      │          95.0 │         0.142 │
      │          99.0 │         0.166 │
      │          99.9 │         0.232 │
      │         100.0 │         0.331 │
      └───────────────┴───────────────┘
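The reported numbers can be cross-checked with a little arithmetic. The sketch below computes the achieved request rate from the sample count and duration, and compares it with the ceiling that Little's law gives for a fixed pool of workers. It assumes a closed-loop test in which each worker sends one request at a time and waits for the response; the slide does not describe the load generator, so treat that as an assumption.

```python
duration_s = 5 * 60 + 20   # test duration 0:05:20, from the slide
samples = 81645            # completed requests, from the slide
workers = 32               # parallel workers, from the slide
avg_response_s = 0.120     # average response time in seconds, from the slide

# Achieved throughput over the whole run.
observed_rate = samples / duration_s

# Little's law: concurrency = throughput * latency. With a closed pool of
# workers, throughput cannot exceed workers / average latency.
max_rate = workers / avg_response_s

print(f"observed: {observed_rate:.1f} req/s")
print(f"ceiling:  {max_rate:.1f} req/s")
```

Under that assumption, the run achieved about 255 req/s against a ceiling of about 267 req/s, so the 300 req/sec figure on the slide would be a configured target rather than the measured rate.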
  • 14. File Reputation ScyllaDB latency
      Average latency
      ■ write < 6 ms
      ■ read < 1 ms
      99th percentile latency
      ■ write < 12 ms
      ■ read < 7 ms
  • 15. Lessons Learned
      ■ How to save some precious NVMe disk space
        ● Know your data and test compression with different chunk_size
        ● Can save half of the storage used; in our case, 49% less
      ■ Repairs can impact cluster performance
        ● Insert/Update with consistency level quorum
        ● Delete with consistency level all
      ■ Latency on steroids
        ● Use NVMe disks in RAID-0 configuration
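The advice to test compression with different chunk sizes can be tried offline before touching a cluster. The sketch below compresses the same data in independent chunks of several sizes and reports the resulting ratio. Two substitutions are worth naming plainly: it uses `zlib` from the Python standard library rather than LZ4, and the records are synthetic with a made-up layout, so the numbers illustrate the method only and say nothing about the 49% result from the talk.

```python
import zlib


def compressed_size(data: bytes, chunk_kb: int) -> int:
    """Total size when each chunk_kb-sized chunk is compressed independently."""
    step = chunk_kb * 1024
    return sum(
        len(zlib.compress(data[i:i + step])) for i in range(0, len(data), step)
    )


# Synthetic sample data (hypothetical record layout, for illustration only).
records = [
    f'{{"id": {i:08d}, "classification": "goodware", "source": "sample-feed"}}\n'.encode()
    for i in range(20000)
]
data = b"".join(records)

sizes = {kb: compressed_size(data, kb) for kb in (4, 16, 64)}
for kb, size in sizes.items():
    print(f"chunk {kb:>2} KB: {size / len(data):.1%} of original size")
```

Larger chunks generally compress better because each chunk is compressed on its own, but a read then has to decompress a larger chunk to fetch a single record, which is consistent with the talk using 4K for small records and 64K for large ones.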
  • 17. Stay in touch: Goran Cvijanovic, goran.cvijanovic@reversinglabs.com, @gorancv