This talk explores the various non-relational data stores that folks are using these days. We will dispel the myths and see what these things are actually useful for.
This presentation was given at the Barcelona on Rails meetup on October 27, 2019.
The live demo code can be found at https://github.com/JSFernandes/rails6-multi-db
Relational databases are used extensively in many applications and systems, but they are not always the best data store solution to the problem at hand. In this session we discuss the limitations of RDBMS and show which NoSQL solutions can be used to overcome these limitations. We also cover migration topics, such as how to add NoSQL databases without adding complexity to your development and operations.
In the engineering world, we don’t always have the luxury of owning our data pipelines end to end. If only we could influence those outside components… Well, we tried, and this is our story, replete with failure, discovery, and the serenity of enlightenment. Join us on our journey as we learned more than we ever wanted to know about compression in different Apache projects, deployed our own ingestion pipeline in Apache Flume, and ultimately unified these in a robust framework built on Apache Apex handling 1 TB of data per day. We end with some reflections on the joys and tribulations of the open source realm and some key lessons for other large applications atop multiple Apache solutions.
This deck examines cache solutions that can be used when developing applications and compares Redis, MemCache, JCache, and Hazelcast on performance, security, storage capability and eviction policy, maintenance, reliability, cost, and who is using what.
One particular (and often forgotten) use-case for RavenDB is its usage as an embedded database. This operation mode allows application providers to abstract the complexity of database administration from their end-users while, at the same time, providing you a fully functional document store.
During this talk we will explore the challenges faced while deploying RavenDB in a massive number of machines throughout the globe (aiming at hundreds of thousands), and how RavenDB improved the capabilities of our application.
Oren Eini discusses RavenDB 4.0, the next major version, running on the CoreCLR, and skims over performance (much higher), flexibility, and ease of use.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1awkL99.
Details on Pinterest's architecture, its systems (Pinball, Frontdoor), and its stack: MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc. Filmed at qconsf.com.
Yash Nelapati is an infrastructure engineer at Pinterest where he focusses on scalability, capacity planning and architecture. Prior to Pinterest he was into web development and rapidly prototyping UI. Marty Weiner joined Pinterest in early 2011 as the 2nd engineer. He previously worked at Azul Systems as a VM engineer focused on building/improving the JIT compilers in HotSpot.
This is the slide deck which was used for a talk 'Change Data Capture using Kafka' at Kafka Meetup at Linkedin (Bangalore) held on 11th June 2016.
The talk describes the need for CDC and why it's a good use case for Kafka.
Know thy cost (or where performance problems lurk) - Oren Eini
Performance happens. Whether or not you designed for it, she is always invited to the party (and you had better find her in a good mood). Knowing the cost of every operation, and how it is distributed across every subsystem, will ensure that when you are building that proof-of-concept (that always ends up in production) or designing the latest enterprise-grade application, you will know where those pesky performance bugs like to live. In this session, we will go deep into the inner workings of every performance-sensitive subsystem, from the relative safety of the client to the binary world of Voron.
Making Startups Work: Scaling Drupal for Thrillist.com - Michael Smith
My presentation from the "Making Startups Work" panel on October 4, 2011 in NYC. I presented about how we scaled Drupal to work for our traffic and our requirements.
For our eReader development project, we had to find persistent storage for our JSON documents. After initial scanning we zeroed in on two products, DynamoDB and MongoDB. These slides take a deeper dive into the selection of our JSON data store.
Solr cloud the 'search first' nosql database extended deep dive - lucenerevolution
Presented by Mark Miller, Software Engineer, Cloudera
As the NoSQL ecosystem looks to integrate great search, great search is naturally beginning to expose many NoSQL features. Will these Goliaths collide? Or will they remain specialized while intermingling, two sides of the same coin?
Come learn about where SolrCloud fits into the NoSQL landscape. What can it do? What will it do? And how will the big data, NoSQL, and search ecosystem evolve? If you are interested in Big Data, NoSQL, distributed systems, CAP theorem and other hype-filled terms, then this talk may be for you.
Relational databases vs Non-relational databases - James Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
10. Find a small part of your
application that has pain
because of the sql database
and port just that part to
one of these systems
11. Find a small part of your
application that has pain
because of the sql database
and port just that part to
one of these systems
rinse ... repeat...
14. • Fast, in memory key/value store
• Async disk persistence
• STRING, LIST and SET data types
• Perfect Data Structure/State/Cache Server
http://code.google.com/p/redis/
http://github.com/ezmobius/redis-rb
15. Pros:
• Very Fast (110k ops/sec)
• High Level Data Types
• Sequential IO Only
• Very Clean C Code
Cons:
• Data Set must fit in Memory
• Possible to lose some data between syncs (configurable)
Scales horizontally via consistent hashing in the client
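"Consistent hashing in the client" can be sketched roughly as follows. This is a minimal illustration in Python, not the code of any particular Redis or Tokyo Tyrant client; the class name HashRing, the node names, and the choice of 100 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Client-side consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual points per node (assumed value)
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Any stable, well-spread hash works; SHA-256 is an arbitrary choice.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]


# Hypothetical server names; the client picks a server per key.
ring = HashRing(["redis-a", "redis-b", "redis-c"])
server = ring.node_for("user:42")
```

The point of the ring is that adding a server only moves the keys that now belong to the new server; every other key keeps its old home, so most of a cache stays warm when the cluster grows.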
16. Use Redis when you want:
As fast as you can get
Data structure sharing between ruby processes
A better, persistent memcached
Hit counters, rate limiters, circular log buffers and sessions
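A hit counter or rate limiter on a key/value server usually comes down to an atomic increment plus a key expiry (Redis exposes these as INCR and EXPIRE). The sketch below shows the idea in Python against an in-memory stand-in rather than a real server; FakeCounterStore, allow_request, and the key layout are hypothetical names chosen for the example.

```python
import time


class FakeCounterStore:
    """In-memory stand-in for a server offering INCR and EXPIRE semantics."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}     # key -> integer count
        self._deadline = {}   # key -> absolute expiry time

    def _purge(self, key):
        # Drop the key if its expiry time has passed.
        if key in self._deadline and self._clock() >= self._deadline[key]:
            self._values.pop(key, None)
            self._deadline.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key, seconds):
        if key in self._values:
            self._deadline[key] = self._clock() + seconds


def allow_request(store, client_id, limit, window_seconds, now=None):
    """Fixed-window limiter: at most `limit` hits per client per window."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"rate:{client_id}:{window}"   # assumed key naming scheme
    hits = store.incr(key)
    if hits == 1:
        # First hit in this window: let the key clean itself up later.
        store.expire(key, window_seconds * 2)
    return hits <= limit
```

Because the counter lives in the shared store rather than in one process, every Ruby (or other) process that talks to the same server sees the same count.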
17. • Large data workhorse
• Native, memcached and http interfaces
• Fast with full disk persistence
• Key/value with Lua extensions available
http://1978th.net/tokyocabinet/
http://1978th.net/tokyotyrant/
http://github.com/jmettraux/rufus-tokyo
http://copiousfreetime.rubyforge.org/tyrantmanager/
18. Pros:
• Fast and Stable
• Ability to store Large amounts of data
• Embedded Lua in the tyrant server
• Pluggable storage engines
Cons:
• No data types for values (table db type excluded)
• Some issues with data sets larger than 70 GB
Scales horizontally via consistent hashing in the client
Tyrant also supports master/master and master/slave replication
19. Use Tokyo when you want:
As fast as you can get with synchronous writes
Store large amounts of persistent data
Use the smallest amount of disk space for your data set
Tunable RAM usage
20. • JSON document database
• the ‘mysql’ of key/value stores
• Fast but flexible query engine
• Support for sharding baked in(sorta)
• Replication
http://www.mongodb.org
http://github.com/mongodb/mongo-ruby-driver
http://github.com/jnunemaker/mongomapper
21. Pros:
• Schemaless
• Advanced query features
• Relatively Fast
• GridFS
• Easy transition from SQL databases
• Active development
Cons:
• AutoSharding not ready yet
• No Transactions
Scales horizontally via auto sharding
Supports master/slave replication for failover
22. Use MongoDB when you want:
Very Fast with synchronous writes
Store large amounts of schemaless data
Great for logging, stats collection
Very powerful query API and indexing capabilities
23. • Document oriented database
• HTTP/JSON query interface
• Erlang map/reduce query interface
• Decentralized, just add or remove nodes to
scale
http://riak.basho.com/
24. Pros:
• Schemaless
• No master node/share nothing
• Add/remove nodes easily to scale up/down
Cons:
• Still young project
• Not much documentation
Scales horizontally via shared nothing, hinted handoff and
consistent hashing. Definitely one to watch
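Hinted handoff, mentioned above, means that a write aimed at a node that is down gets parked on another live node along with a "hint" naming the intended owner, and is replayed when the owner comes back. A toy Python sketch of that idea follows; it is not Riak's implementation, and Node, write, and hand_off are hypothetical names for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value owned by this node
        self.hints = []   # (intended_owner_name, key, value) held for others


def write(nodes, owner, key, value):
    """Write to `owner`; if it is down, park the write on another live node."""
    if owner.up:
        owner.data[key] = value
        return owner
    for fallback in nodes:
        if fallback.up and fallback is not owner:
            fallback.hints.append((owner.name, key, value))
            return fallback
    raise RuntimeError("no live node available")


def hand_off(nodes, recovered):
    """Replay hints meant for `recovered` once it is back up, then drop them."""
    for node in nodes:
        kept = []
        for owner_name, key, value in node.hints:
            if owner_name == recovered.name:
                recovered.data[key] = value
            else:
                kept.append((owner_name, key, value))
        node.hints = kept
```

The cluster stays writable while a node is down, at the cost of that node being briefly out of date after it returns.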
25. Use Riak when you want:
Ease of operations when adding/removing nodes
26. • Eventually consistent, distributed,
structured key/value store
• Cross between Big Table and Dynamo
• Column Families provide higher level data
models than most key/value stores
• Truly add or remove nodes to scale
capacity
http://incubator.apache.org/cassandra/
http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
27. Pros:
• Always writable
• Horizontally scalable
• Add nodes easily to scale write capacity
• Easy to get common sorted queries (recent blog posts etc)
Cons:
• Restart whole system when making schema changes
• Not much documentation
• Rough edges abound
Scales horizontally via automatic replication, even tunable
across racks/data centers
28. Use Cassandra when you want:
Truly just add nodes to scale out
Fairly rich data model, sorted supercolumn families
Store truly large amounts of data
29. • Eventually consistent, distributed, key/value
store
• Based on Amazon’s Dynamo Papers
• Data Partitioning, versioning and read
repair
• Written in Erlang
http://github.com/cliffmoon/dynomite
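Versioning and read repair can be pictured like this: each replica stores a version alongside the value, a read consults the replicas, returns the newest copy, and writes that copy back to any replica that is stale or missing it. The Python sketch below is a deliberate simplification, assuming a single integer version per key where Dynamo-style stores use vector clocks; read_with_repair and the dict-based replicas are hypothetical.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, fix stale copies.

    Each replica is a dict mapping key -> (version, value).
    """
    newest = None
    for replica in replicas:
        entry = replica.get(key)
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None:
        return None   # no replica has the key
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest   # read repair: push the newest copy
    return newest[1]
```

This is how an eventually consistent store converges: replicas that missed a write get patched up the next time someone reads the key.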
30. Pros:
• Always writable
• Horizontally scalable
• Good for storing large binaries
• Add nodes to repartition
• Gossip protocol
• Pluggable storage engines
Cons:
• New partitions come online before they are *ready*
• Migration is very painful
• Still beta quality but used in production at powerset
Scales horizontally via automatic replication, talk to any
node in the cluster from clients
31. Use Dynomite when you want:
Scale writes by adding nodes
Store large binaries (like image assets)
Nice web admin interface
33. That's great but which
one should I use?
Unless you have good reasons
stick with Redis, Tokyo and
MongoDB for now
34. That's great but which
one should I use?
Unless you have good reasons
stick with Redis, Tokyo and
MongoDB for now
But keep an eye on the others
for true add-nodes-to-scale-out
capacity
35. Chef recipes to configure all of these on the Engine Yard
Cloud
http://github.com/ezmobius/ey-lessql
36. Pitfalls of #LESSQL
• No Referential Integrity
• Not as much tooling
• Almost nonexistent disaster recovery
tools
• Not as much production, used-in-anger
experience
37. Do not buy into the hype!
Most of you do not need this
stuff. Relational databases scale
well for 99% of use cases.
Don’t do it because it’s
“Cool”
38. But if you do your
homework, #LESSQL can be
very compelling and very
useful.
Please make informed
decisions so you don’t have to
hire me to clean up your mess!