The document summarizes a benchmarking study conducted by Altoros Systems to compare the performance of Couchbase Server, MongoDB, and Cassandra. It outlines the benchmark goals of having a reproducible workload, using a realistic scenario, and comparing latency and throughput. It describes the benchmarking tools, scenario details involving data size, operations, and hardware configuration. Configuration details are provided for each database, including cluster specifications and parameter settings.
Properly shaping partitions and your jobs to enable powerful optimizations, eliminate skew and maximize cluster utilization. We will explore various Spark Partition shaping methods along with several optimization strategies including join optimizations, aggregate optimizations, salting and multi-dimensional parallelism.
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory. This provides vastly improved efficiency in disk usage compared to Spark's built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we'll take a deep dive into the Cosco architecture and describe how it's deployed at Facebook. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism and SOS (presented at Spark+AI Summit 2018).
Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
Oracle Week 2017 slides.
Agenda:
Basics: How and What To Tune?
Using the Automatic Workload Repository (AWR)
Using AWR-Based Tools: ASH, ADDM
Real-Time Database Operation Monitoring (12c)
Identifying Problem SQL Statements
Using SQL Performance Analyzer
Tuning Memory (SGA and PGA)
Parallel Execution and Compression
Oracle Database 12c Performance New Features
Optimizing spark jobs through a true understanding of spark core. Learn: What is a partition? What is the difference between read/shuffle/write partitions? How to increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease my compute time?
Data Quality With or Without Apache Spark and Its EcosystemDatabricks
Few solutions exist in the open-source community either in the form of libraries or complete stand-alone platforms, which can be used to assure a certain data quality, especially when continuous imports happen. Organisations may consider picking up one of the available options – Apache Griffin, Deequ, DDQ and Great Expectations. In this presentation we’ll compare these different open-source products across different dimensions, like maturity, documentation, extensibility, features like data profiling and anomaly detection.
Properly shaping partitions and your jobs to enable powerful optimizations, eliminate skew and maximize cluster utilization. We will explore various Spark Partition shaping methods along with several optimization strategies including join optimizations, aggregate optimizations, salting and multi-dimensional parallelism.
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory. This provides vastly improved efficiency in disk usage compared to Spark's built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we'll take a deep dive into the Cosco architecture and describe how it's deployed at Facebook. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism and SOS (presented at Spark+AI Summit 2018).
Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
Organizations need to perform increasingly complex analysis on their data — streaming analytics, ad-hoc querying and predictive analytics — in order to get better customer insights and actionable business intelligence. However, the growing data volume, speed, and complexity of diverse data formats make current tools inadequate or difficult to use. Apache Spark has recently emerged as the framework of choice to address these challenges. Spark is a general-purpose processing framework that follows a DAG model and also provides high-level APIs, making it more flexible and easier to use than MapReduce. Thanks to its use of in-memory datasets (RDDs), embedded libraries, fault-tolerance, and support for a variety of programming languages, Apache Spark enables developers to implement and scale far more complex big data use cases, including real-time data processing, interactive querying, graph computations and predictive analytics. In this session, we present a technical deep dive on Spark running on Amazon EMR. You learn why Spark is great for ad-hoc interactive analysis and real-time stream processing, how to deploy and tune scalable clusters running Spark on Amazon EMR, how to use EMRFS with Spark to query data directly in Amazon S3, and best practices and patterns for Spark on Amazon EMR.
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
Oracle Week 2017 slides.
Agenda:
Basics: How and What To Tune?
Using the Automatic Workload Repository (AWR)
Using AWR-Based Tools: ASH, ADDM
Real-Time Database Operation Monitoring (12c)
Identifying Problem SQL Statements
Using SQL Performance Analyzer
Tuning Memory (SGA and PGA)
Parallel Execution and Compression
Oracle Database 12c Performance New Features
Optimizing spark jobs through a true understanding of spark core. Learn: What is a partition? What is the difference between read/shuffle/write partitions? How to increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease my compute time?
Data Quality With or Without Apache Spark and Its EcosystemDatabricks
Few solutions exist in the open-source community either in the form of libraries or complete stand-alone platforms, which can be used to assure a certain data quality, especially when continuous imports happen. Organisations may consider picking up one of the available options – Apache Griffin, Deequ, DDQ and Great Expectations. In this presentation we’ll compare these different open-source products across different dimensions, like maturity, documentation, extensibility, features like data profiling and anomaly detection.
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
Zeus is an efficient, highly scalable and distributed shuffle as a service which is powering all Data processing (Spark and Hive) at Uber. Uber runs one of the largest Spark and Hive clusters on top of YARN in industry which leads to many issues such as hardware failures (Burn out Disks), reliability and scalability challenges.
Capacity Planning For Your Growing MongoDB ClusterMongoDB
Your MongoDB deployment is growing, but are you prepared for that growth? Capacity planning is an essential practice when deploying any database system. You need to understand your usage patterns and determine the appropriate hardware based on your application's needs. Scaling reads and scaling writes will require different types of resources. With the proper tools in place, you can understand your working set, gain visibility into when it's time to add resources or start sharding and avoid performance issues. In this session, you'll learn how to use MongoDB Management Service and other tools to identify patterns and predict growth, ensuring your success with MongoDB.
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and Nifi. There are also many new tools that are built on top of ORC, such as Hive’s ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to only see the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and what the relevant tradeoffs are. In particular, it will discuss how to format your data and the options to use to maximize your read performance. In particular, we’ll discuss when and how to use ORC’s schema evolution, bloom filters, and predicate push down. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file including the type in the file and min, max, and count for each column.
What to Expect From Oracle database 19cMaria Colgan
The Oracle Database has recently switched to an annual release model. Oracle Database 19c is only the second release in this new model. So what can you expect from the latest version of the Oracle Database? This presentation explains how Oracle Database 19c is really 12.2.0.3 the terminal release of the 12.2 family and the new features you can find in this release.
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
Apache Spark defaults provide decent performance for large data sets but leave room for significant performance gains if able to tune parameters based on resources and job.
Joram Barrez and Tijs Rademakers, Principal Software Engineer at Flowable present the current state of (Flowable)things.
It was presented at the Flowfest 2018 in Barcelona, Spain
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...HostedbyConfluent
We built Apache Pinot - a real-time distributed OLAP datastore - for low-latency analytics at scale. This is heavily used at companies such as LinkedIn, Uber, Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per sec from Kafka, builds indexes in real-time and serves 100K+ queries per second while ensuring latency SLA of millisecond to sub second.
In the first implementation, we used the Consumer Group feature to manage the offsets and checkpoints across multiple Kafka Consumers. However, to achieve fault tolerance and scalability, we had to run multiple consumer groups for the same topic. This was our initial strategy to maintain the SLA at high query workload. But this model posed other challenges - since Kafka maintains offset per consumer group, achieving data consistency across multiple consumer groups was not possible. Also, a failure of a single node in a consumer group meant the entire consumer group was unavailable for query processing. Restarting the failed node needed lot of manual operations to ensure data is consumed exactly once. This resulted in management overhead and inefficient hardware utilization.
While taking inspiration from the Kafka consumer group implementation, we redesigned the real-time consumption in Pinot to maintain consistent offset across multiple consumer groups. This allowed us to guarantee consistent data across all replicas. This enabled us to copy data from another consumer group during node addition, node failure or increasing the replication group.
In this talk, we will deep dive into the various challenges faced and considerations that went into this design, and learn what makes Pinot resilient to failures both in Kafka Brokers and Pinot Components. We will introduce the new concept of ""lockstep"" sequencing where multiple consumer groups can synchronize checkpoints periodically and maintain consistency. We'll describe how we achieve this while maintaining strict freshness SLAs, and also withstanding high throughput and ingestion.
Oracle Database Migration to Oracle Cloud InfrastructureSinanPetrusToma
This slide deck highlights the benefits of Oracle Cloud, describes the different Oracle database cloud services and their characteristics, which one to choose and what to consider, and more than 20 methods and solutions Oracle offers to migrate Oracle databases across platforms.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the various tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you’ll hear about important configuration parameters and their experiments tuning these parameters on large-scale production workload. You’ll also learn about Facebook’s new efforts towards automatically tuning several important configurations based on nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.ing several important configurations based on nature of the workload. We will conclude by sharing our result with automatic tuning and future directions for the project.
In Spark SQL the physical plan provides the fundamental information about the execution of the query. The objective of this talk is to convey understanding and familiarity of query plans in Spark SQL, and use that knowledge to achieve better performance of Apache Spark queries. We will walk you through the most common operators you might find in the query plan and explain some relevant information that can be useful in order to understand some details about the execution. If you understand the query plan, you can look for the weak spot and try to rewrite the query to achieve a more optimal plan that leads to more efficient execution.
The main content of this talk is based on Spark source code but it will reflect some real-life queries that we run while processing data. We will show some examples of query plans and explain how to interpret them and what information can be taken from them. We will also describe what is happening under the hood when the plan is generated focusing mainly on the phase of physical planning. In general, in this talk we want to share what we have learned from both Spark source code and real-life queries that we run in our daily data processing.
Silicon Valley NoSQL Meetup - Nov 2012. View with animations: video version here: https://vimeo.com/54691785
http://www.meetup.com/Silicon-Valley-NoSQL/events/88257222/
For more information visit: www.couchbase.com
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
Zeus is an efficient, highly scalable and distributed shuffle as a service which is powering all Data processing (Spark and Hive) at Uber. Uber runs one of the largest Spark and Hive clusters on top of YARN in industry which leads to many issues such as hardware failures (Burn out Disks), reliability and scalability challenges.
Capacity Planning For Your Growing MongoDB ClusterMongoDB
Your MongoDB deployment is growing, but are you prepared for that growth? Capacity planning is an essential practice when deploying any database system. You need to understand your usage patterns and determine the appropriate hardware based on your application's needs. Scaling reads and scaling writes will require different types of resources. With the proper tools in place, you can understand your working set, gain visibility into when it's time to add resources or start sharding and avoid performance issues. In this session, you'll learn how to use MongoDB Management Service and other tools to identify patterns and predict growth, ensuring your success with MongoDB.
ORC files were originally introduced in Hive, but have now migrated to an independent Apache project. This has sped up the development of ORC and simplified integrating ORC into other projects, such as Hadoop, Spark, Presto, and Nifi. There are also many new tools that are built on top of ORC, such as Hive’s ACID transactions and LLAP, which provides incredibly fast reads for your hot data. LLAP also provides strong security guarantees that allow each user to only see the rows and columns that they have permission for.
This talk will discuss the details of the ORC and Parquet formats and what the relevant tradeoffs are. In particular, it will discuss how to format your data and the options to use to maximize your read performance. In particular, we’ll discuss when and how to use ORC’s schema evolution, bloom filters, and predicate push down. It will also show you how to use the tools to translate ORC files into human-readable formats, such as JSON, and display the rich metadata from the file including the type in the file and min, max, and count for each column.
What to Expect From Oracle database 19cMaria Colgan
The Oracle Database has recently switched to an annual release model. Oracle Database 19c is only the second release in this new model. So what can you expect from the latest version of the Oracle Database? This presentation explains how Oracle Database 19c is really 12.2.0.3 the terminal release of the 12.2 family and the new features you can find in this release.
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
Apache Spark defaults provide decent performance for large data sets but leave room for significant performance gains if able to tune parameters based on resources and job.
Joram Barrez and Tijs Rademakers, Principal Software Engineer at Flowable present the current state of (Flowable)things.
It was presented at the Flowfest 2018 in Barcelona, Spain
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...HostedbyConfluent
We built Apache Pinot - a real-time distributed OLAP datastore - for low-latency analytics at scale. This is heavily used at companies such as LinkedIn, Uber, Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per sec from Kafka, builds indexes in real-time and serves 100K+ queries per second while ensuring latency SLA of millisecond to sub second.
In the first implementation, we used the Consumer Group feature to manage the offsets and checkpoints across multiple Kafka Consumers. However, to achieve fault tolerance and scalability, we had to run multiple consumer groups for the same topic. This was our initial strategy to maintain the SLA at high query workload. But this model posed other challenges - since Kafka maintains offset per consumer group, achieving data consistency across multiple consumer groups was not possible. Also, a failure of a single node in a consumer group meant the entire consumer group was unavailable for query processing. Restarting the failed node needed lot of manual operations to ensure data is consumed exactly once. This resulted in management overhead and inefficient hardware utilization.
While taking inspiration from the Kafka consumer group implementation, we redesigned the real-time consumption in Pinot to maintain consistent offset across multiple consumer groups. This allowed us to guarantee consistent data across all replicas. This enabled us to copy data from another consumer group during node addition, node failure or increasing the replication group.
In this talk, we will deep dive into the various challenges faced and considerations that went into this design, and learn what makes Pinot resilient to failures both in Kafka Brokers and Pinot Components. We will introduce the new concept of ""lockstep"" sequencing where multiple consumer groups can synchronize checkpoints periodically and maintain consistency. We'll describe how we achieve this while maintaining strict freshness SLAs, and also withstanding high throughput and ingestion.
Oracle Database Migration to Oracle Cloud InfrastructureSinanPetrusToma
This slide deck highlights the benefits of Oracle Cloud, describes the different Oracle database cloud services and their characteristics, which one to choose and what to consider, and more than 20 methods and solutions Oracle offers to migrate Oracle databases across platforms.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the various tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you’ll hear about important configuration parameters and their experiments tuning these parameters on large-scale production workload. You’ll also learn about Facebook’s new efforts towards automatically tuning several important configurations based on nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.ing several important configurations based on nature of the workload. We will conclude by sharing our result with automatic tuning and future directions for the project.
In Spark SQL the physical plan provides the fundamental information about the execution of the query. The objective of this talk is to convey understanding and familiarity of query plans in Spark SQL, and use that knowledge to achieve better performance of Apache Spark queries. We will walk you through the most common operators you might find in the query plan and explain some relevant information that can be useful in order to understand some details about the execution. If you understand the query plan, you can look for the weak spot and try to rewrite the query to achieve a more optimal plan that leads to more efficient execution.
The main content of this talk is based on Spark source code but it will reflect some real-life queries that we run while processing data. We will show some examples of query plans and explain how to interpret them and what information can be taken from them. We will also describe what is happening under the hood when the plan is generated focusing mainly on the phase of physical planning. In general, in this talk we want to share what we have learned from both Spark source code and real-life queries that we run in our daily data processing.
Silicon Valley NoSQL Meetup - Nov 2012. View with animations: video version here: https://vimeo.com/54691785
http://www.meetup.com/Silicon-Valley-NoSQL/events/88257222/
For more information visit: www.couchbase.com
MySQL vs. NoSQL and NewSQL - survey resultsMatthew Aslett
The results of 451 Research's survey into the competitive dynamic between MySQL, NoSQL, and New SQL database technologies.
Further details at: http://blogs.the451group.com/information_management/?p=1740
Time series with Apache Cassandra - Long versionPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Introduction to Database Benchmarking with Benchmark FactoryMichael Micalizzi
When you are faced with making changes to your database environment such as upgrades, patches, hardware installation or virtual machine implementations, there are inherent risks that database performance or availability can suffer.
Oracle ACE and author Bert Scalzo will show how to reduce these risks and demonstrate how Benchmark Factory can help you scale your database for best possible results.
Greg Smith
Cover how to use simple low-level tools such as memtest86, dd, bonnie++, and sysbench to benchmark the hardware of a server intended for database use. A heavy dose of vendor management suggestions will be included as well, for the inevitable time when your shiny new server fails to deliver the performance it should.
On Sep 21, 2012, Renat Khasanshyn, CEO @ Altoros, made a session “Benchmarking Couchbase Server” at CouchConf that was held in San Francisco. In his session, Renat highlighted that all NoSQL vendors say their databases are fast and scalable, but this is not really helpful for end users.
La 5ème édition du Meetup de la Voiture Connectée à Paris, en partenariat avec IBM France et Eiver.
Le 30 Novembre 2016, au Square Paris, le nouveau lab digital de Renault.
Au programme:
-Eiver: la bonne conduite enfin récompensée
-Karos: le court-voiturage enfin possible
-Waynote: l'autoroute est un voyage
Les Meetups Voiture Connectée et Autonome vous sont proposés par Laurent Dunys, https://www.linkedin.com/in/laurentdunys, depuis 2016.
Rejoignez notre groupe en ligne: https://www.meetup.com/fr-FR/MeetupVoitureConnecteeAutonome
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase CreateStorage Switzerland
NoSQL databases like Cassandra and Couchbase are quickly becoming key components of the modern IT infrastructure. But this modernization creates new challenges – especially for storage. Storage in the broad sense. In-memory databases perform well when there is enough memory available. However, when data sets get too large and they need to access storage, application performance degrades dramatically. Moreover, even if enough memory is available, persistent client requests can bring the servers to their knees.
Join Storage Switzerland and Plexistor where you will learn:
1. What is Cassandra and Couchbase?
2. Why organizations are adopting them?
3. What are the storage challenges they create?
4. How organizations attempt to workaround these challenges.
5. How to design a solution to these challenges instead of a workaround.
Building an Analytics Engine on MongoDB to Revolutionize AdvertisingMongoDB
Yallzi is building an analytics engine to create the 4th dimension of advertising. Join our journey! We’ll review how we made our decisions on technology, schema, architecture, aggregation framework and more as well as how MongoDB allows us to iterate quickly -- which is critical for our bootstrapped startup.
How We Use MongoDB in Our Advertising SystemMongoDB
This talk will go over why we chose to use MongoDB for storing billions of documents with only 3 replset nodes, and why we choose MongoDB for our report data store instead of MySQL.
A short introduction of list and show handling functions in couchdb. Examples are taken from the book "CouchDB mit PHP".
Talk was given at the berlin couchdb meetup (http://berlin.couchdb.org)
Edge 2016 SCL-2484: a software defined scalable and flexible container manage...Yong Feng
The material for IBM Edge 2016 session for Spectrum Container Management Solution.
https://www-01.ibm.com/events/global/edge/sessions/.
Please refer to http://ibm.biz/ConductorForContainers for more details about Spectrum Conductor for Containers.
Please refer to https://www.youtube.com/watch?v=7YMjP6EypqA and https://www.youtube.com/watch?v=d9oVPU3rwhE for the demo of Spectrum Conductor for Containers.
Rightscale Webinar: Building Blocks for Private and Hybrid CloudsRightScale
Looking for some solid guidance to help build your private or hybrid cloud? Want to turn your existing data center into a private cloud? Or perhaps you want to integrate your private cloud with a public cloud, but you’re not sure where to get started.
In this webinar you'll learn the key considerations for building a private or hybrid cloud, presented by the pros at RightScale who help our customers do this every single day.
We’ll discuss:
- Selecting hardware: How to decide which compute, networking and storage options to select.
- Private cloud considerations such as workload and infrastructure interaction, security, latency, user experience, and cost.
-Reference architectures and design considerations such as the location of physical hardware and configuration for availability and redundancy.
- Use cases and real-life scenarios: Private and hybrid clouds are especially well-suited for scalable applications with uncertain demand, disaster recovery and self-service IT portals.
- How to select the cloud solution provider that’s right for you, and how to manage your cloud resources effectively.
You’ll leave this webinar with a thorough understanding of building blocks for private and hybrid clouds.
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnikbiz
A Technical introduction to PostgreSQL and Postgres Plus -
Enterprise Class PostgreSQL Database from EDB - You have a ‘Real’ alternative to Oracle and other conventional proprietary Databases
Consolidation Planning: Getting the Most from Your Virtualization InitiativeNovell
During this session, you'll get an in-depth look at the principles and best practices of planning a server consolidation project. You will learn how to ensure projects meet goals such as minimizing new hardware purchases or rack space, how and when to use different strategies such as scale up versus scale out, and how to prove those goals were met after the project is finished.
Modernizing Applications with Microservices and DC/OS (Lightbend/Mesosphere c...Lightbend
**Featuring Aaron Williams, Head of Advocacy at Mesosphere, Inc. and Markus Eisele, Developer Advocate at Lightbend, Inc.**
The traditional architecture that enterprises run their businesses on has typically been delivered as monolithic applications running in a virtualized, on-premise infrastructure. Public and private cloud technologies have changed everything, but if the applications are not designed, or re-designed, appropriately, then it is impossible to take advantage of the advances in both distributed application services and hybrid infrastructure. Consequently, enterprise architects are looking to microservices-based architectures as a means to modernize their legacy applications.
This webinar with Lightbend and partner Mesosphere will introduce a new framework specifically designed to help developers modernize legacy Java EE applications into systems of microservices and then discuss exactly what is required to run these distributed systems at enterprise scale.
Building Blocks for Private and Hybrid CloudsRightScale
Learn key considerations about building a private or hybrid cloud, including selecting hardware, cloud infrastructure software, hosting vendors, systems integrators, and reference architectures.
Following simple patterns of good application design can allow you to scale your application for your customers easily. This presentation dives into the 12 factor application design and demo how this applies to containers and deployments on Amazon ECS and Fargate. We'll take a look at tooling that can be used to simplify your workflow and help you adopt the principles of the 12 factor application.
Review this AWS and Nimbo webinar where we discuss moving your data center to the AWS Cloud. We feature a real world example to illustrate how this can be achieved both quickly and smoothly.
Hess Corporation recently moved part of its infrastructure to the cloud, to prepare for a business divestiture. Relying on consultation from enterprise cloud solution provider Nimbo, the migration was completed securely, in about half the time it would have taken in an on-premises environment.
Covers the problems of achieving scalability in server farm environments and how distributed data grids provide in-memory storage and boost performance. Includes summary of ScaleOut Software product offerings including ScaleOut State Server and Grid Computing Edition.
How to you manage Performance in the Cloud, in particular in "Platform as a Service (PaaS) environments like Window's Azure or Heroku where you don't have a "virtual machine" to manage?
Even in "Infrastructure as a Service (IaaS)" environments like Amazon EC2 there are limitations on the tools you can deploy into that environment to assist in performance management, troubleshooting etc (e.g. you can't deploy promiscuous mode network sniffing tools in EC2).
James Smith from Adactus will give us an overview of Cloud Services as a whole, and then drill down into some of the issues they have experienced in deployed their "Pulse" Claims Management Solution into the Azure cloud (http://www.pulseclaims.com/home).
Beyond just looking at page speed performance he'll talk about the challenges involved in managing SLA's, Cloud "support" (or lack of it!), performance troubleshooting and the whole "performance lifecycle".
Increasing industry leaders are moving to AWS because of its speed, flexibility, and potential cost saving benefits.
Cloud computing means no hardware to buy and no infrastructure to maintain. There are no complex SANs to misconfigure and limited human error due to increased automation.
But, where do you start? How can you secure and protect your IP? Who makes sure that your system keeps up with development?
Perforce and Assembla are Cloud Ready
For enterprises looking for performance, scale, reliability, and security – Helix Core supports your development workloads. We can secure your mission-critical data while leveraging the public cloud.
Determining the right solution and getting setup involves accessing your current needs and preparing for your future. Learn from our consultation experts. You’ll take-away:
-Configuration best practices from working with industry leaders
-Optimization strategies for your cloud investment
-Resources for ongoing day-to-day management
-Tips to future-proof your cloud
We can optimize your solution with managed services. Both Perforce Professional Services Team and Assembla have helped Helix Core users get set up on AWS. So if you’re not interested in setting this up yourself, or don’t have the resources for day-to-day management, we have solutions.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.