Video: https://youtu.be/RS8x73z4csI
What is middleware?
Mixins, reference pollution and shared state, race conditions and abstraction leaks, fat controllers and mixed layers, high coupling, and error ignoring.
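Since the deck opens with that question, a minimal Express-style sketch of the pattern may help (a hypothetical app, not code from the talk): a middleware function receives the request and response, may act on them, and passes control along the chain with next().

const express = require('express');
const app = express();

// Logging middleware: sees every request, then hands control on with next().
app.use((req, res, next) => {
  console.log(`${req.method} ${req.url}`);
  next();
});

app.get('/', (req, res) => res.send('Hello, middleware!'));

// Error-handling middleware (four arguments) registered last, so thrown or
// forwarded errors are not silently dropped, one of the anti-patterns above.
app.use((err, req, res, next) => {
  console.error(err);
  res.status(500).send('Internal error');
});

app.listen(3000);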
CQRS and Event Sourcing, An Alternative Architecture for DDD - Dennis Doomen
Most of us will be familiar with the standard 3- or 4-layer architecture you often see in larger enterprise systems. Some are already practicing Domain-Driven Design and work together with the business to clarify the domain concepts. Perhaps you’ve noticed that it is difficult to get the intention of the 'verbs' from that domain into this standard architecture. If performance is an important requirement as well, then you might have discovered that an Object-Relational Mapper and a relational database are not always the best solution.
One of the main reasons for this is that the interests of a consistent domain that honors the many business rules conflict with those of data reporting and presentation. That’s why Bertrand Meyer introduced the Command Query Separation principle.
An architecture based on this principle, combined with the Event Sourcing concept, provides an ideal architecture for building high-performance systems designed using DDD. Well-known bloggers like Udi Dahan and Greg Young have already written quite a lot of posts on this, and this year’s Developer Days had some coverage as well.
But how do you build such a system with the .NET framework? Is it really as complex as some claim, or is it just different work?
NoSQL databases only unfold their full strength when you also embrace their concepts of usage and schema design. These slides give an overview of the features and concepts of MongoDB.
Building a Robust CDC Pipeline with Apache Hudi and Debezium - Tathastu.ai
We will cover the need for CDC and the benefits of building a CDC pipeline, compare various CDC streaming and reconciliation frameworks, and walk through the architecture and the challenges we faced while running this system in production. Finally, we will conclude the talk by covering Apache Hudi, Schema Registry, and Debezium in detail, along with our contributions to the open-source community.
This talk focuses on tuning, analysing, and optimizing MongoDB queries and indexes using the Database Profiler and the explain() function.
Database performance can also be improved by configuring the underlying (Linux) OS with recommended settings that do not come by default.
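For illustration, a minimal mongo-shell sketch of that workflow (hypothetical users collection and field names, not taken from the talk): enable the profiler for slow operations, inspect a query plan with explain(), and add an index where a collection scan shows up.

// Profile operations slower than 100 ms (level 1 = slow operations only).
db.setProfilingLevel(1, 100);

// Inspect the query plan: look for IXSCAN (index scan) vs COLLSCAN
// (full collection scan) in the winning plan.
db.users.find({ email: 'a@example.com' }).explain('executionStats');

// A COLLSCAN here suggests adding an index on the queried field.
db.users.createIndex({ email: 1 });

// Profiled operations are recorded in the system.profile collection.
db.system.profile.find().sort({ ts: -1 }).limit(5);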
This presentation will demonstrate how you can use the aggregation pipeline with MongoDB much as you would use GROUP BY in SQL, along with the new stage operators coming in 3.4. MongoDB’s Aggregation Framework has many operators that let you get more value out of your data, discover usage patterns within it, or power your application. Considerations regarding version, indexing, operators, and saving the output will be reviewed.
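For a flavor of the GROUP BY analogy, here is a minimal mongo-shell sketch (hypothetical orders collection and fields, not from the deck):

// SQL: SELECT status, SUM(amount) AS total FROM orders
//      WHERE year = 2016 GROUP BY status ORDER BY total DESC;
db.orders.aggregate([
  { $match: { year: 2016 } },                                  // WHERE
  { $group: { _id: '$status', total: { $sum: '$amount' } } },  // GROUP BY
  { $sort: { total: -1 } },                                    // ORDER BY
  { $out: 'orders_by_status' }                                 // save the output
]);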
Intro to MongoDB
Get a jumpstart on MongoDB, use cases, and next steps for building your first app with Buzz Moschetti, MongoDB Enterprise Architect.
@BuzzMoschetti
Webinar: MongoDB Schema Design and Performance Implications - MongoDB
In this session, you will learn how to translate one-to-one, one-to-many and many-to-many relationships, and learn how MongoDB's JSON structures, atomic updates and rich indexes can influence your design. We will also explore implications of storage engines, indexing and query patterns, available tools and related new features in MongoDB 3.2.
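As an illustration of the one-to-many choice (a hypothetical blog schema, not from the webinar), the same relationship can be embedded or referenced:

// Embedding: comments live inside the post document.
// One read fetches everything, and updates to a single post are atomic.
db.posts.insertOne({
  _id: 1,
  title: 'Schema design',
  comments: [{ author: 'ann', text: 'Nice!' }]
});

// Referencing: comments get their own collection.
// Better when the "many" side grows without bound.
db.comments.insertOne({ postId: 1, author: 'ann', text: 'Nice!' });
db.comments.createIndex({ postId: 1 }); // index supporting the join-like lookup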
Kafka Streams Windowing Behind the Curtain - Confluent
Kafka Streams Windowing Behind the Curtain, Neil Buesing, Principal Solutions Architect, Rill
https://www.meetup.com/TwinCities-Apache-Kafka/events/279316299/
These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems.
Mike O’Brian from 10gen introduced the syntax and usage patterns for the new aggregation system in MongoDB and gave some demonstrations of aggregation using the new system. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities.
For more information, visit our website at http://casertaconcepts.com/ or email us at info@casertaconcepts.com.
Speaker: Jerry Reghunadh, Architect, CAPIOT Software Pvt. Ltd.
Level: 200 (Intermediate)
Track: Microservices
One of the leading assisted e-commerce players in India approached CAPIOT to rebuild their ERP system from the ground up. Their existing PHP-MySQL setup, while rich in functionality and having served them well for just under half a decade, would not scale to meet future demands given the exponential growth they were experiencing.
We built the entire system using a microservices architecture. To develop the APIs we used Node.js, Express, Swagger, and Mongoose, with MongoDB as the active data store. During development we solved several problems around cross-service calls, data consistency, service discovery, and security.
One of the issues we faced was how to design and make cross-service calls effectively. Should we make a cross-service call for every document we need, or should we duplicate and distribute the data to reduce cross-service calls? We found a balance between the two and engineered a solution that gave us good performance.
In addition, our current system has 36 independent services. We enabled services to auto-discover each other and make secure calls.
We used Swagger to define our APIs first and to enforce request and response validations, and Mongoose as our ODM for schema validation. We also depend heavily on pre-save hooks to validate data and post-save hooks to trigger changes in other systems. This API-first approach let our frontend and backend teams work from a single API spec without worrying about the repercussions of changing API schemas.
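A minimal sketch of that hook pattern (hypothetical Order schema, not CAPIOT's actual code): Mongoose schema-level validation plus pre-save and post-save hooks.

const mongoose = require('mongoose');

const orderSchema = new mongoose.Schema({
  total: { type: Number, required: true, min: 0 }, // schema-level validation
  status: { type: String, enum: ['new', 'paid', 'shipped'], default: 'new' }
});

// Pre-save hook: validate or normalize data before it reaches MongoDB.
orderSchema.pre('save', function (next) {
  if (this.status === 'paid' && this.total === 0) {
    return next(new Error('Paid order cannot have a zero total'));
  }
  next();
});

// Post-save hook: trigger changes in other systems after persistence.
orderSchema.post('save', (doc) => {
  console.log('Order saved, notifying downstream services:', doc._id);
});

module.exports = mongoose.model('Order', orderSchema);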
What You Will Learn:
- How we used Swagger and Mongoose to off-load validations and schema enforcement through an API-first workflow.
- How microservices and cross-service calls work, and how we balanced per-document cross-service calls against duplicating and distributing data.
- How we implemented microservice auto-discovery and secure calls across our 36 independent services.
Webinar: Working with Graph Data in MongoDB - MongoDB
With the release of MongoDB 3.4, the number of applications that can take advantage of MongoDB has expanded. In this session we will look at using MongoDB for representing graphs and how graph relationships can be modeled in MongoDB.
We will also look at a new aggregation operation that we recently implemented for graph traversal and computing transitive closure. We will include an overview of the new operator and provide examples of how you can exploit this new feature in your MongoDB applications.
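The graph-traversal stage introduced in MongoDB 3.4 is $graphLookup, presumably the operator this webinar covers. A minimal sketch (hypothetical employees collection) computing each employee's transitive management chain:

// Each employee document looks like: { name: 'Eve', reportsTo: 'Dan' }
db.employees.aggregate([
  {
    $graphLookup: {
      from: 'employees',         // collection to traverse (here, itself)
      startWith: '$reportsTo',   // value that seeds the traversal
      connectFromField: 'reportsTo',
      connectToField: 'name',
      as: 'managementChain'      // transitive closure: all managers up the chain
    }
  }
]);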
Apache Iceberg: An Architectural Look Under the Covers - ScyllaDB
Data lakes have been built with a desire to democratize data: to allow more and more people, tools, and applications to make use of it. A key capability needed to achieve this is hiding the complexity of the underlying data structures and physical storage from users. The de facto standard, the Hive table format, addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
The Apache Iceberg table format is now used and contributed to by many leading tech companies, such as Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg.
You will learn:
• The issues that arise when using the Hive table format at scale, and why we need a new table format
• How a straightforward, elegant change in table format structure has enormous positive effects
• The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
• The resulting benefits of this architectural design
With the Lakehouse as the future of data architecture, Delta becomes the de facto storage format for data pipelines. By using Delta to build curated data lakes, users achieve end-to-end efficiency and reliability. Curated data lakes involve multiple hops in the end-to-end pipeline, executed regularly (mostly daily) depending on the need. As data travels through each hop, its quality improves until it is suitable for end-user consumption. Real-time capabilities, meanwhile, are key for any business and an added advantage; luckily, Delta integrates seamlessly with Structured Streaming, which makes it easy to achieve real-time processing with Delta. Overall, Delta Lake as a streaming source is a marriage made in heaven, and we are already seeing adoption rise among our users.
In this talk, we will discuss the functional components of Structured Streaming with Delta as a streaming source. We will deep-dive into Query Progress Logs (QPL) and their significance for operating streams in production: how to track the progress of any streaming job and map it to the source Delta table using QPL, what exactly gets persisted in the checkpoint directory, and how the checkpoint directory's contents map to QPL metrics and matter for Delta streams.
Simplifying Disaster Recovery with Delta Lake - Databricks
There is a need for a recovery process for Delta tables in a DR scenario. Cloud multi-region sync is asynchronous, and this type of replication does not guarantee the chronological order of files at the target (DR) region; in some cases, large files arrive later than small ones.
Optimizing Delta/Parquet Data Lakes for Apache Spark - Databricks
This talk outlines data lake design patterns that can yield massive performance gains for all downstream consumers. We will talk about how to optimize Parquet data lakes and the awesome additional features provided by Databricks Delta:
• Optimal file sizes in a data lake
• File compaction to fix the small file problem
• Why Spark hates globbing S3 files
• Partitioning data lakes with partitionBy
• Parquet predicate pushdown filtering
• Limitations of Parquet data lakes (files aren't mutable!)
• Mutating Delta lakes
• Data skipping with Delta ZORDER indexes
Speaker: Matthew Powers
MongoDB for Coder Training (Coding Serbia 2013) - Uwe Printz
Slides of my MongoDB Training given at Coding Serbia Conference on 18.10.2013
Agenda:
1. Introduction to NoSQL & MongoDB
2. Data manipulation: Learn how to CRUD with MongoDB (see the sketch after this list)
3. Indexing: Speed up your queries with MongoDB
4. MapReduce: Data aggregation with MongoDB
5. Aggregation Framework: Data aggregation done the MongoDB way
6. Replication: High Availability with MongoDB
7. Sharding: Scaling with MongoDB
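As a taste of agenda item 2, a minimal CRUD sketch in the mongo shell (hypothetical talks collection; modern shell method names rather than the 2013-era insert/update/remove):

db.talks.insertOne({ title: 'MongoDB for Coders', year: 2013 });  // Create
db.talks.find({ year: 2013 });                                    // Read
db.talks.updateOne({ title: 'MongoDB for Coders' },
                   { $set: { rating: 5 } });                      // Update
db.talks.deleteOne({ title: 'MongoDB for Coders' });              // Delete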
This talk will break down merge in Delta Lake (what is actually happening under the hood) and then explain how you can optimize a merge. Some code snippets and sample configs will be shared as well.
Why Good IT Education Is Not in Demand in the Market - Timur Shemsedinov
A talk at the Glushkov Fest conference at the Faculty of Cybernetics, Taras Shevchenko National University of Kyiv, November 29, 2019.
Video: https://youtu.be/nvIJE6xMpiI
Structure and Architecture of Software Systems
The APU Committee on Telecommunications, Information Technology and the Internet invites you to take part in the second event of the third season of the «How does IT work?» project, dedicated to the structure and architecture of software systems.
What will we talk about?
- What a programming language, a compiler, and a translator are; how programming languages are classified; which languages exist and where they are used.
- What software components there are: what a framework, a library, a module, a class, and a repository are, and how they are used in the development process.
- What a development environment is: IDEs, linters, CI/CD, and other development tools and infrastructure components.
- The architecture of software solutions: client-server and multi-tier systems, monolithic servers, backend and frontend, the service approach, microservices, containers, and cloud technologies.
- The specifics of using open-source code under various licenses when building software systems, and the security of using open source.
- Organizing the development process: reliability, quality, code review, refactoring, version control systems, code ownership, and the bus factor.
Speaker:
Timur Shemsedinov, technology stack architect and leader of the Metarhia community, lecturer at KPI, #2 in the GitHub ranking of Ukrainian developers, head of R&D in high-load cloud technologies.
Conference: JS is Back! - fwdays
In addition to offloading CPU-bound tasks, we can use worker_threads to run asynchronous non-blocking domain logic when the main event loop is already overloaded. We now have the perf_hooks API, which can be used to build a load balancer that spreads load across a worker pool for the densest possible event-loop utilization. This machinery can be encapsulated in a simple abstraction with an async/await contract, yielding a solution that competes with goroutines in Go in both convenience and efficiency.
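A minimal sketch of the two building blocks (not the talk's actual code): a worker running a CPU-bound job off the main thread, and monitorEventLoopDelay from perf_hooks measuring how loaded the main loop is.

const { Worker, isMainThread, parentPort } = require('worker_threads');
const { monitorEventLoopDelay } = require('perf_hooks');

if (isMainThread) {
  // Main thread: watch event-loop delay while a worker does the heavy job.
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const worker = new Worker(__filename);
  worker.on('message', (result) => {
    console.log('result from worker:', result);
    // mean is in nanoseconds; a large value means the loop is overloaded
    console.log('mean event-loop delay (ms):', histogram.mean / 1e6);
    worker.terminate();
  });
  worker.postMessage(5e8);
} else {
  // Worker thread: CPU-bound loop that would otherwise block the event loop.
  parentPort.on('message', (n) => {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += i;
    parentPort.postMessage(sum);
  });
}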
Node.js: Less Complexity, More Reliability (Holy.js 2021) - Timur Shemsedinov
If Node.js is your everyday tool, it is almost certain that you are using it the wrong way. Timur will prove that in a very short review, uncover anti-patterns in your daily standard solutions, and show you the way to much better practices. The only thing standing between you and this knowledge is your laziness.
Low-code sells well, but in practice it does not provide the benefits that vendors claim. What are the reasons, and how can we still gain an advantage from the low-code principle? Experience of radical rethinking and enterprise use cases, combined with multi-paradigm programming and metaprogramming.
https://fwdays.com/en/event/architecture-fwdays-2021/review/rethinking-low-code
We will finish the overview of new Node.js features and put it all together in the Node.js Starter Kit (a project template) from the Metarhia community for building reliable and scalable cloud and cluster applications and for rapid API development for high-load interactive systems. The Metaserverless manifesto will be published. We will walk through the code and discuss how to use the newest features of the Node.js platform and fundamental CS knowledge to build a sound project structure and architecture.
Video: https://youtu.be/r1u-dGocm1c
Plan of the 2nd webinar: an overview of common problems: memory and resource leaks, ignored errors and the places they can arise, violations of GRASP and SOLID principles in Node.js, the notions of coupling and cohesion of software components, applying GoF and other design patterns, an overview of Node.js anti-patterns, and how all of this should influence the everyday code you write.
Fwdays webinar: Node.js in 2020: Exit and Re-enter Properly - Part 1
Video: https://youtu.be/GJY2dyE6328?t=480
Over the last five years Node.js has changed a great deal, but the community's knowledge of the platform is stuck at the 2013-2015 level: the same approaches, the same problems. The community barely follows new capabilities, and even when it learns about them, this does not affect everyday code. Fundamental knowledge of software engineering and architecture, concurrent programming, GRASP, SOLID, and GoF penetrates Node.js, and JavaScript in general, only weakly, and when it does, it is not adapted and rethought. As a result, JavaScript is perceived as unserious among programming languages, and Node.js as a platform for the poorly trained. The talk "Node.js in 2020: Exit and Re-enter Properly" covers these and other questions: how to overcome this trend, how to change the approach to Node.js development in 2020 using all modern capabilities and knowledge, and what to change in everyday coding practices.
System types
OLAP (Online Analytical Processing): aggregated data for analytics tasks.
OLTP (Online Transaction Processing): real-time transaction processing.
Big data
Big data: volume, growth, variety.
Data warehouse: read-only, non-transactional data, meant for analysis.
Data lake: storage for large volumes of unstructured data.