Slides for my talk at Ruby Ireland on 10 May 2011, showing some of the capabilities of mongoDB, using it from a Sinatra application, and deploying it to Heroku and Cloud Foundry.
"In this session, Twitter engineer Alex Payne will explore how the popular social messaging service builds scalable, distributed systems in the Scala programming language. Since 2008, Twitter has moved the development of its most critical systems to Scala, which blends object-oriented and functional programming with the power, robust tooling, and vast library support of the Java Virtual Machine. Find out how to use the Scala components that Twitter has open sourced, and learn the patterns they employ for developing core infrastructure components in this exciting and increasingly popular language."
Transactional writes to cloud storage with Eric Liang (Databricks)
We will discuss the three dimensions on which to evaluate HDFS versus S3: cost, SLAs (availability and durability), and performance. Eric then provides a deep dive on the challenges of writing to cloud storage with Apache Spark and shares transactional commit benchmarks on Databricks I/O (DBIO) compared to Hadoop.
Empowering developers to deploy their own data stores (Tomas Doran)
Empowering developers to deploy their own data stores using Terraform, Puppet and rage. A talk about automating server building and configuration for Elasticsearch clusters, using HashiCorp and Puppet Labs tools. Presented at Config Management Camp 2016 in Ghent.
Problem Solving Recipes Learned from Supporting Spark: Spark Summit East talk... (Spark Summit)
Thanks to Spark, writing big data applications has never been easier… at least until they stop being easy! At Lightbend we’ve helped our customers out of a number of hidden Spark pitfalls. Some crop up often: the ever-persistent OutOfMemoryError, the confusing NoSuchMethodError, shuffle and partition management, etc. Others occur less frequently: an obscure configuration affecting SQL broadcasts, struggles with speculation, a failing stream recovery due to RDD joins, S3 file reads leading to hangs, etc. All are intriguing! In this session we will provide insights into their origins and show how you can avoid making the same mistakes. Whether you are a seasoned Spark developer or a novice, you should learn some new tips and tricks that could save you hours or even days of debugging.
An over-ambitious introduction to Spark programming, testing and deployment. This deck tries to cover most of the core technologies and design patterns used in SpookyStuff, the fastest query engine for data collection/mashup from the deep web.
For more information please follow: https://github.com/tribbloid/spookystuff
A bug in PowerPoint used to cause transparent background colors not to render properly. This has been fixed in a recent upload.
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14) (Evan Chan)
This was a talk that Kelvin Chu and I just gave at the SF Bay Area Spark Meetup 5/14 at Palantir Technologies.
We discussed the Spark Job Server (http://github.com/ooyala/spark-jobserver), its history, example workflows, architecture, and exciting future plans to provide HA Spark job contexts.
We also discussed the use case of the job server at Ooyala to facilitate fast query jobs using shared RDDs and a shared job context, and how we integrate with Apache Cassandra.
"In this session, Twitter engineer Alex Payne will explore how the popular social messaging service builds scalable, distributed systems in the Scala programming language. Since 2008, Twitter has moved the development of its most critical systems to Scala, which blends object-oriented and functional programming with the power, robust tooling, and vast library support of the Java Virtual Machine. Find out how to use the Scala components that Twitter has open sourced, and learn the patterns they employ for developing core infrastructure components in this exciting and increasingly popular language."
Transactional writes to cloud storage with Eric LiangDatabricks
We will discuss the three dimensions to evaluate HDFS to S3: cost, SLAs (availability and durability), and performance. He then provided a deep dive on the challenges in writing to Cloud storage with Apache Spark and shared transactional commit benchmarks on Databricks I/O (DBIO) compared to Hadoop.
Empowering developers to deploy their own data storesTomas Doran
Empowering developers to deploy their own data stores using Terrafom, Puppet and rage. A talk about automating server building and configuration for Elasticsearch clusters, using Hashicorp and puppet labs tool. Presented at Config Management Camp 2016 in Ghent
Problem Solving Recipes Learned from Supporting Spark: Spark Summit East talk...Spark Summit
Due to Spark, writing big data applications has never been easier…at least until they stop being easy! At Lightbend we’ve helped our customers out of a number of hidden Spark pitfalls. Some crop up often; the ever-persistent OutOfMemoryError, the confusing NoSuchMethodError, shuffle and partition management, etc. Others occur less frequently; an obscure configuration affecting SQL broadcasts, struggles with speculating, a failing stream recovery due to RDD joins, S3 file reading leading to hangs, etc. All are intriguing! In this session we will provide insights into their origins and show how you can avoid making the same mistakes. Whether you are a seasoned Spark developer or a novice, you should learn some new tips and tricks that could save you hours or even days of debugging.
An over-ambitious introduction to Spark programming, test and deployment. This slide tries to cover most core technologies and design patterns used in SpookyStuff, the fastest query engine for data collection/mashup from the deep web.
For more information please follow: https://github.com/tribbloid/spookystuff
A bug in PowerPoint used to cause transparent background color not being rendered properly. This has been fixed in a recent upload.
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
This was a talk that Kelvin Chu and I just gave at the SF Bay Area Spark Meetup 5/14 at Palantir Technologies.
We discussed the Spark Job Server (http://github.com/ooyala/spark-jobserver), its history, example workflows, architecture, and exciting future plans to provide HA spark job contexts.
We also discussed the use case of the job server at Ooyala to facilitate fast query jobs using shared RDD and a shared job context, and how we integrate with Apache Cassandra.
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu... (Shirshanka Das)
This talk describes the motivations behind Apache Gobblin (incubating), its architecture, the latest innovations in supporting both batch and streaming data pipelines, and the future roadmap.
A 2-hour session where I cover what Apache Camel is and the latest news on the upcoming Camel v3; the main topic of the talk is the new Camel K sub-project for running integrations natively on the cloud with Kubernetes. The last part of the talk is about running Camel with GraalVM / Quarkus to achieve natively compiled binaries with impressive startup times and footprints.
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An... (Chris Fregly)
https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/227622666/
Title: Spark on Kubernetes
Abstract: Engineers across several organizations are working on support for Kubernetes as a cluster scheduler backend within Spark. While designing this, we have encountered several challenges in translating Spark to use idiomatic Kubernetes constructs natively. This talk is about our high level design decisions and the current state of our work.
Speaker:
Anirudh Ramanathan is a software engineer on the Kubernetes team at Google. His focus is on running stateful and batch workloads. Previously, he worked on GGC (Google Global Cache) and prior to that, on the infrastructure team at NVIDIA.
Webinar: Deep Dive on Apache Flink State - Seth Wiesman (Ververica)
Apache Flink is a world-class stateful stream processor that presents a huge variety of optional features and configuration choices to the user. Determining the optimal choice for any production environment and use case can be challenging. In this talk, we will explore and discuss the universe of Flink configuration with respect to state and state backends.
We will start with a closer look under the hood, at core data structures and algorithms, to build the foundation for understanding the impact of tuning parameters and the cost-benefit tradeoffs that come with certain features and options. In particular, we will focus on state backend choices (Heap vs RocksDB), tuning checkpointing (incremental checkpoints, ...) and recovery (local recovery), serializers, and Apache Flink's new state migration capabilities.
ApacheCon 2021: Apache BookKeeper Key-Value Store and use cases (Shivji Kumar Jha)
In order to leverage the best performance characteristics of your data or stream backend, it is important to understand the nitty-gritty details of how your backend store and compute work: how data is stored, how it is indexed, and what the read path looks like. Understanding this empowers you to design your solution to make the best use of the resources at hand, and to get the optimal consistency, availability, latency, and throughput for those resources.
With this underlying philosophy, in this slide deck we will get to the bottom of the storage tier of Pulsar (Apache BookKeeper): the barebones of the BookKeeper storage semantics, how it is used in different use cases (even beyond Pulsar), the object models of storage in Pulsar, the different kinds of data structures and algorithms Pulsar uses, and how these map to the semantics of the storage class shipped with Pulsar by default. Oh yes, you can change the storage backend too with some additional code!
The focus will be more on the storage backend, so as not to tailor this to Pulsar specifically but to keep it applicable to other data stores and streams.
Productionizing Spark and the Spark Job Server (Evan Chan)
You won't find this in many places - an overview of deploying, configuring, and running Apache Spark, including Mesos vs YARN vs Standalone clustering modes, useful config tuning parameters, and other tips from years of using Spark in production. Also, learn about the Spark Job Server and how it can help your organization deploy Spark as a RESTful service, track Spark jobs, and enable fast queries (including SQL!) of cached RDDs.
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ... (StreamNative)
We will introduce HerdDB, a distributed database written in Java.
We will see how a distributed database can be built using Apache BookKeeper as a write-ahead commit log.
Deploying Apache Flume to enable low-latency analytics (DataWorks Summit)
The driving question behind redesigns of countless data collection architectures has often been, “how can we make the data available to our analytical systems faster?” Increasingly, the go-to solution for this data collection problem is Apache Flume. In this talk, architectures and techniques for designing a low-latency Flume-based data collection and delivery system to enable Hadoop-based analytics are explored. Techniques for getting the data into Flume, getting the data onto HDFS and HBase, and making the data available as quickly as possible are discussed. Best practices for scaling up collection, addressing de-duplication, and utilizing a combined streaming/batch model are described in the context of Flume and Hadoop ecosystem components.
iPhone client-server app with Rails backend (v3) (Sujee Maniyam)
Some of the lessons learned from building a client-server iPhone app (DiscountsForMe)
This is version 3 of the talk, presented at SF Ruby Meetup on Feb 17, 2010
MongoDB's flexible schema makes it a great fit for your next content management application, as its data model makes it easy to catalog multiple content types with diverse metadata. In this session, we'll review schema design for content management, using GridFS for storing binary files, and how you can leverage MongoDB's auto-sharding to partition your content across multiple servers.
Presented by Andrew Erlichson, Vice President, Engineering, Developer Experience, MongoDB
Audience level: Beginner
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB. You will learn:
- How to work with documents
- How to evolve your schema
- Common schema design patterns
Building Real Time Systems on MongoDB Using the Oplog at Stripe (MongoDB)
MongoDB's oplog is possibly its most underrated feature. The oplog is vital as the basis on which replication is built, but its value doesn't stop there. Unlike the MySQL binlog, which is poorly documented and not directly exposed to MySQL clients, the oplog is a well-documented, structured format for changes that is query-able through the same mechanisms as your data. This allows many types of powerful, application-driven streaming or transformation. At Stripe, we've used the MongoDB oplog to create PostgreSQL, HBase, and ElasticSearch mirrors of our data. We've built a simple real-time trigger mechanism for detecting new data. And we've even used it to recover data. In this talk, we'll show you how we use the MongoDB oplog, and how you can build powerful reactive streaming data applications on top of it.
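To make that concrete, here is a minimal sketch of tailing the oplog from Ruby with a tailable cursor; the connection details and the 'mydb.payments' namespace are assumptions for illustration, not Stripe's actual code.

require 'mongo'

# The oplog lives in the 'local' database of a replica set member and is
# queryable like any other collection.
client = Mongo::Client.new(['localhost:27017'], database: 'local')
oplog  = client['oplog.rs']

# A tailable cursor keeps the query open and streams new entries as they
# are written, which is the basis for triggers and mirrors.
oplog.find({ 'ns' => 'mydb.payments' }, cursor_type: :tailable_await).each do |entry|
  # entry['op'] is 'i' (insert), 'u' (update) or 'd' (delete);
  # entry['o'] carries the document or the change itself.
  puts "#{entry['op']}: #{entry['o'].inspect}"
end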
Evgeniy Karelin. Mongo DB integration example solving performance and high lo... (Vlad Savitsky)
This presentation is about a real-life example of using MongoDB on one of our Drupal projects, which serves more than 25m pageviews per day and more than 500k registered users, with page load times under 1 second.
It will give an understanding of how MongoDB can easily be used to increase the performance of a website.
The presentation will be as simple as possible, with easy examples and schemas, but full understanding will require at least an intermediate developer level.
- project and tasks overview. http://freerice.com/ - a quiz game site. There is a lot of dynamic information: users (registered and anonymous), groups and their various game statistics, user statuses, etc.
- problems while using MySQL
- server optimization attempts: Memcache+Varnish, MySQL replication, moving the game into a separate script and AJAX blocks with a "light" bootstrap
- MongoDB overview and its benefits on the current project
- PHP and MongoDB
- project's MongoDB architecture overview
- nodes and MongoDB
- users/groups, their statistics and MongoDB
- switching MySQL to MongoDB in Views.
- indexing problems and statistic calculations.
- multilingual support
- scalability and using MongoDB replica set.
- conclusions
Update: Social Harvest is going open source, see http://www.socialharvest.io for more information.
My MongoSV 2011 talk about implementing machine learning and other algorithms in MongoDB. With a little real-world example at the end about what Social Harvest is doing with MongoDB. For more updates about my research, check out my blog at www.shift8creative.com
MongoDB Europe 2016 - Graph Operations with MongoDB (MongoDB)
The popularity of dedicated graph technologies has risen greatly in recent years, at least partly fuelled by the explosion in social media and similar systems, where a friend network or recommendation engine is often a critical component when delivering a successful application. MongoDB 3.4 introduces a new Aggregation Framework graph operator, $graphLookup, to enable some of these types of use cases to be built easily on top of MongoDB. We will see how semantic relationships can be modelled inside MongoDB today, how the new $graphLookup operator can help simplify this in 3.4, and how $graphLookup can be used to leverage these relationships and build a commercially focused news article recommendation system.
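A minimal sketch of the $graphLookup stage the abstract refers to (MongoDB 3.4+), here walking a follower graph two hops out; the collection and field names are assumptions for illustration.

require 'mongo'

client = Mongo::Client.new(['localhost:27017'], database: 'demo')

# Recursively join each user to the users they follow, up to two hops away.
pipeline = [
  { '$match' => { '_id' => 'alice' } },
  { '$graphLookup' => {
      'from'             => 'users',     # collection to traverse
      'startWith'        => '$follows',  # seed values (array of user ids)
      'connectFromField' => 'follows',
      'connectToField'   => '_id',
      'as'               => 'network',
      'maxDepth'         => 2
  } }
]
client[:users].aggregate(pipeline).each { |doc| puts doc['network'].size }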
MongoDB IoT CITY Tour EINDHOVEN: Bosch & Tech Mahindra: Industrial Internet, ... (MongoDB)
Industrial Internet, Industry 4.0, Smart Factory – all of these concepts are promising to transform the current industrial landscape by leveraging the IoT. In this presentation, Bosch, TechMahindra and MongoDB will present a concrete example that goes from concept to implementation. Learn how advanced hand-held tightening tools, user ID cards, wireless indoor localisation technology, M2M asset management and big data can be combined to form a powerful track and trace solution for advanced manufacturing requirements.
Building Real Time Systems on MongoDB Using the Oplog at Stripe (Stripe)
If you'd like to see the presentation with presenter's notes, I've published my Google Docs presentation at https://docs.google.com/presentation/d/19NcoFI9BG7PwLoBV7zvidjs2VLgQWeVVcUd7Xc7NoV0/pub
Originally given at MongoDB World 2014 in New York
MongoDB IoT City Tour LONDON: Industrial Internet, Industry 4.0, Smart Factor... (MongoDB)
Presented by Deepak Maheshwari, Tech Mahindra
Industrial Internet, Smart Factory, Industry 4.0 – all of these concepts are promising to transform the current industrial landscape by leveraging the IoT. In this presentation, Bosch, TechMahindra and MongoDB will present a concrete example that goes from concept to implementation. Learn how advanced handheld tightening tools, user ID cards, wireless indoor localisation technology, M2M asset management and big data can be combined to form a powerful track and trace solution for advanced manufacturing requirements.
Optimizing MongoDB: Lessons Learned at Localytics (andrew311)
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space? (ArangoDB Database)
View the video of this webinar here: https://www.arangodb.com/arangodb-events/gvisor-kata-containers-firecracker-docker/
Containers* have revolutionized the IT landscape, and for a long time Docker seemed to be the default whenever people were talking about containerization technologies**. But traditional container technologies might not be suitable if strong isolation guarantees are required. So recently, new technologies such as gVisor, Kata Containers, or Firecracker have been introduced to close the gap between the strong isolation of virtual machines and the small resource footprint of containers.
In this talk, we will provide an overview of the different containerization technologies, discuss their tradeoffs, and provide guidance for different use cases.
* We will define the term container in more detail during the talk
** and yes we will also cover some of the pre-docker container space!
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu... (ScyllaDB)
MongoDB has become the most prominent NoSQL database engine and is now used for a wide variety of use cases because of its flexibility and ease of use for developers, while Scylla, a C++ rewrite of Cassandra, provides benefits through its architectural approach, including getting rid of the JVM and a CPU-level design that gets the most out of your hardware.
Numberly has been using MongoDB for over a decade and Scylla for over a year in production. The benefits of the Scylla architecture, allied to the Cassandra ecosystem, fuel rapid adoption across a very wide range of use cases: from real-time data pipelines and analytics batch processing to web application database backends.
Learn the motivations behind this adoption trend and why it has proven successful so far, while we outline its limits and why MongoDB is still here to stay!
Grokking Techtalk #38: Escape Analysis in Go compiler (Grokking VN)
In performance analysis, understanding and mastering the programming language and its design is very useful. Go is one of the languages widely used in high-performance distributed systems. To better understand how the Go compiler analyzes memory allocation when compiling a program, listen to Cường's talk on escape analysis in the Go compiler.
About the speaker:
Lê Mạnh Cường is a software engineer with 8 years of deep experience in backend development and Linux system administration. An active OSS contributor, Cường has made many contributions to the open source community, especially Go and its ecosystem.
The use of containers to simplify and speed the deployment and development of applications is taking off. Most container usage is around stateless micro-services, but data and transactions are key components of most applications.
This presentation reviews:
- The purpose of containers and their usage
- How to containerize your EDB Postgres deployment
- How to deal with issues of managing your database and storage
- How to set up a cluster for high availability
- How to build a container with the EDB Postgres Enterprise Manager Agent included
Target Audience:
This technical presentation is for DBAs, Data Architects, Developers, DevOps, IT Operations and anyone responsible for supporting Postgres who is interested in learning about containers. It is equally suitable for organizations using community PostgreSQL as well as EDB’s Postgres Plus product family.
To listen to the recording which includes a demonstration, visit EnterpriseDB > Resources > Webcasts
There's plenty of material (documentation, blogs, books) out there that'll help you write a site using Django... but then what? You've still got to test, deploy, monitor, and tune the site; failure at deployment time means all your beautiful code is for naught.
Web technologies are evolving blazingly fast, and so is AWS. Part of this evolution is GraphQL, and the AWS team has already taken notice. In March 2019 AWS joined the GraphQL Foundation, doubling down on the technology as an ingredient for great applications.
Designing GraphQL APIs for scale on AWS is a challenging and exciting process. In this talk, we will cover some key learnings from my past two years and how to overcome several challenges of this process.
Presentation from the 4th Athens Gophers Meetup.
At a glance we present:
- why we introduced a new language in the organization and why that was Go
- how we approached the transition
- some of the projects we built in Go
- the challenges we faced and the lessons we learned in the process
Join this workshop and accelerate your journey to production-ready Kubernetes by learning the practical techniques for reliably operating your software lifecycle using the GitOps pattern. The Weaveworks team will be running a full-day workshop, sharing their expertise as users and contributors of Kubernetes and Prometheus, as well as followers of GitOps (operations by pull request) practices.
Using a combination of instructor led demonstrations and hands-on exercises, the workshop will enable the attendee to go into detail on the following topics:
• Developing and operating your Kubernetes microservices at scale
• DevOps best practices and the movement towards a “GitOps” approach
• Building with Kubernetes in production: caring for your apps, implementing CI/CD best practices, and utilizing the right metrics, monitoring tools, and automated alerts
• Operating Kubernetes in production: Upgrading and managing Kubernetes, managing incident response, and adhering to security best practices for Kubernetes
Similar to Constructing Web APIs with Rack, Sinatra and MongoDB (20)
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies must adapt and embrace new ideas to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
5. a good API is...focussed
‣ clear in its intent
‣ epitomizes good coding/behavioural practice
‣ has minimal sugar
‣ has a minimum of control surfaces
6. a good API is...evolvable
‣ your API will have consumers
‣ you don’t suddenly break the consumers, ever
‣ you control the API lifecycle, you control the expectations
7. a good web API is...responsive
‣ unchatty
‣ bandwidth sensitive
‣ latency savvy
‣ does paging where appropriate
‣ not unnecessarily fine-grained
8. a good web API is...resilient
‣ stable in the presence of badness
‣ traps flooding/overload
‣ adapts to surges
‣ makes good on shoddy requests, if possible
‣ authenticates, if appropriate
9. example application
‣ flavour of the month - location tracker!
‣ now that apple/google no longer do our work for us
‣ register a handset
‣ add a location ‘ping’ signal from handset to server
https://github.com/oisin/plink
10. design (focussed)
‣ PUT a handset for registration
‣ POST location details
‣ DEL a handset when not in use
‣ focussed and short
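A minimal Sinatra sketch of those three operations; the route shapes, model calls and payload fields are assumptions for illustration rather than the exact code in the plink repo.

require 'sinatra'
require 'json'

# PUT a handset for registration
put '/api/v1.0/handsets/:code' do
  Handset.create(:code => params[:code], :status => 'active')
  status 201
end

# POST location details for a registered handset
post '/api/v1.0/handsets/:code/plink' do
  handset = Handset.first(:code => params[:code]) or halt 404
  data = JSON.parse(request.body.read)
  handset.locations << Location.new(:lat => data['lat'], :lng => data['lng'])
  handset.save
  status 200
end

# DEL a handset when not in use
delete '/api/v1.0/handsets/:code' do
  Handset.destroy_all(:code => params[:code])
  status 204
end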
11. design (evolvable)
‣ hit it with a hammer - put a version into URL - /api/v1.3/...
‣ in good company - google, twitter
‣ produce a compatibility statement
‣ what it means to minor/major level up
‣ enforce this in code
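Enforcing that in code can be as small as a Sinatra before filter; a sketch under assumed compatibility rules (same major version, requested minor no newer than served):

require 'sinatra'

API_VERSION = [1, 3]  # major, minor served by this deployment
VERSION_RE  = %r{^/api/v(\d+)\.(\d+)/}

before VERSION_RE do
  major, minor = request.path_info.match(VERSION_RE).captures.map(&:to_i)
  unless major == API_VERSION[0] && minor <= API_VERSION[1]
    halt 400, 'incompatible API version requested'
  end
end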
12. design (resilience)
‣ mongoDB for scaling
‣ write code to work around badness
‣ throttling of client activity with minimum call interval
‣ not using auth in this edition...
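The minimum call interval can live in a small Rack middleware; a sketch built on the rack-throttle gem, throttling only the ping URL (the class name is made up):

require 'rack/throttle'

class PlinkChoke < Rack::Throttle::Interval
  # Only throttle location pings; let every other request through.
  def allowed?(request)
    request.path =~ /plink$/ ? super : true
  end
end

# in config.ru, production only:
# use PlinkChoke, :min => 300.0   # one ping per 5 minutes, else 403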
13. design (responsiveness)
‣ this API is very fine-grained, but not chatty
‣ we should queue to decouple POST response time from db
‣ but mongo is meant to be super-fast
‣ so maybe we get away with it in this edition :)
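If measurement showed the write dominating the response time, a queue could be bolted on; a sketch with delayed_job (not in the current app; the job class is invented):

# delayed_job calls #perform later, off the request path.
class RecordPlink < Struct.new(:code, :lat, :lng)
  def perform
    handset = Handset.first(:code => code) or return
    handset.locations << Location.new(:lat => lat, :lng => lng)
    handset.save
  end
end

# In the POST handler, enqueue instead of writing synchronously:
# Delayed::Job.enqueue RecordPlink.new(params[:code], data['lat'], data['lng'])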
15. technologies (rack)
‣ rack - a ruby webserver interface
‣ we’re going to use this for two things
‣ throttling for bad clients using a Rack middleware
‣ mounting multiple Sinatra apps with Rack::Builder (later on)
http://rack.rubyforge.org/
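A config.ru sketch of the second use, mounting two Sinatra apps off the same root with Rack::Builder (the app class names are assumed):

# config.ru -- rackup wraps this file in a Rack::Builder instance
require './api'    # defines ApiApp   (Sinatra::Base subclass)
require './track'  # defines TrackApp (Sinatra::Base subclass)

map('/api')   { run ApiApp }    # the JSON web API
map('/track') { run TrackApp }  # the tracking views, mounted separately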
16. technologies (mongodb)
‣ high performance
‣ non-relational
‣ horizontal scaling
‣ may give us resilience and responsiveness
‣ also nice client on MacOS :)
http://www.mongodb.org http://mongohub.todayclose.com/
17. technologies (mongo_mapper)
‣ ORM for mongoDB
‣ a slight tincture of ActiveRecord: models, associations, dynamic finders
‣ embedded documents
‣ indices
‣ also, I like DataMapper and this is a little similar
http://mongomapper.com/
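A sketch of what the plink models could look like with mongo_mapper, following the shapes the editor's notes describe; the exact field names are assumptions.

require 'mongo_mapper'

MongoMapper.database = 'plink'

class Location
  include MongoMapper::EmbeddedDocument   # lives inside a Handset document
  key :lat,  Float
  key :lng,  Float
  key :time, Time
  attr_protected :time                    # shielded from mass assignment
end

class Handset
  include MongoMapper::Document
  key :code,   String, :required => true
  key :status, String, :required => true
  many :locations                         # embedded Location documents
end

Handset.ensure_index :code                # index the lookup key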
19. mongoDB is document-oriented
‣ collections contain documents, which can contain keys, arrays and other documents
‣ a document is like a JSON dictionary (in fact, it’s BSON)
‣ indices, yes, but no schema in the RDBMS sense - but you do plan!
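Concretely, a handset with its pings might look like this, written as the Ruby hash the driver would serialize to BSON (field names assumed):

{
  'code'   => 'ab12cd34',   # indexed lookup key
  'status' => 'active',
  'locations' => [          # embedded documents, fetched with the parent
    { 'lat' => 53.344, 'lng' => -6.267, 'time' => Time.utc(2011, 5, 10, 19, 30) }
  ]
}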
20. mongoDB is a database
‣ foreign keys - can reference documents living in other collections
‣ indices - same as RDBMS - use in the same way
‣ datatypes - JSON basics plus some others including regex and code
‣ flexible querying with js, regex, kv matching - all the same query
‣ but no JOINs
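Through MongoMapper all of these styles end up as the same kind of query underneath; a quick sketch (values assumed):

Handset.where(:status => 'active').all                       # key/value matching
Handset.where(:code => /^ab/).all                            # regex matching
Handset.where('$where' => 'this.locations.length > 5').all   # server-side JS
Handset.find_by_code('ab12cd34')                             # dynamic finder sugar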
21. mongoDB can scale
‣ by relaxing some of the constraints of relational DBs, better horizontal scaling can be achieved
‣ replica sets for scaling reads
‣ replica sets & sharding for scaling writes
‣ map/reduce for batch processing of data (like GROUP BY)
http://www.mongodb.org/display/DOCS/Replication
http://www.mongodb.org/display/DOCS/Sharding
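A sketch of the GROUP BY flavour of map/reduce, counting pings per handset through the driver collection underneath MongoMapper (names assumed):

map_fn = "function() { emit(this.code, this.locations.length); }"
red_fn = "function(key, values) { return Array.sum(values); }"

# Results land in their own collection, queryable like any other.
Handset.collection.map_reduce(map_fn, red_fn, :out => 'pings_per_handset')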
22. cap/brewer’s theorem
‣ Consistency - all nodes see all data at the same time
‣ Availability - node failures do not prevent operation
‣ Partition Tolerance - only total network failure will cause the system to respond incorrectly
‣ pick any two
23. consistency model (read)
[diagram: master/slave replication]
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
24. mongoDB is performance oriented
‣ removes features that impede performance
‣ will not replace your SQL store
‣ good for this example app - because we want fast ‘write’ performance and scale (consistency not so much)
‣ GridFS - chunkifies and stores your files - neat!
32. mongo (capped collections)
‣ Fixed size, high performance LRU
‣ Maintains insertion order - great for logs/comments/etc
‣ not in use in this example application
‣ embedded documents - no cap on arrays
‣ putting location data in another collection - not sensible
‣ hacked it in the example app
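For reference, creating a capped collection with the 1.x-era Ruby driver looks something like this (sizes assumed; again, not something the example app does):

require 'mongo'

db = Mongo::Connection.new.db('plink')
# Capped at ~1 MB or 10,000 documents; the oldest entries age out automatically.
db.create_collection('recent_pings',
                     :capped => true, :size => 1_048_576, :max => 10_000)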
40. wraps (mongo)
‣ programming is straightforward with mongo_mapper
‣ works well with heroku
‣ haven’t done any work with sharding/replication
‣ complement RDBMS - e.g. for GridFS files storage, logs, profiles
‣ worthy of further study and experimentation
41. improvements (example)
‣ authentication using Rack::Warden
‣ queued invocations using delayed_job
‣ some eye candy for the tracking data
‣ suggestions welcome :-)
http://github.com/oisin/plink
Editor's Notes
In which Oisín talks about the motivation for a web API; what makes an API Good, Right and True; an exemplary application; some useful technologies to achieve the application goals; the great mongo; the cap theorem and consistency; programming mongo through mongomapper; defensive coding for the web API; deployment to Heroku and CloudFoundry; and summarizes some realizations about mongo.
Developers Developers Developers -- a web API gives you a chance to build an ecosystem of developers and products and business based on your stuff.
Chances are if you are writing an app, you’ll need a server side component to hold data, perform queries and share things. You’ll do this with a Web API.
Shock - some people are actually making money from web APIs - based on a freemium model, companies like UrbanAirship charge for pushing data to phones; other data companies charge subscription access to their data corpora. Next: what makes a good API?
APIs can be a bit difficult to get right. So let’s look at the characteristics of a good API. Clarity - includes the documentation here. Good practice - adhere to naming conventions; no 40-parameter methods. Sugar implies no sugar also possible, reduces clarity. Minimum - behavioural hints in one place, minimal methods. But this all is tempered by reality.
A thing that is very important for the longevity (and usefulness) of an API is evolvability. APIs have a lifecycle - you release them into the wild and people start using them. They use them in ways you never, ever, would have thought. And they start looking for new approaches, methods, access to internals and new ways to control the behaviour. If they are paying you, it’s usually a good idea in some instances to give them what they need. But you have to do this in a controlled fashion. If you break products that customers are using to make money, then there will be hell to pay. So it’s important you control the lifecycle of the API and the experience of everybody. You need to be able to say we are making changes, and we’re going to change the version, and this is what that means.
Previous characteristics apply to programming APIs, but web APIs have some extra fun things associated with them because they have the network in there, and everybody knows how that makes life difficult. Don’t try to do many fine-grained calls; make sure a typical interaction with the API doesn’t take many calls; but be bandwidth sensitive as well as latency savvy; use paging, with ranges, or iterator style URLs.
This is the thing that will annoy people the most - if your API goes away totally. It may degrade, get slower, but shouldn’t go away. A lot of the resilience here is ops-based, so you need the right kind of scaling, but that doesn’t absolve you from doing some programming work! That’s the theory.
I did a little sample application, which I’d like to keep developing, as there is some interesting stuff from the point of view of scaling and using mongo that I’d like to get into at some point.
From the design perspective - it’s focussed - only does three things!
Ok to hit this with a hammer, not to be subtle and encode a version number in the URL. We can enforce compatibility rules in the code itself. A little later we can see how something like Rack can help us with this even more so, but we should keep checks in the code. The compatibility statement is something you have in the docs for your developers. But you know how that works already.
I admit I’m taking a few shortcuts here! Mongo is going to do the scaling for us :) We’re going to write some defensive code. One call per 5 minutes is probably plenty for me to find out what’s going on in terms of the handset location. I left out auth to just take off one layer of stuff - it should be in later versions of the example application.
Very small API - fine-grained is ok here. We should use queues to ensure that the synchronous HTTP returns as quickly as possible to the client. This needs an experiment - I’m playing it by ear here - mongo is meant to be fast, so maybe putting in something like a delayed_job may actually mean more overhead. This is a kind of design decision where you need to get some figures and some costs. Now let’s look at some of the technologies I’ve put together for this sample app.
Sinatra is my go-to guy for small web applications and web APIs. Zero hassle and easy to work with, and rackness gives it loads of middlewares I can use to modify the request path.
This gives you a stack/interceptor model to run what’s called middlewares before it gets to your Sinatra application. You can also use it to start up and mount multiple applications living off the same root URL, but in different branches - I’ve added a separate tracking application which is meant to show the data gathered, which we’ll see later.
Mongo! Why did I choose it for this - high performance, horizontal scaling, non-relational, and these are all things I wanted to look at (but not so much in this talk!) It might also save my ass on the resilience and responsiveness I was talking about earlier!
There’s a good Ruby driver for Mongo from 10gen, but MongoMapper gives me an ORM, which is nice and lives on top of that driver. It’s a little ActiveRecord-like, with models, associations etc. At this point, it’s probably time to say a little about MongoDB.
There are a few companies using it! Lots of data. You can get all of this information from http://www.mongodb.com/ and there are a number of really good experience blog entries and articles that are linked. Worth a read.
Well, what’s a document anyway? The main choice you need to make with Mongo is whether or not you want something to be an embedded document or a DBRef to a document in another collection.
Embedded documents instead of joins - the efficiency being that when you pull the document, you get all the embedded ones with it and you don’t need to go back to perform a JOIN.
Horizontal scale and performance are the main goal of Mongo - the way to get this was to come back to some of the features and assumptions of the RDBMS and remove them: transactions, JOINs. Take these out, or soften the requirement, and the goals are more easily achieved.
Replica sets involve a master and one or more slaves - you write to the master and this is pushed out to the slaves. It’s an eventual consistency model, so if you write, then immediately read from the slave, you will see stale data. If this works for you, then cool. This will scale reads. Sharding is about partitioning your collections over many replica sets. Multiple masters then means that you can scale your writes. Sharding can just be turned on, with no downtime. But I haven’t tried this yet - the next talk maybe!
map/reduce is an approach for processing huge datasets on certain kinds of distributable problems using a large number of computers. Map: the master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node. Reduce: the master node then takes the answers to all the sub-problems and combines them in some way to get the output - the answer to the problem it was originally trying to solve.
Any mention of Mongo or any NoSQL database has to mention the CAP Theorem. This is all distributed system academic stuff, but important.
Lots of links here - this was a conjecture by Brewer in 2000 that in a distributed system, you can have C, A, or P, but not all three. This was proved to be true in a paper in 2002 - check the links below. These features are all subtly linked and interdependent.
Examples - BigTable is CA, Dynamo is AP.
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
http://highscalability.com/amazon-architecture
http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
http://blog.dhananjaynene.com/2009/10/nosql-a-fluid-architecture-in-transition/
http://devblog.streamy.com/tag/partition-tolerance/
Here’s where MongoDB sits in terms of read consistency wrt Dynamo/SimpleDB.
1, 2, 3) Sinatra API
4) Application is started by Rack::Builder
1) This is the regex that will match the root of the URL path_info for a versioned call
2) The compatibility statement is implemented by this helper
3) This filter occurs before every API call and checks that the version expected by the incoming request is version compatible with the server’s own
1) This is a Mongo document
2) Declare the keys in the document, their type, and say they are mandatory
3) This is an association - the Handset document should connect to many Location documents
4) This is a Mongo Embedded Document - it lives inside another document, not in its own collection
5) The :time key is protected from mass assignment
1) Making a new connection to the database and setting the database name -- this will be very different when you are using a hosted Mongo, like the MongoHQ that’s used by Heroku. Check out the app code on GitHub for details.
2) Telling Mongo to make sure that the handsets collection (which is modeled by Handset) should be indexed on the :code key
Driver too: http://api.mongodb.org/ruby/current/file.TUTORIAL.html
MongoMapper: http://mongomapper.com/documentation/
1) Starting the Mongo shell client and using the appropriate database
2) Querying for all the handsets
3) One of the handsets has an embedded document Location
1) Standard MongoMapper ‘where’ query
2) Creating a Handset and setting the :status and :code keys
3) Dynamic finder, ActiveRecord stylee
4) Deleting a document in the handsets collection
1) Making a new Location model instance, but not saving it to the database
2) Defense Against the Dark Arts: checking for mandatory JSON payload keys
3) Defense Against the Dark Arts: checking for optional JSON payload keys
4) Adding a Location to an array of them in the Handset model
5) Saving the Handset model will write the Location array as embedded documents
Unfortunately we can’t mix up those capped collections with location information here - it wouldn’t make sense to have the locations in a separate collection - there would be one for each handset and we’re limited on the number of collections in Mongo.
Issues with document size - a single doc can be something like 16MB, including all of the embedded documents. Mongo is good for storing LOTS of documents, not HUGE documents. Hence the dumb hack in the code.
1) Only in production, use the Throttler middleware, and program for a 300 second (5 min) interval
2) Extend the Rack Throttle interval throttler
3) Just work the choke on URLs that have ‘plink’ at the end - we don’t want to throttle everything!
Throttlees get a 403 if they try to get another plink in within the 5 minute limit.
EASY!
NOT EASY!
1) Grab all the handsets from the database
2) Send the /track tree off to the Track application - guess how this can help with versioning :)
These are my takeaways from this experiment with mongoDB.
Improvements that could be made to the example application (hint hint).