csv, one of the standard libraries, in Ruby 2.6 has many improvements:
* Default gemified
* Faster CSV parsing
* Faster CSV writing
* Clean new CSV parser implementation for further improvements
* Reconstructed test suites for further improvements
* Benchmark suites for further performance improvements
These improvements are done without breaking backward compatibility.
This talk describes details of these improvements by a new csv maintainer.
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...Ontico
Существует множество архитектур и способов масштабирования систем. Сегодня многие компании мигрируют в облачные сервисы или используют контейнеры. Но действительно ли это так необходимо и нужно ли следовать трендам?
В данном докладе мне бы хотелось рассказать об архитектуре, которую я спланировал и внедрил в компании InnoGames. Архитектура, не требующая вмешательства администратора в случае лавинообразного увеличения нагрузки и, что ещё более важно, умеющая редуцироваться в случае отсутствия её для экономии затрат.
Вы узнаете об опыте создания сервиса с очень непростыми критериями и поймёте, что не обязательно платить в 3 раза дороже за AWS или любую подобную систему.
- Что такое CRM. Зачем нам этот сервис.
- Инфраструктура.
-- Graphite. Почему он должен быть надежным и быстрым.
-- Puppet + gitlab.
-- Балансировка нагрузки.
-- Наше облако. Зачем нам openstack, когда есть serveradmin!? Как роль сервера определяется несколькими атрибутами в веб-интерфейсе.
-- Nagios + аггрегаторы. Другой взгляд на то, как мониторить сервисы через Graphite.
-- Мониторинг кластеров. Clusterhc и Grafsy.
-- Brassmonkey. Как мы написали своего сисадмина на python.
-- Бэкапы.
- Архитектура CRM3.
- Autoscaling или как проанализировать кучу данных и принять решения.
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
Learn the specifics of Amazon RDS for PostgreSQL’s capabilities and extensions that make it powerful. This session begins with a brief overview of the RDS PostgreSQL service, how it provides High Availability & Durability and will then deep dive into the new features that we have released since re:Invent 2014, including major version upgrade and newly added PostgreSQL extensions to RDS PostgreSQL. During the session, we will also discuss lessons learned running a large fleet of PostgreSQL instances, including specific recommendations. In addition we will present benchmarking results looking at differences between the 9.3, 9.4 and 9.5 releases.
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...Ontico
Существует множество архитектур и способов масштабирования систем. Сегодня многие компании мигрируют в облачные сервисы или используют контейнеры. Но действительно ли это так необходимо и нужно ли следовать трендам?
В данном докладе мне бы хотелось рассказать об архитектуре, которую я спланировал и внедрил в компании InnoGames. Архитектура, не требующая вмешательства администратора в случае лавинообразного увеличения нагрузки и, что ещё более важно, умеющая редуцироваться в случае отсутствия её для экономии затрат.
Вы узнаете об опыте создания сервиса с очень непростыми критериями и поймёте, что не обязательно платить в 3 раза дороже за AWS или любую подобную систему.
- Что такое CRM. Зачем нам этот сервис.
- Инфраструктура.
-- Graphite. Почему он должен быть надежным и быстрым.
-- Puppet + gitlab.
-- Балансировка нагрузки.
-- Наше облако. Зачем нам openstack, когда есть serveradmin!? Как роль сервера определяется несколькими атрибутами в веб-интерфейсе.
-- Nagios + аггрегаторы. Другой взгляд на то, как мониторить сервисы через Graphite.
-- Мониторинг кластеров. Clusterhc и Grafsy.
-- Brassmonkey. Как мы написали своего сисадмина на python.
-- Бэкапы.
- Архитектура CRM3.
- Autoscaling или как проанализировать кучу данных и принять решения.
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
Learn the specifics of Amazon RDS for PostgreSQL’s capabilities and extensions that make it powerful. This session begins with a brief overview of the RDS PostgreSQL service, how it provides High Availability & Durability and will then deep dive into the new features that we have released since re:Invent 2014, including major version upgrade and newly added PostgreSQL extensions to RDS PostgreSQL. During the session, we will also discuss lessons learned running a large fleet of PostgreSQL instances, including specific recommendations. In addition we will present benchmarking results looking at differences between the 9.3, 9.4 and 9.5 releases.
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Grant McAlister
Presentation from Postgres Open 2016 in Dallas (Sept 2016) - Covers new RDS features introduced over the last year and lessons learned operating a large fleet of PostgreSQL.
Boyan Ivanov - latency, the #1 metric of your cloudShapeBlue
No two clouds are the same. Yet the leading clouds all have one thing in common: they deliver on metrics, which matter to the customer. In this session we'll dissect leading clouds, to show why low latency is the thing that makes a cloud stand out.
MySQL Replication — Advanced Features / Петр Зайцев (Percona)Ontico
HighLoad++ 2017
Зал «Кейптаун», 8 ноября, 13:00
Тезисы:
http://www.highload.ru/2017/abstracts/2954.html
MySQL Replication is powerful and has added a lot of advanced features through the years. In this presentation we will look into replication technology in MySQL 5.7 and variants focusing on advanced features, what do they mean, when to use them and when not, Including.
When should you use STATEMENT, ROW or MIXED binary log format?
What is GTID in MySQL and MariaDB and why do you want to use them?
What is semi-sync replication and how is it different from lossless semi-sync?
...
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This presentation explains how to deploy and use the Integrated Caching feature on Netscaler. I gave this presentation to Citrix staff, customers and partners in worldwide in 2011. The presentation covers best practices and gotchas :) Integrated Caching is an excellent feature that can greatly improve the performance of your website.
Les architectures message-driven sont pensées afin d’offrir une possibilité de gérer un grand nombre d’événements. Si la volumétrie de ces éléments devient suffisante, il sera probablement judicieux de stocker le tout dans un cluster HDFS.
Nous présenterons un retour d’expérience d’ingestion d’événements depuis un broker Kafka vers un cluster HDFS.
Par Sylvain Lecqueux, consultant chez Xebia
A talk I gave at the Boston Web Performance Meetup in August 2014.
Performance is one of the most challenging issues in modern web app design, in large part because modeling, testing, and validating performance before deploying to production is so challenging. While many ops teams have nailed down the problem of re-creating pre-production environments that closely mimic production, those environments frequently rely on known-good components beyond the application code itself: AWS ELB, F5 load balancers, CDNs, Varnish, and more.
Testing plug-in components like that can be challenging, because their performance characteristics don't directly align with application metrics.
- How many simultaneous users can my load balancer support? - What sort of network load will I put on my CDN (i.e., how much will it cost?) - How do different user behavior patterns affect performance?
In this meetup, we'll introduce a novel tool in this toolbox: tcpreplay, an open-source tool for replaying packet capture files back at an application. By replaying user traffic to a staging environment, you can test the effects of
- Network saturation to the load balancer - High numbers of users / IPs - Lots of traffic to your other monitoring tools!
Blog with clickable links: http://www.pubnub.com/blog/real-time-stats-for-candy-box/
We like games here at PubNub, but not as much as we like real-time. Combine the two, and you’ve got pure mega-awesome. During the PubNub Hackathon, I took a popular text adventure game called Candy Box, and updated its stats page to provide a real-time overview of the global game statistics.
Take a look at what Rails 5 has in store for you. We go through all the new features and improvements across development, testing, caching and much more. So let's dive in.
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Grant McAlister
Presentation from Postgres Open 2016 in Dallas (Sept 2016) - Covers new RDS features introduced over the last year and lessons learned operating a large fleet of PostgreSQL.
Boyan Ivanov - latency, the #1 metric of your cloudShapeBlue
No two clouds are the same. Yet the leading clouds all have one thing in common: they deliver on metrics, which matter to the customer. In this session we'll dissect leading clouds, to show why low latency is the thing that makes a cloud stand out.
MySQL Replication — Advanced Features / Петр Зайцев (Percona)Ontico
HighLoad++ 2017
Зал «Кейптаун», 8 ноября, 13:00
Тезисы:
http://www.highload.ru/2017/abstracts/2954.html
MySQL Replication is powerful and has added a lot of advanced features through the years. In this presentation we will look into replication technology in MySQL 5.7 and variants focusing on advanced features, what do they mean, when to use them and when not, Including.
When should you use STATEMENT, ROW or MIXED binary log format?
What is GTID in MySQL and MariaDB and why do you want to use them?
What is semi-sync replication and how is it different from lossless semi-sync?
...
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This presentation explains how to deploy and use the Integrated Caching feature on Netscaler. I gave this presentation to Citrix staff, customers and partners in worldwide in 2011. The presentation covers best practices and gotchas :) Integrated Caching is an excellent feature that can greatly improve the performance of your website.
Les architectures message-driven sont pensées afin d’offrir une possibilité de gérer un grand nombre d’événements. Si la volumétrie de ces éléments devient suffisante, il sera probablement judicieux de stocker le tout dans un cluster HDFS.
Nous présenterons un retour d’expérience d’ingestion d’événements depuis un broker Kafka vers un cluster HDFS.
Par Sylvain Lecqueux, consultant chez Xebia
A talk I gave at the Boston Web Performance Meetup in August 2014.
Performance is one of the most challenging issues in modern web app design, in large part because modeling, testing, and validating performance before deploying to production is so challenging. While many ops teams have nailed down the problem of re-creating pre-production environments that closely mimic production, those environments frequently rely on known-good components beyond the application code itself: AWS ELB, F5 load balancers, CDNs, Varnish, and more.
Testing plug-in components like that can be challenging, because their performance characteristics don't directly align with application metrics.
- How many simultaneous users can my load balancer support? - What sort of network load will I put on my CDN (i.e., how much will it cost?) - How do different user behavior patterns affect performance?
In this meetup, we'll introduce a novel tool in this toolbox: tcpreplay, an open-source tool for replaying packet capture files back at an application. By replaying user traffic to a staging environment, you can test the effects of
- Network saturation to the load balancer - High numbers of users / IPs - Lots of traffic to your other monitoring tools!
Blog with clickable links: http://www.pubnub.com/blog/real-time-stats-for-candy-box/
We like games here at PubNub, but not as much as we like real-time. Combine the two, and you’ve got pure mega-awesome. During the PubNub Hackathon, I took a popular text adventure game called Candy Box, and updated its stats page to provide a real-time overview of the global game statistics.
Take a look at what Rails 5 has in store for you. We go through all the new features and improvements across development, testing, caching and much more. So let's dive in.
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
I introduced Ruby and Apache Arrow integration including the "super fast large data interchange and processing" Apache Arrow feature at RubyKaigi Takeout 2021.
This talk introduces how we can use the "super fast large data interchange and processing" Apache Arrow feature in Ruby. Here are some use cases:
* Fast data retrieval (fast pluck) from DB such as MySQL and PostgreSQL for batch processes in a Ruby on Rails application
* Fast data interchange with JavaScript for dynamic visualization in a Ruby on Rails application
* Fast OLAP with in-process DB such as DuckDB and Apache Arrow DataFusion in a Ruby on Rails application or irb session
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
We created a plugin-based data collection tool that can read any chaotically formatted files called "CSV" by guessing its schema automatically
Talked at csv,conf,v2 in Berlin
http://csvconf.com/
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine YardSV Ruby on Rails Meetup
Wesley Beary: Cloud computing scared the crap out of me - the quirks and nightmares
of provisioning computing and storage on AWS, Terremark, Rackspace,
etc - until I took the bull by the horns. Let me now show you how I
tamed that bull.
Learn how to easily get started cloud computing with fog. It gives you
the reins within any Ruby application or script. If you can control
your infrastructure choices, you can make better choices in
development and get what you need in production.
You'll get an overview of fog and concrete examples to give you a head
start on your own provisioning workflow.
Detail behind the Apache Cassandra 2.0 release and what is new in it including Lightweight Transactions (compare and swap) Eager retries, Improved compaction, Triggers (experimental) and more!
• CQL cursors
5 Takeaways from Migrating a Library to Scala 3 - Scala LoveNatan Silnitsky
Scala 3 is going to make Scala more easy to write, and especially read. more power features like enums and less misleading keywords like implicit.
But first we need to migrate our old Scala 2.12 / 2.13 codebase to Scala 3.
This talk tells the story of how I tried to migrate greyhound open-source library to Scala 3 with partial success.
You will hear about what works, what doesn't and about a few pitfalls to avoid
Migration takeaways include:
1. use migration tools, don't do it manually
2. which popular 3rd party libraries can and can't be used by Scala 3 code
and many more
Scripting Embulk plugins makes plugin development easier drastically. You can develop, test, and productionize data integrations using any scripting languages. It's most suitable way to integrate data with SaaS using vendor-provided SDKs.
https://techplay.jp/event/781988
fog or: How I Learned to Stop Worrying and Love the CloudWesley Beary
Learn how to easily get started on cloud computing with fog. If you can control your infrastructure choices, you’ll make better choices in development and get what you need in production. You'll get an overview of fog and concrete examples to give you a head start on your provisioning workflow.
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)Wesley Beary
Cloud computing scared the crap out of me - the quirks and nightmares of provisioning cloud computing, dns, storage, ... on AWS, Terremark, Rackspace, ... - I mean, where do you even start?
Since I couldn't find a good answer, I undertook the (probably insane) task of creating one. fog gives you a place to start by creating abstractions that work across many different providers, greatly reducing the barrier to entry (and the cost of switching later). The abstractions are built on top of solid wrappers for each api. So if the high level stuff doesn't cut it you can dig in and get the job done. On top of that, mocks are available to simulate what clouds will do for development and testing (saving you time and money).
You'll get a whirlwind tour of basic through advanced as we create the building blocks of a highly distributed (multi-cloud) system with some simple Ruby scripts that work nearly verbatim from provider to provider. Get your feet wet working with cloud resources or just make it easier on yourself as your usage gets more complex, either way fog makes it easy to get what you need from the cloud.
The OpenStack Edition adds my concerns about OpenStack API development, including things that have already been fixed and things that we haven't yet encountered. Hopefully this consumer perspective can help shed light on some rough spots.
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
To use Ruby for data processing widely, Apache Arrow support is important. We can do the followings with Apache Arrow:
* Super fast large data interchange and processing
* Reading/writing data in several famous formats such as CSV and Apache Parquet
* Reading/writing partitioned large data on cloud storage such as Amazon S3
This talk describes the followings:
* What is Apache Arrow
* How to use Apache Arrow with Ruby
* How to integrate with Ruby 3.0 features such as MemoryView and Ractor
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is a cross-language development platform for in-memory data. You can use Apache Arrow to process large data effectively in Python and other languages such as R. Apache Arrow is the future of data processing. Apache Arrow 1.0, the first major version, was released at 2020-07-24. It's a good time to know Apache Arrow and start using it.
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is the future for data processing systems. This talk describes how to solve data sharing overhead in data processing system such as Spark and PySpark. This talk also describes how to accelerate computation against your large data by Apache Arrow.
PGroonga 2 – Make PostgreSQL rich full text search system backend!Kouhei Sutou
PGroonga 2.0 has been released with 2 years development since PGroonga 1.0.0. PGroonga 1.0.0 just provides fast full text search with all languages support. It's important because it's a lacked feature in PostgreSQL. PGroonga 2.0 provides more useful features to implement rich full text search system with PostgreSQL. This session shows how to implement rich full text search system with PostgreSQL!
This talk describes about PGroonga that resolves these problems.
PostgreSQL Conference Japan 2017の資料です。
PostgreSQLが苦手な全文検索機能を強力に補完する拡張機能PGroongaの最新情報を紹介します。PGroongaは単なる全文検索モジュールではありません。PostgreSQLで簡単にリッチな全文検索システムを実現するための機能一式を提供します。PGroongaは最新PostgreSQLへの対応も積極的です。例としてロジカルレプリケーションと組み合わせた活用方法も紹介します。
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
4. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Kouhei Sutou
The president of ClearCode Inc.
クリアコードの社長
✓
A new maintainer of the csv library
csvライブラリーの新メンテナー
✓
The founder of Red Data Tools project
Red Data Toolsプロジェクトの立ち上げ人
Provides data processing tools for Ruby
Ruby用のデータ処理ツールを提供するプロジェクト
✓
✓
5. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Kazuma Furuhashi
A member of Asakusa.rb / Red Data Tools
Asakusa.rb/Red Data Toolsメンバー
✓
Worikng at Speee Inc.
Speeeで働いている
✓
16. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV.generate_line vs. CSV#<<
2.5 2.6
generate_
line
284.4i/s 684.2i/s
<< 2891.4i/s 4824.1i/s
Use << for multiple writes
複数行書き出すときは<<を使うこと
17. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.6 (3)
Ruby 2.6のcsv(3)
New CSV parser
for
further improvements
さらなる改良のための新しいCSVパーサー
18. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark with KEN_ALL.CSV
KEN_ALL.CSVでのベンチマーク
01101,"060 ","0600000","ホッカイドウ","サッポロシチュウオウク",...
...(124257 lines)...
47382,"90718","9071801","オキナワケン","ヤエヤマグンヨナグニチョウ",...
Zip code data in Japan
日本の郵便番号データ
https://www.post.japanpost.jp/zipcode/download.html
19. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
KEN_ALL.CSV statistics
KEN_ALL.CSVの統計情報
Size(サイズ) 11.7MiB
# of columns(列数) 15
# of rows(行数) 124259
Encoding(エンコーディング) CP932
20. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Parsing KEN_ALL.CSV
KEN_ALL.CSVのパース
CSV.foreach("KEN_ALL.CSV",
"r:cp932") do |row|
end
2.5 2.6 Faster?
1.17s 0.79s 1.48x
21. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Fastest parsing in pure Ruby
Ruby実装での最速のパース方法
input.each_line(chomp: true) do |line|
line.split(",", -1) do |column|
end
end
Limitation: No quote
制限:クォートがないこと
22. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
KEN_ALL.CSV without quote
クォートなしのKEN_ALL.CSV
01101,060 ,0600000,ホッカイドウ,サッポロシチュウオウク,...
...(124257 lines)...
47382,90718,9071801,オキナワケン,ヤエヤマグンヨナグニチョウ,...
23. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Optimized no quote CSV parsing
最適化したクォートなしCSVのパース方法
CSV.foreach("KEN_ALL_NO_QUOTE.CSV",
"r:cp932",
quote_char: nil) {|row|}
split 2.6 Faster?
0.32s 0.37s 0.86x
(almost the same/同等)
24. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Summary: Performance
まとめ:性能
Parsing: 1.5x-3x faster
パース:1.5x-3x高速
Max to the "split" level by using an option
オプションを指定すると最大で「split」レベルまで高速化可能
✓
✓
Writing: 1.5x-2.5x faster
書き出し:1.5x-2.5x高速
Use CSV#<< than CSV.generate_line
CSV.generate_lineよりもCSV#<<を使うこと
✓
✓
25. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (1)
速度改善方法(1)
Complex quote
複雑なクォート
27. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Use StringScanner
StringScannerを使う
String#split is very fast
String#splitは高速
✓
String#split is naive for complex quote
String#splitは複雑なクォートを処理するには単純過ぎる
✓
28. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
2.5 uses String#split
in_extended_column = false # "...n..." case
@input.each_line do |line|
line.split(",", -1).each do |part|
if in_extended_column
# ...
elsif part.start_with?('"')
if part.end_with?('"')
row << pars.gsub('""', '"') # "...""..." case
else
in_extended_column = true
end
# ...
29. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
split: Complex quote
Just quoted 274.1i/s
Include sep1 211.0i/s
Include sep2 118.0i/s
Slow down
遅くなる
30. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
2.6 uses StringScanner
row = []
until @scanner.eos?
value = parse_column_value
if @scanner.scan(/,/)
row << value
elsif @scanner.scan(/n/)
row << value
yield(row)
row = []
end
end
31. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_column_value
def parse_column_value
parse_unquoted_column_value ||
parse_quoted_column_value
end
Compositable components
部品を組み合わせられる
32. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_unquoted_column_value
def parse_unquoted_column_value
@scanner.scan(/[^,"rn]+/)
end
33. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
parse_quoted_column_value
def parse_quoted_column_value
# Not quoted
return nil unless @scanner.scan(/"/)
# Quoted case ...
end
34. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Parse methods can be composited
パースメソッドを組み合わせられる
def parse_column_value
parse_unquoted_column_value ||
parse_quoted_column_value
end
Easy to maintain
メンテナンスしやすい
35. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (1)
ポイント(1)
Use StringScanner for complex case
複雑なケースにはStringScannerを使う
✓
StringScanner for complex case:
複雑なケースにStringScannerを使うと:
Easy to maintain
メンテナンスしやすい
✓
No performance regression
性能が劣化しない
✓
✓
36. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
StringScanner: Complex quote
Just quoted 554.5i/s
Include sep1 330.0i/s
Include sep2 325.6i/s
No slow down...?
遅くなっていない。。。?
37. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (2)
速度改善方法(2)
Simple case
単純なケース
39. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Use String#split
String#splitを使う
StringScanner is
slow
for simple case
StringScannerは単純なケースでは遅い
40. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Fallback to StringScanner impl.
StringScanner実装にフォールバック
def parse_by_strip(&block)
@input.each_line do |line|
if complex?(line)
return parse_by_string_scanner(&block)
else
yield(line.split(","))
end
end
end
41. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Quoted CSVs
クォートありのCSV
StringScanner split +
StringScanner
Just quoted 311.7i/s 523.4i/s
Include sep1 312.9i/s 309.8i/s
Include sep2 311.3i/s 312.6i/s
42. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (2)
ポイント(2)
First try optimized version
まず最適化バージョンを試す
✓
Fallback to robust version
when complexity is detected
複雑だとわかったらちゃんとしたバージョンにフォールバック
✓
43. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (3)
速度改善方法(3)
loop do
↓
while true
44. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
loop vs. while
How Throughput
loop 377i/s
while 401i/s
45. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (3)
ポイント(3)
while doesn't create a new scope
whileは新しいスコープを作らない
✓
Normally, you can use loop
ふつうはloopでよい
Normally, loop isn't a bottle neck
ふつうはloopがボトルネックにはならない
✓
✓
46. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to improve performance (4)
速度改善方法(4)
Lazy
遅延
47. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
CSV object is parser and writer
CSVオブジェクトは読み書きできる
2.5: Always initializes everything
2.5:常にすべてを初期化
✓
2.6: Initializes when it's needed
2.6:必要になったら初期化
✓
49. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to initialize lazily
初期化を遅延する方法
def parser
@parser ||= Parser.new(...)
end
def writer
@writer ||= Writer.new(...)
end
50. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Point (4)
ポイント(4)
Do only needed things
必要なことだけする
✓
One class for one feature
機能ごとにクラスを分ける
✓
51. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
New features by new parser
新しいパーサーによる新機能
Add support for " escape
"でのエスケープをサポート
✓
Add strip: option
strip:オプションを追加
✓
55. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to keep backward compat.
互換性を維持する方法
Reconstruct test suites
テストを整理
✓
Add benchmark suites
ベンチマークを追加
✓
56. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Test
テスト
Important to detect incompat.
非互換の検出のために重要
✓
Must be easy to maintain
メンテナンスしやすくしておくべき
To keep developing
継続的な開発するため
✓
✓
57. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Easy to maintenance
メンテナンスしやすい状態
Easy to understand each test
各テストを理解しやすい
✓
Easy to run each test
各テストを個別に実行しやすい
Focusing a failed case is easy to debug
失敗したケースに集中できるとデバッグしやすい
✓
✓
58. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark
ベンチマーク
Important to detect
performance regression bugs
性能劣化バグを検出するために重要
✓
59. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
benchmark_driver gem
Fully-featured
benchmark driver
for Ruby 3x3
Ruby 3x3のために必要な機能が揃っているベンチマークツール
60. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
benchmark_driver gem in csv
YAML input is easy to use
YAMLによる入力が便利
✓
Can compare multiple gem versions
複数のgemのバージョンで比較可能
To detect performance regression
性能劣化を検出するため
✓
✓
61. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark for each gem version
gemのバージョン毎のベンチマーク
63. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Benchmark as a starting point
出発点としてのベンチマーク
Join csv developing!
csvの開発に参加しよう!
✓
Adding a new benchmark is a good start
ベンチマークの追加から始めるのはどう?
We'll focus on improving performance for
benchmark cases
ベンチマークが整備されているケースの性能改善に注力するよ
✓
✓
64. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to use improved csv?
改良されたcsvを使う方法
gem install csv
65. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
csv in Ruby 2.5
Ruby 2.5のcsv
Default gemified
デフォルトgem化
66. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Default gem
デフォルトgem
Can use it just by require
requireするだけで使える
✓
Can use it without entry in Gemfile
(But you use it bundled in your Ruby)
Gemfileに書かなくても使えるけど古い
✓
Can upgrade it by gem
gemでアップグレードできる
✓
67. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
How to use improved csv?
改良されたcsvを使う方法
gem install csv
78. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Further work
今後の改善案
Improve transcoding performance of Ruby
Rubyのエンコーディング変換処理の高速化
✓
Improve simple case parse performance
by implementing parser in C
シンプルなケース用のパーサーをCで実装して高速化
✓
Improve perf. of REXML as well as csv
csvのようにREXMLも高速化
✓
79. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Join us!
一緒に開発しようぜ!
Red Data Tools:
https://red-data-tools.github.io/
✓
RubyData Workshop: Today 14:20-15:30✓
Code Party: Today 19:00-21:00✓
After Hack: Sun. 10:30-17:30✓
80. Better CSV processingwith Ruby 2.6 Powered by Rabbit 3.0.0
Join us!!
一緒に開発しようぜ!!
OSS Gate: https://oss-gate.github.io/
provides a "gate" to join OSS development
OSSの開発に参加する「入り口」を提供する取り組み
✓
Both ClearCode and Speee are one of sponsors
クリアコードもSpeeeもスポンサー
✓
✓
OSS Gate Fukuoka:
https://oss-gate-fukuoka.connpass.com/
✓