At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly; this creates a lot of challenging data needs. We use Cassandra heavily, as a general-purpose key-value store. In this presentation, I will talk about how we use Cassandra to serve our critical use cases; the improvements and patches we made to make sure Cassandra can meet our low-latency, high-scalability requirements; and some pain points we have.
About the Speaker
Dikang Gu Software Engineer, Facebook
I'm a software engineer on the Instagram core infra team, working on scaling Instagram infrastructure, especially on building a generic key-value store based on Cassandra. Prior to this, I worked on the development of HDFS at Facebook. I received my master's degree in Computer Science from Shanghai Jiao Tong University in China.
Design patterns are a "hard to swallow" topic for most programmers when they first start learning. The reason is that design patterns are built on abstract concepts and must also adhere to the principles of object-oriented programming.
At TechTalk #32: SOLID & Design Patterns, you will be introduced to these principles and to how familiar design patterns can be applied to solve problems concisely and effectively, through real-world examples.
Speaker: Khôi Nguyễn - Senior Software Engineer @ KMS Technology
How to access & analyse Twitter big data. Full working example using Storm and RedStorm in Ruby & JRuby. Code on github https://github.com/colinsurprenant/tweitgeist and live demo http://tweitgeist.needium.com/
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech | HostedbyConfluent
In a world where there is an ever-growing shift towards event-driven streaming data, Kafka is firmly embedded in the epicenter of any Data Platform’s central nervous system. In an attempt to aid in the shift of analytics towards true event time, we have implemented a pure Kappa architecture - effectively turning the database inside out. By extending the concept of a truly idempotent stream of events, Kafka has been elevated to the source of truth. We have eliminated extra network trips for joins as well as for querying state, which has significantly improved processing performance while also reducing processing latency. Tune in to discuss challenges, tips and lessons learned while implementing a pure Kappa architecture. I will address hurdles such as scaling, warm standbys, schema evolution, and batch replay strategies - highlighting issues prevalent with any streaming Kappa-based architecture. Streaming big data in and of itself comes with its own set of challenges - such as serialization formats, encryption, and strategies to efficiently utilize message headers. I invite each and every one of you to embark on a journey discussing a means to an end - resulting in processing billions of records each day.
Event-driven architecture is a versatile approach to designing and integrating complex software systems. These systems tend to be easier to model and build. Event-driven architecture is not a new concept, but as more organizations contemplate microservices, this approach to system design has become appropriate in more situations and is worth a fresh look.
Kafka streams windowing behind the curtain confluent
Kafka Streams Windowing Behind the Curtain, Neil Buesing, Principal Solutions Architect, Rill
https://www.meetup.com/TwinCities-Apache-Kafka/events/279316299/
During my journey in micro-services, it became apparent that the REST standard has been widely used in communication between micro-services for a long time. But recently gRPC has started to invade its territory. It turns out that there are some good reasons for this. In this lecture, I will present an introduction to gRPC, its main characteristics and the reasons why companies like Google, Netflix, and Docker are adopting this flexible and performant medium of communication.
In this training webinar, we will walk you through the basics of InfluxDB – the purpose-built time series database. InfluxDB has everything you need from a time series platform in a single binary – a multi-tenanted time series database, UI and dashboarding tools, background processing and monitoring agent. This one-hour session will include the training and time for live Q&A.
What you will learn
Core concepts of time series databases
An overview of the InfluxDB platform
How to ingest and query data in InfluxDB
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap... | Flink Forward
Flink Forward San Francisco 2022.
Being in the payments space, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on this in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems totalling over $640 Billion in payment volume annually. We use many components of Flink’s flexible DataStream API to perform aggregations and abstract away the complexities of stream processing from our downstreams. In this talk, we’ll walk through our experience from the very beginning to what we have in production today. We’ll share stories around the technical details and trade-offs we encountered along the way.
by Jeff Chao
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent | HostedbyConfluent
Joins in Kafka Streams and ksqlDB are a killer-feature for data processing and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data and it is important to understand how they impact the join result.
In this talk we want to deep dive on the different types of joins, with a focus on their temporal aspect. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge on temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Last, we give an overview of recent, and an outlook to future, development that improves joins even further.
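To make the idea of timestamps affecting join results concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams or ksqlDB API: the `Record` type, the `windowed_inner_join` function, and the choice of the later timestamp for the joined record are all assumptions made for illustration. It joins two in-memory lists of records on key, but only when their event times fall within a given window of each other.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    key: str
    timestamp: int  # event time in milliseconds
    value: str


def windowed_inner_join(left, right, window_ms):
    """Join records with equal keys whose event times differ by at most window_ms."""
    # Index the right-hand records by key so each left record only scans candidates.
    right_by_key = defaultdict(list)
    for rec in right:
        right_by_key[rec.key].append(rec)

    joined = []
    for l in left:
        for r in right_by_key.get(l.key, []):
            if abs(l.timestamp - r.timestamp) <= window_ms:
                # Assumption of this sketch: the result carries the later timestamp.
                joined.append((l.key, max(l.timestamp, r.timestamp), l.value, r.value))
    return joined


# Hypothetical sample data: two clicks and two views.
clicks = [Record("user-1", 1_000, "click-A"), Record("user-1", 9_000, "click-B")]
views = [Record("user-1", 1_500, "view-X"), Record("user-2", 1_200, "view-Y")]

result = windowed_inner_join(clicks, views, window_ms=1_000)
print(result)
```

With a 1-second window only `click-A` pairs with `view-X`; widening the window to 10 seconds would also pair `click-B` with `view-X`, which shows how the same data can yield different join results depending on the temporal configuration.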
With special guests Ron Ratovsky and Darrel Miller from the OpenAPI Initiative's Technical Steering Committee, this SmartBear webinar session covered the history of Swagger and the OpenAPI Specification, and all the latest changes in OAS 3.1.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1RJcfss.
Juan Batiz-Benet makes a short intro of IPFS (the InterPlanetary File System), a new hypermedia distribution protocol, addressed by content and identities. He also discusses the IPLD data model and example data structures (unixfs, keychain, post). Filmed at qconsf.com.
Juan Batiz-Benet is an Independent Scientist.
From cache to in-memory data grid. Introduction to Hazelcast. | Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* does not describe the usage of NoSQL solutions for caching
* is not intended for product comparison or for promotion of Hazelcast as the best solution
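This talk covers the terrain deployment system and the ecosystem simulation automation system of [Durango: Wild Lands (야생의 땅: 듀랑고)]. Each island in Durango varies in size, terrain, and climate conditions, and there are so many islands that managing them manually is practically impossible. After several internal tests and beta tests, the need for an automated tool to solve this problem became acute, so we built an automation system that creates and manages a large number of islands automatically, using the ecosystem simulator presented at NDC last year, Docker, and Amazon Web Services (AWS). I plan to talk about the questions we worked through along the way, our experience "Dockerizing" an existing application, how we made appropriate use of individual AWS services, a case where we cut costs by exploiting the fact that AWS pricing differs by region, and the problems of the automation system and its future direction.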
Hadoop Distributed File System (HDFS) is evolving from a MapReduce-centric storage system into a generic, cost-effective storage infrastructure where HDFS stores all data inside the organization. The new use case presents a new set of challenges to the original HDFS architecture. One challenge is to scale the storage management of HDFS - the centralized scheme within NameNode becomes a main bottleneck which limits the total number of files stored. Although a typical large HDFS cluster is able to store several hundred petabytes of data, it handles large numbers of small files inefficiently under the current architecture.
In this talk, we introduce our new design and in-progress work that re-architects HDFS to attack this limitation. The storage management is enhanced to a distributed scheme. A new concept of storage container is introduced for storing objects. HDFS blocks are stored and managed as objects in the storage containers instead of being tracked only by NameNode. Storage containers are replicated across DataNodes using a newly-developed high-throughput protocol based on the Raft consensus algorithm. Our current prototype shows that under the new architecture the storage management of HDFS scales 10x better, demonstrating that HDFS is capable of storing billions of files.
OpenAPI 3.0, And What It Means for the Future of Swagger | SmartBear
OpenAPI 3.0, which is based on the original Swagger 2.0 specification, is meant to provide a standard format to unify how an industry defines and describes RESTful APIs.
The release of OAS 3.0 marks a significant milestone in the growth of the API economy — bringing together collaborators from across industries, to evolve the specification to meet the needs of API developers and consumers across the world in an open and transparent manner.
We hosted a free Swagger training: OpenAPI 3.0, And What it Means for the Future of Swagger. More than 2,000 people signed up to learn more about the new specification, and to find out about what’s coming next for Swagger and SwaggerHub!
You can watch the full recording of the presentation here: https://swaggerhub.com/blog/api-resources/openapi-3-0-video-tutorial/
Using Grafana with InfluxDB 2.0 and Flux Lang by Jacob Lisi | InfluxData
Flux, the new InfluxData data scripting and query language (formerly IFQL), super-charges queries both for analytics and data science. Jacob Lisi from Grafana Labs will give a quick overview of the language features as well as the moving parts for a working deployment. Grafana is an open source dashboard solution that shares Flux’s passion for analytics and data science. For that reason, they are very excited to showcase the new Flux support within Grafana, and a couple of common analytics use cases to get the most out of your data.
In this InfluxDays NYC 2019 talk, Jacob Lisi will share the latest updates they have made with their Flux builder in Grafana.
When you're starting or running a company, how do you choose technology? The prevailing advice du jour is something along the lines of "use the best tool for the job." This is obviously right, but it is also devoid of meaning in an unfortunate way that lets people define "best" and "job" as myopically as they like.
Utopia Kingdoms scaling case. From 4 users to 50.000+ | Python Ireland
Describing the real-life case of Utopia Kingdoms, an online game. The game initially had problems scaling in its production environment and had to be heavily refactored to support a large number of players. This included the use of caching, profiling, a queuing system, and the migration of the database from Amazon SimpleDB to MongoDB.
A talk for the 2012 PloneConf in Arnhem, the Netherlands. Speakers: Gil Forcada, Timo Stollenwerk, Kees Hink. How we built a newspaper website in Plone.
This talk goes over the host identification process we follow, the development of EyeWitness 1.0, the problems which led to 2.0, and future work on EyeWitness.
LinkedIn has multiple data centers hosting tens of thousands of servers. A large percentage of these servers host our data infrastructure, and our distributed data store, Espresso, accounts for a sizeable share of them. The fleet of servers contains various hardware components including, but not limited to, SSDs, and hardware has a tendency to fail from time to time. When hardware fails, the affected servers need to undergo maintenance, which can take a significant amount of time depending on the type of failure. This reduces capacity for that duration and poses an interesting problem: maintaining capacity in the face of multiple failures. This talk covers how LinkedIn uses Camunda, wrapped with several components, to achieve hands-off capacity management via multiple workflows, with asynchronous pauses and synchronisation among them. It will also highlight how we achieved seamless integrations with various platforms and components within LinkedIn's infrastructure, and a few best practices that helped us achieve the final state.
Decathlon’s mission is to make sport accessible to more people. Decathlon SportMeeting, its new social network, was created to take this one step further, allowing everyone to find people who share their sport and their passion.
DSM was designed from scratch to support its current traffic, with more than 100k registered users and 1,000 active sport proposals across more than 30 sports.
This web platform is entirely built with Groovy & Grails but there are also applications in Android and iOS that use its RESTful API. Along the development process several plugins were created and open-sourced to the community.
In this talk Kaleidos will explain how the platform was developed, some of the technical decisions that were made, lessons learned, pitfalls, how the infrastructure has evolved over almost 3 years, and much more.
Monitoring Big Data Systems - "The Simple Way" | Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden, monitoring all of the components becomes a big data problem in itself.
In the talk we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system once you’re using tools like Web Services, Apache Spark, Cassandra, MongoDB, and Amazon Web Services.
Beyond the tools, what should you monitor about the actual data that flows through the system?
And we’ll cover the simplest solution using your day-to-day open source tools; surprisingly, it does not come from an ops guy.
Demi Ben-Ari is a Co-Founder and CTO @ Panorays.
Demi has over 9 years of experience in building various systems both from the field of near real time applications and Big Data distributed systems.
He describes himself as a software development groupie, interested in tackling cutting-edge technologies.
Demi is also a co-founder of the “Big Things” Big Data community: http://somebigthings.com/big-things-intro/
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish Caching | Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdf | Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GraphRAG is All You need? LLM & Knowledge Graph | Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The new frontiers of AI in RPA with UiPath Autopilot™ | UiPathCommunity
In this free online event, organized by the UiPath Italian Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 | Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI support | Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across more than 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
8. Introduction to News Feed
● Terms
● Social graph: Users in most social networking
sites are describable in terms of a social
graph. The relationships between users are
represented by adjacency lists. If Jack and Jill
are friends, they are said to be adjacent. This
is known as an "edge" in the graph. (from
Quora)
● Not only Friends
● but also Followers …
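The adjacency-list idea on this slide can be sketched in a few lines of Python. This is only an illustration, not ZingMe's implementation: the `SocialGraph` class, its method names, and the sample users are all assumptions made for the example. Friendship is stored as an undirected edge and following as a directed one.

```python
from collections import defaultdict


class SocialGraph:
    """Toy social graph kept as adjacency lists (sets, for fast lookups)."""

    def __init__(self):
        self.friends = defaultdict(set)    # undirected edges
        self.followers = defaultdict(set)  # directed edges: user -> who follows them

    def add_friendship(self, a, b):
        # Friendship is mutual, so the edge goes into both adjacency lists.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def add_follower(self, follower, followee):
        # Following is one-way: only the followee's follower list changes.
        self.followers[followee].add(follower)

    def are_adjacent(self, a, b):
        return b in self.friends.get(a, set())

    def audience(self, user):
        # Everyone who should see this user's activity: friends and followers.
        return self.friends.get(user, set()) | self.followers.get(user, set())


graph = SocialGraph()
graph.add_friendship("jack", "jill")
graph.add_follower("bob", "jack")
print(graph.are_adjacent("jack", "jill"))
print(sorted(graph.audience("jack")))
```

The `audience` set is what a news feed would fan an action out to, which is why followers matter as much as friends.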
9. Introduction to News Feed
● What do we need?
● When someone performs an action, their friends should
see it on their home page as soon as possible
● How will we solve the problem?
● Solution 1: Push model (fan out on write)
● Solution 2: Pull model (fan out on read)
● Solution 3: Mixing push and pull (Feeding
Frenzy, a paper from Yahoo)
10. Introduction to News Feed
● Push model
● This method involves denormalizing the user's activity
data and pushing the meta data to all the user's friends
at the time it occurs. (from Quora)
● Pull model
● This method involves keeping all recent activity data in
memory and pulling in (or fanning out) that data at the
time a user loads their home page. Data doesn't need to
be pushed out to all subscribers as soon as it happens,
so no back-log and no disk seeks (from Quora)
● Mixed model
● Active users: push model
● Inactive users: pull model
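The three models can be compared side by side in a small in-memory sketch. Everything here is assumed for illustration (the sample users, the `active_users` set, and the function names); a real system would use persistent stores and a proper definition of "active".

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "dave": ["bob"]}    # author -> audience
following = {"bob": ["alice", "dave"], "carol": ["alice"]}  # reader -> authors
active_users = {"bob"}  # hypothetical: users who log in often

activity = defaultdict(list)  # author -> [(timestamp, text)], the source of truth
inbox = defaultdict(list)     # reader -> [(timestamp, author, text)], precomputed


def post(author, timestamp, text, mixed=False):
    """Record an action, then fan out on write (push)."""
    activity[author].append((timestamp, text))
    for reader in followers.get(author, []):
        # In the mixed model only active readers get a precomputed inbox.
        if not mixed or reader in active_users:
            inbox[reader].append((timestamp, author, text))


def read_push(reader):
    """Push model: the feed was already materialised at write time."""
    return sorted(inbox[reader], reverse=True)


def read_pull(reader):
    """Pull model: fan out on read by merging each followed author's activity."""
    items = [(ts, author, text)
             for author in following.get(reader, [])
             for ts, text in activity[author]]
    return sorted(items, reverse=True)


def read_mixed(reader):
    return read_push(reader) if reader in active_users else read_pull(reader)


post("alice", 1, "hello", mixed=True)
post("dave", 2, "hi", mixed=True)
print(read_mixed("bob"))    # served from the precomputed inbox
print(read_mixed("carol"))  # computed at read time
```

Push pays the cost once per follower at write time, and pull pays it once per followed author at read time. The mixed model spends write-time work only on readers who are likely to come back soon.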
12. ZingMe News Feed system history
● First version
● Using PHP for the workers
● Using MySQL for feed items
● Using MySQL for feed indexing
● Having a full feature set: feed type filtering, ignoring
users ...
● Restarting the DB and other services was our favorite
job at that time :)
● Lesson learned:
– A relational DB may not be a good fit for this kind of project
13. ZingMe News Feed system history
● Second version
● Still using PHP for the workers
● Using Cassandra for feed items
● Using a home-built list ID service for feed indexing
● Using Memcached for caching items
● Removing all deluxe features :) (stupid features, given
our limited skills)
● Restarting Cassandra and waiting for compaction were our
favorite jobs :) :)
● Headaches when users changed their avatar
● Lesson learned: trust only ourselves
14. ZingMe News Feed system history
● Third version
● Moving to Java for better performance
● Still using Cassandra for feed items
● Trying out Redis in the lab
● Keeping only simple features (KISS)
● Could not control Memcached
– The new item expired before the old one ???
– Is Memcached wrong ???
● Could not trust Cassandra
● Lesson learned: Memcached is not a miracle cure :)
17. ZingMe News Feed system
● Still using the push model, because Twitter has published
some information about this model
● Not enough technical expertise to choose the pull model
● Beginning to understand a little about how to keep it
scaling
● No longer using Cassandra for this kind of
system → do not blindly trust anyone; learn from what
they do and try our best
19. ZingMe News Feed system
● Feed Item
● UserId, ObjectId, Created date...
● Storage: home-built, based on Kyoto Cabinet
● Fast recovery after a crash
● Feed Index
● UserId → [feedId1,feedId2...]
● Storage: home-built
● Fast recovery after a crash
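The two structures on this slide map naturally onto a record type and a per-user list of IDs. The sketch below uses plain Python dictionaries as a stand-in for the home-built storage. The `FeedStore` class, its method names, and the `max_index_len` cap are assumptions for illustration, and persistence and crash recovery are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    feed_id: int
    user_id: int
    object_id: int
    created: int  # Unix timestamp


class FeedStore:
    """In-memory stand-in for the feed item store and the feed index."""

    def __init__(self, max_index_len=1000):
        self.items = {}  # feedId -> FeedItem
        self.index = {}  # userId -> [feedId, ...], newest first
        self.max_index_len = max_index_len  # hypothetical cap on index length

    def put_item(self, item):
        self.items[item.feed_id] = item

    def add_to_index(self, user_id, feed_id):
        ids = self.index.setdefault(user_id, [])
        ids.insert(0, feed_id)        # newest entry goes first
        del ids[self.max_index_len:]  # trim the oldest entries

    def latest(self, user_id, count=10):
        ids = self.index.get(user_id, [])[:count]
        return [self.items[i] for i in ids if i in self.items]


store = FeedStore(max_index_len=2)
for feed_id in (1, 2, 3):
    store.put_item(FeedItem(feed_id, user_id=7, object_id=100 + feed_id,
                            created=1_700_000_000 + feed_id))
    store.add_to_index(42, feed_id)
print(store.index[42])
```

Keeping the index as a list of IDs means an item is stored once, however many users have it in their feed.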
20. ZingMe News Feed system
● Rate limit
● Pre-filter spam and automated tools based on the rate of write requests
● When the limit is hit, block that user for a period of time
● Feed writer
● Receive the write command
● Get the next ID from the generator
● Push the item to the queue
● Return the feedId for future reference
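The rate limit and the writer steps above can be sketched together as follows. This is a minimal illustration under stated assumptions: the fixed-window counting scheme, the class names, and all parameter values are invented for the example, `itertools.count` stands in for the ID generator, and `queue.Queue` stands in for the storage queue.

```python
import itertools
import queue


class RateLimiter:
    """Fixed-window limiter: too many writes in one window blocks the user."""

    def __init__(self, max_writes, window_seconds, block_seconds):
        self.max_writes = max_writes
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.windows = {}        # user_id -> (window_start, count)
        self.blocked_until = {}  # user_id -> timestamp

    def allow(self, user_id, now):
        if now < self.blocked_until.get(user_id, 0):
            return False
        start, count = self.windows.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # start a fresh window
        count += 1
        self.windows[user_id] = (start, count)
        if count > self.max_writes:
            # Limit hit: block this user for a period of time.
            self.blocked_until[user_id] = now + self.block_seconds
            return False
        return True


class FeedWriter:
    def __init__(self, limiter):
        self.limiter = limiter
        self.next_id = itertools.count(1)  # stand-in for the ID generator
        self.queue = queue.Queue()         # stand-in for the storage queue

    def write(self, user_id, object_id, now):
        if not self.limiter.allow(user_id, now):
            return None                    # rejected by the rate limit
        feed_id = next(self.next_id)       # get the next ID
        self.queue.put({"feedId": feed_id, "userId": user_id,
                        "objectId": object_id, "created": now})
        return feed_id                     # returned for future reference


writer = FeedWriter(RateLimiter(max_writes=2, window_seconds=60, block_seconds=300))
results = [writer.write(user_id=1, object_id=n, now=t)
           for n, t in enumerate([0, 1, 2, 100, 400])]
print(results)
```

The writer returns as soon as the item is queued, so the slow work (storage, rendering, fan-out) happens in workers rather than in the request path.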
21. ZingMe News Feed system
● Gearman feed storage queue
● Very fast
● Supports clients in multiple languages
● Sometimes blocks all the workers when the network is
unstable :)
● Handles most of our heavy jobs
22. ZingMe News Feed system
● Feed Sync center
● Sync new feed items to other systems, such as:
– Spam detection
– Feed ranking system
– Logging system
● Feed replication function for future use
23. ZingMe News Feed system
● Feed Render worker
● The main, heavy job:
– Get the feed item
– Extract the template id
– Get user info
– Render the feed based on them
● Put the rendered feed into the appropriate cache
● Mobile and desktop rendering are totally different
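The render steps can be sketched with the standard library's `string.Template`. The templates, the sample user and item, and the cache layout below are all hypothetical; the slides do not say how templates are stored or which template engine is used, and HTML escaping is left out for brevity.

```python
from string import Template

# Hypothetical templates, keyed by (template id, platform).
TEMPLATES = {
    ("status", "desktop"): Template("<li><b>$name</b> posted: $text</li>"),
    ("status", "mobile"): Template("$name: $text"),
}

USERS = {7: {"name": "Jack"}}  # stand-in for the user-info service
ITEMS = {1: {"userId": 7, "templateId": "status", "text": "hello"}}  # item store

rendered_cache = {"desktop": {}, "mobile": {}}  # one cache per platform


def render_feed(feed_id):
    item = ITEMS[feed_id]             # 1. get the feed item
    template_id = item["templateId"]  # 2. extract the template id
    user = USERS[item["userId"]]      # 3. get user info
    for platform, cache in rendered_cache.items():
        template = TEMPLATES[(template_id, platform)]
        # 4. render, then put the result into the cache for that platform
        cache[feed_id] = template.substitute(name=user["name"], text=item["text"])


render_feed(1)
print(rendered_cache["mobile"][1])
```

Rendering once per platform at write time means the read path only has to look up finished strings.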
24. ZingMe News Feed system
● Feed Aggregate
● Get the feed index
● Get the rendered item from cache
● Return to the front-end
● Some tricks:
– If fewer than 5 items are cached, return a JavaScript
snippet that reloads the list instead of returning the data
– At the same time, push a task to warm up the
rendered cache
● Automatic fail-over when a cache service dies
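The aggregation step and the reload trick could look roughly like this. The response shape, the `reloadFeed` JavaScript function name, the `page_size` parameter, and the `warmup_tasks` list are assumptions made for the example. The extra `missing` check is also an addition, so that a user with fewer than five items in total is not asked to reload forever. Cache fail-over is not shown.

```python
MIN_CACHED_ITEMS = 5  # threshold mentioned on the slide
warmup_tasks = []     # hypothetical stand-in for the task queue


def aggregate(user_id, feed_index, rendered_cache, page_size=20):
    """Return rendered items, or a reload instruction while the cache warms up."""
    feed_ids = feed_index.get(user_id, [])[:page_size]  # get the feed index
    rendered = [rendered_cache[f] for f in feed_ids if f in rendered_cache]
    missing = [f for f in feed_ids if f not in rendered_cache]
    # Assumption: only reload when something is actually missing from the cache.
    if len(rendered) < MIN_CACHED_ITEMS and missing:
        warmup_tasks.append((user_id, missing))  # warm up the rendered cache
        return {"type": "reload", "script": "setTimeout(reloadFeed, 1000);"}
    return {"type": "items", "items": rendered}


feed_index = {1: [10, 11, 12, 13, 14, 15]}
cold = aggregate(1, feed_index, {10: "a", 11: "b"})
warm = aggregate(1, feed_index, {i: str(i) for i in range(10, 16)})
print(cold["type"], warm["type"])
```

Returning a reload instruction keeps the aggregate call fast even on a cold cache, at the cost of one extra round trip for the user.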
26. Some statistics
● ~15M actions / day
● 10% Spam
● Gift receive
● Meaningless status
● Cache hit 98%
● ~80M registered users
● ~3M active users / day
● Max 1000 friends only
● Unlimited followers
27. Bonus
● Twemcache (https://github.com/twitter/twemcache)
● From Twitter
● Solves most problems with Memcached
● More strategies for evicting items
– Item LRU eviction: per-slabclass LRU eviction
– Random eviction: evict all items from a randomly chosen slab
– ...
● Twemcache proxy
● Redis (http://redis.io)
● A replacement for home-built storage when you do not have enough time
● Sets are supported by default
● Cluster support
● Persistence