These are the slides for the talk I presented at the LA Web Speed meetup hosted by Yahoo on May 17, 2013 - http://www.meetup.com/LAWebSpeed/events/115663212/
This document provides examples of using Redis data structures to solve common data modeling problems. It discusses using Redis lists to improve logging performance, hashes to track daily visitor counts, JSON to implement shopping carts, sets for tracking likes on posts, and bits to count unique daily visitors at scale. Pipelines, Lua scripting, and read replicas are proposed to further optimize some solutions.
Redis Use Patterns (DevconTLV, June 2014) - Itamar Haber
An introduction to Redis for the SQL practitioner, covering data types and common use cases.
The video of this session can be found at: https://www.youtube.com/watch?v=8Unaug_vmFI
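One of the patterns the deck covers, counting unique daily visitors with bit operations, can be sketched in plain Python. This is a simulation of Redis's SETBIT/BITCOUNT commands using a bytearray, not an actual Redis client, and the user IDs are hypothetical illustration values.

```python
# Simulation of the Redis bitmap pattern for unique daily visitors:
# set one bit per user ID, then count the set bits.
class VisitorBitmap:
    def __init__(self, capacity_bits: int):
        self.bits = bytearray((capacity_bits + 7) // 8)

    def setbit(self, offset: int) -> None:
        # Redis equivalent: SETBIT visitors:<date> <user_id> 1
        # (MSB-first within each byte, matching Redis's bit order)
        self.bits[offset // 8] |= 1 << (7 - offset % 8)

    def bitcount(self) -> int:
        # Redis equivalent: BITCOUNT visitors:<date>
        return sum(bin(b).count("1") for b in self.bits)

day = VisitorBitmap(capacity_bits=1_000_000)
for user_id in [42, 1337, 42, 99_999]:  # the repeat visit counts once
    day.setbit(user_id)
print(day.bitcount())  # 3
```

The appeal of the real Redis version is that the per-day bitmap is tiny (one bit per possible user) and BITCOUNT is a single server-side call.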
Session Recordings - Unlocking the user experience with Data - Browsee
Browsee (https://browsee.io) is a tool that helps improve the user experience of your website. With Browsee integrated into your website, you can watch session recordings as well as heatmaps to see where users are facing issues or where they are engaging. It gives you tons of visual insights that help you improve your product, navigation, new user onboarding, and much more.
Session replay technology refers to the process of recording and playing back the user sessions on a website or application, almost like a screen recording. This technology is vital for Product and user experience teams to analyze user behavior throughout their product and uncover potential problems.
Building it, however, presents a number of difficulties, particularly from the point of view of data and scale. Some of these challenges include:
1. Data Capture: Capturing all page (DOM) changes, mouse movements, and user activity in real time, with almost no scope for buffering on the client side. We discuss the impact of WebSockets vs HTTP data transfer, and of queueing to protect against any data loss.
2. Data privacy: Recording and storing user sessions can raise privacy concerns. Session replay technology must be designed in a way that protects user privacy and complies with data protection laws. This involves challenges both at the source as well as at rest and requires geographical distribution of data.
3. Data Storage: Session replay technology generates large amounts of data, as it records every user interaction on a website or application. Storing this data can be a challenge, particularly when dealing with high-traffic websites or applications. We use ScyllaDB for our storage and experimented with different compaction strategies for our use case.
4. User Experience: Lastly, watching session replays should be simple and speedy for an optimal user experience. This implies that every recording should be playable in near real time, which impacts our data sharding.
In this talk, we'll discuss how our team at Browsee approached these problems and what we discovered along the way.
Neufund.com is a cloud-based RA management software that allows RAs to manage finances, residents' data, and payments. It features a dashboard to track collections, accounts to manage fees and funds, statements, a residents directory, and committee management tools. Using Neufund increases collection efficiency, provides payment receipts and statements, and brings transparency to the RA's financial status. The software is accessible online from any device with a web browser. Standard pricing is RM300 per month billed every 6 months, with discounts for early signups.
E-commerce data migration: moving systems across data centres - Regunath B
This document summarizes Flipkart's data migration process from one data center to another. Key points include:
- Flipkart processes large volumes of data daily from users and orders and needed to migrate systems across data centers with minimal downtime.
- The migration involved over 1000 systems and terabytes of data across various data stores like HBase, Elasticsearch, MySQL, Kafka and data platforms.
- Custom tools were developed to efficiently migrate and replicate data across data centers to minimize bandwidth usage and ensure consistency, including a tool to migrate HBase data using hard disks and utilities for migrating MySQL, product catalogs and user sessions.
- The migration was carefully planned and executed over several weeks to move systems
Twitter handles billions of events per minute that are logged by clients. They use a scalable architecture with modular client daemons and aggregator daemons to aggregate events into categories on HDFS. To improve scalability, they group categories into category groups and write events to HDFS files together. They also group aggregators to scale independently and isolate resources. Within a single aggregator, they improved memory usage and added microbatching to reduce latency. Going forward, they aim to further reduce latency and improve failure handling.
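The microbatching idea mentioned above can be illustrated with a short sketch. This is not Twitter's code; it is a generic buffer-and-flush pattern, assuming hypothetical batch-size and time-window parameters, that trades a small amount of latency for much higher write throughput.

```python
# Hedged sketch of microbatching in an aggregator: buffer incoming
# events and flush when the batch fills or a time window expires.
import time

class MicroBatcher:
    def __init__(self, flush, max_batch=100, max_wait_s=0.5):
        self.flush = flush          # callback that writes one batch out
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.buf, self.started = [], None

    def add(self, event):
        if not self.buf:
            self.started = time.monotonic()  # window opens on first event
        self.buf.append(event)
        if (len(self.buf) >= self.max_batch or
                time.monotonic() - self.started >= self.max_wait_s):
            self.drain()

    def drain(self):
        if self.buf:
            self.flush(self.buf)
            self.buf = []

batches = []
b = MicroBatcher(batches.append, max_batch=3, max_wait_s=10)
for e in range(7):
    b.add(e)
b.drain()  # flush the remainder on shutdown
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a real aggregator the flush callback would append to an HDFS file; batching amortizes the per-write overhead across many events.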
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ... - HostedbyConfluent
At Stripe, we operate a general ledger modeled as double-entry bookkeeping for all financial transactions. Warehousing such data is challenging due to its high volume and high cardinality of unique accounts.
Furthermore, it is financially critical to get up-to-date, accurate analytics over all records. Due to the changing nature of real-time transactions, it is impossible to pre-compute the analytics as a fixed time series. We have overcome the challenge by creating a real-time key-value store inside Pinot that can sustain half a million QPS across all the financial transactions.
We will talk about the details of our solution and the interesting technical challenges faced.
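The double-entry model the abstract refers to can be captured in a few lines. This is an illustration of the invariant, not Stripe's implementation; the account names and amounts are hypothetical.

```python
# Minimal double-entry ledger sketch: every transfer subtracts from one
# account and adds the same amount to another, so all balances always
# sum to zero (the "trial balance" invariant).
from collections import defaultdict

class Ledger:
    def __init__(self):
        self.balances = defaultdict(int)  # amounts in integer cents

    def transfer(self, src: str, dst: str, cents: int) -> None:
        # every entry has an equal and opposite counter-entry
        self.balances[src] -= cents
        self.balances[dst] += cents

    def trial_balance(self) -> int:
        return sum(self.balances.values())  # invariant: always 0

led = Ledger()
led.transfer("customer_receivable", "cash", 10_000)  # hypothetical accounts
led.transfer("cash", "fees_expense", 290)
print(led.balances["cash"])    # 9710
print(led.trial_balance())     # 0
```

The invariant is what makes warehousing hard at scale: analytics must stay consistent across every account touched by every in-flight transaction, which is why a pre-computed time series cannot work.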
This document discusses Badoo's use of MicroStrategy for business intelligence and analytics. It describes how MicroStrategy helped Badoo overcome challenges with their previous BI tool by providing dimensional modeling, self-service reports, and weekly releases. It highlights how MicroStrategy enabled data discovery, analysis delivery, and reporting for over 90 users across various teams. The document also provides examples of query optimizations in MicroStrategy that improved performance. Finally, it discusses how MicroStrategy has enabled Badoo to empower users through visual insights, transaction services, command manager automation, and streamlined web deployments.
This document discusses using Elasticsearch as a time series database. It covers why Elasticsearch was chosen over other options for storing metrics from the open source performance monitoring tool Stagemonitor. The document discusses Elasticsearch's ability to scale, its functions and visualization support in Kibana. It also covers how Stagemonitor's data is modeled in Elasticsearch, including the use of tags, and how index management is handled through a hot/cold node architecture and tools like Curator.
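The "metrics with tags" modeling the summary mentions can be shown with an illustrative document shape. This is a generic sketch, not Stagemonitor's exact schema; the field and tag names are assumptions.

```python
# Illustrative shape of one metric measurement as it might be modeled
# in Elasticsearch: one document per data point, with tags as indexed
# key-value fields so dashboards can filter and group by them.
import json
from datetime import datetime, timezone

doc = {
    "@timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
    "name": "response_time",
    "value_ms": 42.7,
    "tags": {                      # hypothetical tag names
        "application": "shop",
        "host": "web-01",
        "endpoint": "/checkout",
    },
}
print(json.dumps(doc, indent=2))
```

Time-based indices (e.g. one index per day) then make the hot/cold split natural: recent indices live on fast nodes, older ones are moved to cheaper nodes or deleted by a tool like Curator.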
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas - MongoDB
Moving to a new home is daunting. Packing up all your things, getting a vehicle to move it all, unpacking it, updating your mailing address, and making sure you did not leave anything behind. Well, the move to MongoDB Atlas is similar, but all the logistics are already figured out for you by MongoDB.
This document describes uberVU's use of big data to monitor social media mentions and provide analytics to clients. It discusses how uberVU ingests large amounts of social media data daily using distributed technologies like Amazon Web Services, MongoDB, and Redis. Machine learning algorithms are used to analyze and classify data, though batch processing is more efficient. Signals like influencers and trends are identified. Lessons learned include the importance of monitoring systems and planning for failures.
The document discusses how data and system architectures evolve over time as usage grows. It uses a hypothetical example of a seamonkey management application to illustrate this. As the application gains more users through promotions on sites like Reddit and Hacker News, more types of data are collected and the architecture becomes more complex, with additions like caching, worker processes, and databases. The document also discusses concepts like CAP theorem, ACID properties, and eventual consistency that become relevant at larger scales. The key point is that understanding how systems need to change in response to data growth can help architects set up services and infrastructure to scale smoothly over time.
Datacratic is the leader in real-time machine learning and decisioning and the creator of the RTBkit Open-Source Project. Mark Weiss, head of client solutions at Datacratic, shares some of the challenges companies and developers face today as they move into Real Time Bidding. In this presentation he does a developer deep dive into design and implementation choices, technologies, and plugins, and provides some real-world RTB customer use cases. You will also learn how you can join the RTBkit community and get support for your upcoming RTBkit initiatives.
Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H... - Data Con LA
Enabling real-time exploration and analytics at scale to drive operational intelligence at Hulu by Indrasis Mondal, Director, Data Engineering and Data Products, Hulu
Data is one of the most powerful assets for companies today and a key driver of innovation, product development, and business efficiency. Operational intelligence allows a modern organization to use that data asset in real time, enabling immediate insights into its business operations and rapid decision making for strategic advantage. In this presentation we will walk through the operational intelligence capabilities Hulu has built to process tens of millions of events per minute, enabling fast exploration of data and real-time decision making.
Those Days is designed for the everyday traveler and blogger who is always on the move. With an automated log of daily activities generated on the phone, the user never has to manually input a single thing. Users can go about their day without having to interact with the application until they want to. This, we believe, provides the user with the optimal experience. Those Days provides the automation that current products on the market today cannot. Users can view the current day or previous days by selecting a day in the
calendar and proceed to browse through the activities of the selected day. They can select and view individual activities and their corresponding metadata, for example, a picture taken during the time of an activity. If desired, the user can input additional comments to the activity or even tweet it out.
Our journey with Druid - from initial research to full production scale - Itai Yaffe
Here at the Nielsen Marketing Cloud we use druid.io (http://druid.io/) as one of our main data stores, both for simple counts and for approximate count-distinct (DataSketches).
It’s been more than a year since we started using it, ingesting billions of events each day into multiple Druid clusters for different use cases.
In this meet-up, we will share our journey, the challenges we had, the way we overcame them (at least most of them) and the steps we made to optimize the process around Druid to keep the solution cost effective.
Before diving into Druid, we will briefly present our data pipeline architecture, starting from the front-end serving system, deployed in a number of geo-locations, to a centralized Kafka cluster in the cloud, and give some examples of the different processes that consume from Kafka and feed our different data sources.
ABOUT SPACE APE
Space Ape's hit real-time strategy game, Samurai Siege, has been played by over 11m people and generated over $50m in revenue since its launch in October 2013. The game was built by a team of 12 over 6 months.
Samurai Siege has sustained in the grossing charts where many come and go in no small part because of the team's focus on live operations. Every week new content is pushed live, marketing strategies are refreshed and the game is optimised based on a combination of player research and analytics.
ABOUT THIS PRESENTATION
This presentation shows the evolution of the Samurai Siege analytics stack and some of the applications of the data by Space Ape's product, marketing and community teams.
The stack started as a simple MVP but evolved over time as the game matured and the competitive landscape changed. It is now a fully functioning service that was easily replicated to support the launch of their next game Rival Kingdoms (currently in public Beta). As such, the presentation will be of interest to smaller games studios who are figuring out how to prioritise investment in data as well as established studios who might be re-thinking their legacy systems and figuring out how to bring the data focus needed to succeed in the modern free to play games business.
This presentation was made by Space Ape's analyst Richard Reyes and shared with local game developers at the Great British Big Data Game Show & Tell in London on 25 February 2015.
For more on Space Ape's Live Ops and Analytics stacks see
https://tech.spaceapegames.com/2016/12/07/space-ape-live-ops-boot-camp/
Deltek Vision User Group Meeting - Q2 2013 - BCS ProSoft
These slides are from BCS Prosoft's Deltek Vision User Group Meeting that covered Advanced Financial Reporting, BizInsights by Biznet, and 5 Tips & Tricks to help you get the most out of Deltek Vision. For a list of our upcoming Deltek Vision UGMs in San Antonio, Houston, Denver, Honolulu, and web-based UGM visit http://www.bcsprosoft.com/events/user-group-meetings/
Ceilometer LSF integration (OpenStack Summit) - Tim Bell
CERN uses Ceilometer as a single source of truth for accounting data from both virtual machines and batch computing. Ceilometer implements a plugin to poll CERN's batch accounting database for unpublished records, which are then pushed to RabbitMQ. A Ceilometer collector consumes the messages and inserts the data into a Ceilometer database. This decreases load on the OpenStack messaging server by using separate instances and RabbitMQ servers for VM and batch metering data. Most batch data is published within two runs, each processing around 200,000 job records, which takes approximately 5 hours to complete. The average publishing rate to the batch RabbitMQ server is 11 records per second.
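The throughput figures above are internally consistent, which a quick back-of-the-envelope check confirms:

```python
# Sanity check on the abstract's numbers: ~200,000 records per run at
# ~11 records/second comes out to roughly five hours per run.
records_per_run = 200_000
records_per_second = 11

seconds = records_per_run / records_per_second
hours = seconds / 3600
print(round(hours, 2))  # 5.05
```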
The document discusses the history and current system of ZingMe's news feed. It describes moving from a PHP/MySQL version to a Java/Cassandra version and now a home-built Kyoto Cabinet and feed index system. Key aspects of the current system include rate limiting, feed storage in a Gearman queue, rendering feeds, caching, and aggregating feeds for users. Statistics provided include 15M daily actions, 80M registered users, and 3M daily active users. The document also discusses Twemcache and Redis as alternatives to their current solutions.
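Of the techniques listed for the news feed, rate limiting is easy to sketch. The token-bucket version below is a generic illustration under assumed parameters, not ZingMe's actual implementation.

```python
# Hedged sketch of a token-bucket rate limiter: the bucket refills at a
# steady rate up to a cap; each request spends one token or is rejected.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_s: float):
        self.capacity = capacity
        self.tokens = float(capacity)   # start full
        self.refill_per_s = refill_per_s
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(capacity=2, refill_per_s=1)          # hypothetical limits
results = [tb.allow(t) for t in (0.0, 0.1, 0.2, 1.2)]
print(results)  # [True, True, False, True]
```

The third request at t=0.2 is rejected because the burst capacity of 2 is spent and only 0.2 tokens have refilled; by t=1.2 a full token has accumulated again.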
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Communications Mining Series - Zero to Hero - Session 1
Josiah Carlson 2013-05-16 - Redis Analytics
1. A High-Level Pass
Through Redis Analytics*
by Josiah Carlson www.dr-josiah.com
@dr_josiah bit.ly/redis-in-action
2. Agenda
● Quick overview of Redis
● Monthly unique return/churn
○ too much memory method
○ reasonable memory method
○ very low memory method
● Visitor action sequence analytics
○ sequence method
○ low-memory method
● Geographic notifications with partitioning*
3. Quick Redis overview
● Remote key -> data structure server
○ Strings/integers/bitmaps
○ Lists of strings
○ Sets of unique string members
○ Hashes of key -> value
○ Sorted sets (ZSETs) mapping of member -> score
● Supports
○ Persistence
○ Replication
○ Publish/subscribe
○ Server-side Lua scripting (like a stored procedure)
○ Client-side sharding (server side in-progress)
4. Monthly unique return/churn
Problem:
● Say that you have millions of monthly visitors
● Need to know monthly churn, expected ~50%
● Don't want to waste too much memory
5. Monthly unique return/churn
Too much memory:
● Generate UUIDs for users, store in cookie
● Use a HASH mapping from UUIDs to int ids
● Use a HASH mapping from int ids to UUIDs
● Create a ZSET mapping int ids to timestamps
● Use per-month bitmaps for churn calculation
● Recycle int ids based on old timestamps,
discarding UUIDs and resetting bits
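The bookkeeping above can be sketched with in-memory dicts standing in for the two HASHes and the ZSET (an illustrative stand-in, not the talk's actual code; real code would issue HSET/HGET/ZADD through a client such as redis-py):

```python
import time

# Stand-ins for: HASH uuid -> int id, HASH int id -> uuid, ZSET id -> timestamp.
uuid_to_id, id_to_uuid, last_seen = {}, {}, {}
next_id = 0

def touch(u, now=None):
    """Return the int id for UUID u, assigning the next id on first sight,
    and refresh its timestamp (the ZADD in real Redis). Recycling would
    scan last_seen for stale timestamps and reuse those ids."""
    global next_id
    if u not in uuid_to_id:
        uuid_to_id[u] = next_id
        id_to_uuid[next_id] = u
        next_id += 1
    last_seen[u] = time.time() if now is None else now
    return uuid_to_id[u]
```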
6. Monthly unique return/churn
Drawbacks:
● Memory use scales with the size of the HASHes and
ZSET (up to ~400 bytes/unique user)
● Second HASH can be thrown away
● The other HASH, ZSET, and bitmaps can be
thrown away and replaced by a "this month"
and "last month" SET (about 120 bytes/user)
● With 63-bit integer UUIDs and sharding
techniques, about 16 bytes/user
7. Monthly unique return/churn
Reasonable memory solution:
● Store a per-month id in a signed cookie (the lower 32 bits
are the unique id for the month, the next 8 bits are the month)
● One month of bitmap
● If this month cookie, do nothing
● If last month cookie and bit isn't set for that id, mark the
bitmap, generate a new cookie, increment unique and
returning counts
● If last month cookie and bit is set, generate a new
cookie
● If old cookie or no cookie, generate a new cookie,
increment unique count
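The cookie layout and branch logic can be sketched in a few lines of Python (the bit widths follow the slide, but the function names and month arithmetic are illustrative, and a real cookie would also carry a signature such as an HMAC):

```python
def pack_cookie(month, uid):
    """Pack an 8-bit month counter and a 32-bit per-month unique id."""
    assert 0 <= month < 256 and 0 <= uid < 2 ** 32
    return (month << 32) | uid

def unpack_cookie(value):
    """Recover (month, uid) from a packed cookie value."""
    return (value >> 32) & 0xFF, value & 0xFFFFFFFF

def classify(cookie_month, current_month):
    """Decide which branch of the slide's logic applies."""
    if cookie_month == current_month:
        return "this month"
    if cookie_month == (current_month - 1) % 256:
        return "last month"
    return "old"
```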
8. Monthly unique return/churn
Drawbacks:
● Memory use based on unique monthly
counts, ~1 bit per user (not bad)
● If you push to hundreds of millions/billions of
users, you should shard your bitmaps to
minimize realloc cost on bitmap updates
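Sharding the bitmap can be as simple as splitting the id space into fixed-size chunks (the shard size and key format here are assumptions, not from the slides):

```python
SHARD_BITS = 2 ** 20  # 1M bits (~128 KB) per shard; an assumed size

def shard_for(uid, month="2013-05"):
    """Map a per-month unique id to a (redis_key, bit_offset) pair, so
    SETBIT never has to grow any one bitmap past SHARD_BITS bits."""
    shard, offset = divmod(uid, SHARD_BITS)
    return "churn:%s:%d" % (month, shard), offset
```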
9. Monthly unique return/churn
Very low memory method:
● Store per-month id in a signed cookie
● If this month cookie, do nothing
● If last month cookie, generate a new cookie
for the client, increment unique and
returning counts
● If old cookie or no cookie, generate a new
cookie, increment unique count
10. Monthly unique return/churn
Drawback:
● If someone sends you duplicate cookies,
they're hard to detect (keep a "recently
replaced" cache; 5-10 minutes' worth is
likely good enough)
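The "recently replaced" cache can be sketched as a small TTL map (illustrative only; in practice this could itself live in Redis via SETEX keys):

```python
import time

class RecentlyReplaced:
    """Remember cookies replaced in the last ttl seconds so a client
    that replays an old cookie isn't counted twice."""

    def __init__(self, ttl=600):  # 10 minutes, per the slide's suggestion
        self.ttl = ttl
        self._expires = {}

    def add(self, cookie, now=None):
        now = time.time() if now is None else now
        self._expires[cookie] = now + self.ttl

    def seen(self, cookie, now=None):
        now = time.time() if now is None else now
        # lazily drop expired entries on lookup
        self._expires = {c: t for c, t in self._expires.items() if t > now}
        return cookie in self._expires
```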
11. Tangent on ZSETs
This slide is a filler so that I can talk about one
of my favorite "get rid of ZSETs" tricks, which
results in significant memory savings for a fairly
large subset of problems
13. Visitor action sequences
Sequence method:
● Each user gets a LIST
● All users are recorded in a ZSET with a score based on
time
● Each action/page RPUSHes the action/page to the LIST
● Clean-up/analyze old sequences based on timestamps
in the ZSET
Drawbacks:
● Memory use can be high for active users
● More detailed events can use more memory
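A minimal in-memory model of the sequence method (a sketch; real code would use RPUSH for the LIST and ZADD/ZRANGEBYSCORE for the ZSET through a client like redis-py):

```python
import time

sequences = {}  # user -> LIST of actions
last_seen = {}  # ZSET stand-in: user -> timestamp score

def record(user, action, now=None):
    sequences.setdefault(user, []).append(action)          # RPUSH
    last_seen[user] = time.time() if now is None else now  # ZADD

def reap(cutoff):
    """Return and discard sequences idle since before cutoff
    (ZRANGEBYSCORE -inf cutoff, then DEL, in real Redis)."""
    stale = [u for u, t in last_seen.items() if t < cutoff]
    reaped = {}
    for u in stale:
        reaped[u] = sequences.pop(u)
        del last_seen[u]
    return reaped
```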
14. Visitor action sequences
Low memory method:
● Each user gets a bitmap (limit your unique events)
● All actions are mapped to an index in the bitmap
● When a user performs the action/visits the page, set the
bit and update the ZSET
● Clean up/analyze old bitmaps based on timestamps in
the ZSET
Drawbacks:
● No more strict sequence analysis possible
● Memory use is dominated by ZSET storage
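Setting one bit per event can be sketched as follows (the first-seen index assignment is illustrative; the bit ordering models Redis's SETBIT, which numbers bits most-significant-first within each byte):

```python
event_index = {}  # action name -> bit index, assigned on first sight

def bit_for(action):
    """Map an action to a stable bit index."""
    return event_index.setdefault(action, len(event_index))

def set_bit(bitmap, idx):
    """Mimic SETBIT: grow the bytearray as needed and set bit idx."""
    byte, bit = divmod(idx, 8)
    if byte >= len(bitmap):
        bitmap.extend(b"\x00" * (byte - len(bitmap) + 1))
    bitmap[byte] |= 0x80 >> bit
    return bitmap
```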
15. Geo Notifications
Problem:
● Want to send events to nearby users
● Don't want users to be notified too often
● Reduce radius of results as notifications rise
● Increase radius of results as notifications fall
● Allow for history to be received on connect
16. Geo Notifications
● Consider the world as a recursively-divided series of
blocks (highest level is 1x1 degree)
● Clients subscribe to all block levels that their user is in
or is interested in
● When writing an event at point (lat,lon):
○ Add the event id to ZSETs down to as deep a partition as you would
ever expect to need
○ Trim the ZSETs along the way based on your desired history
○ Check the resulting size of the ZSETs to determine the highest-level
block that is under your limit
○ Publish the event to a channel based on that level
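The recursive block keys for a point can be computed like this (the key format, quadrant encoding, and depth are all assumptions; the slides only specify 1x1-degree top-level blocks):

```python
import math

def block_keys(lat, lon, max_depth=3):
    """Keys for the 1x1-degree block containing (lat, lon) and each
    recursive quadrant down to max_depth levels."""
    base = "geo:%d:%d" % (math.floor(lat), math.floor(lon))
    keys = [base]
    # fractional position within the top-level block, rescaled each level
    flat, flon = lat - math.floor(lat), lon - math.floor(lon)
    path = ""
    for _ in range(max_depth):
        qlat, qlon = int(flat >= 0.5), int(flon >= 0.5)
        flat, flon = flat * 2 - qlat, flon * 2 - qlon
        path += "%d%d" % (qlat, qlon)
        keys.append("%s:%s" % (base, path))
    return keys
```

A writer would ZADD the event id under each returned key, then publish on the channel for the shallowest level whose ZSET is under the size limit; a nearby client subscribes to the channels for the keys covering its position.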
17. Geo Notifications
Drawbacks:
● Event id/timestamp information is duplicated
● Large histories may use significant memory
(ZSETs can be replaced by LISTs with
minimal changes)
● Old data in un-visited blocks isn't cleaned
out (can add expiration)