This document summarizes a study that benchmarked the performance of personal cloud storage services like Dropbox, Google Drive, SkyDrive, Wuala, and Amazon Cloud Drive. The study developed a methodology to test each service's system architecture, data center locations, file synchronization capabilities, and performance. Key findings include: Dropbox implemented the most capabilities but had high overhead, while Google Drive and Wuala had the fastest completion times due to data centers near the testbed. The study provided insights into how each service's design impacts performance.
How companies use NoSQL & Couchbase - NoSQL Now 2014 – Dipti Borkar
My presentation from the NoSQL Now 2014 conference.
Abstract
NoSQL databases including Couchbase are increasingly being selected as the backend technology for web and mobile apps. Document databases in particular are well suited for a large number of different use cases as an operational datastore.
This session provides a brief overview of Couchbase Server, a document database and its underlying distributed architecture. In addition, Dipti will present some common use cases of Couchbase with a drill down into three specific customer use cases.
Paypal – A multi data center session store
LivePerson – A scalable, real time analytics system
Orbitz – A highly available cache solution
Real-Time Inverted Search NYC ASLUG Oct 2014 – Bryan Bende
Building real-time notification systems is often limited to basic filtering and pattern matching against incoming records. Allowing users to query incoming documents using Solr’s full range of capabilities is much more powerful. In our environment we needed a way to allow for tens of thousands of such query subscriptions, meaning we needed to find a way to distribute the query processing in the cloud. By creating in-memory Lucene indices from our Solr configuration, we were able to parallelize our queries across our cluster. To achieve this distribution, we wrapped the processing in a Storm topology to provide a flexible way to scale and manage our infrastructure. This presentation will describe our experiences creating this distributed, real-time inverted search notification framework.
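The core idea behind inverted search is worth making concrete: instead of indexing documents and running queries against them, the subscriptions (queries) are registered up front and each incoming document is matched against all of them. A toy sketch in Python follows; this is an illustration only, as the system described in the talk builds in-memory Lucene indices from a Solr configuration and distributes matching via Storm:

```python
# Toy "inverted search" (percolation): queries are registered up front,
# and each incoming document is matched against all stored subscriptions.
# A sketch only; the real system uses in-memory Lucene indices and Storm.

class InvertedSearch:
    def __init__(self):
        self.subscriptions = {}  # subscription id -> set of required terms

    def register(self, sub_id, query_terms):
        """Store a subscription as a set of terms that must all appear."""
        self.subscriptions[sub_id] = set(t.lower() for t in query_terms)

    def percolate(self, document):
        """Return the ids of every subscription the document satisfies."""
        doc_terms = set(document.lower().split())
        return [sid for sid, terms in self.subscriptions.items()
                if terms <= doc_terms]

engine = InvertedSearch()
engine.register("storm-alerts", ["storm", "topology"])
engine.register("solr-alerts", ["solr", "lucene"])

matches = engine.percolate("Deploying a Storm topology for search")
print(matches)  # ['storm-alerts']
```

Scaling this to tens of thousands of subscriptions is exactly why the matching work gets wrapped in a Storm topology and spread across a cluster.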
Jay Kreps on Project Voldemort: Scaling Simple Storage at LinkedIn – LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn. This was a presentation made at QCon 2009 and is embedded on LinkedIn's blog - http://blog.linkedin.com/
Hadoop Meetup Jan 2019 - Overview of Ozone – Erik Krogen
A presentation by Anu Engineer of Cloudera regarding the state of the Ozone subproject. He covers a brief introduction of what Ozone is, and where it's headed.
This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind – Avere Systems
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost-effective fashion can be cumbersome at the workload level. At the organizational level, it can quickly become chaos.
You must make choices around cloud deployment, and these choices can have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, or locked-in scenarios. Data management and placement challenges make the ability to automate workflows and processes across multiple clouds a requirement.
In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
"Wire Encryption In HDFS: Protect Your Data From Others, Not Yourself"
ApacheCon 2019, Las Vegas.
SPEAKERS: Chen Liang, Konstantin Shvachko. LinkedIn
Wire data encryption is a key component of the Hadoop Distributed File System (HDFS). HDFS can enforce different levels of data protection, allowing users to specify one based on their own needs. However, such enforcement is an all-or-nothing feature: wire encryption is enforced either for all accesses or for none. Since encryption bears a considerable performance cost, the all-or-nothing condition forces users to choose between 'faster but unencrypted' and 'encrypted but slower' for all clients. In our use case at LinkedIn, we would like to selectively expose fast unencrypted access to fully managed internal clients, which can be trusted, while exposing only encrypted access to clients outside of the trusted circle with higher security risks. That way we minimize performance overhead for trusted internal clients while still securing data from potential outside threats. We re-evaluate the RPC encryption mechanism in HDFS. Our design extends the HDFS NameNode to run on multiple ports. Depending on the configuration, connecting to different NameNode ports yields different levels of encryption protection. This protection is then enforced for both NameNode RPC and the subsequent data transfers to/from DataNodes. System administrators then need only set up a simple firewall rule that allows access to the unencrypted port for internal clients and exposes the encrypted port to outside clients. This approach comes with minimal operational and performance overhead. The feature has been introduced to Apache Hadoop under HDFS-13541.
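The multi-port setup described above is ultimately just hdfs-site.xml configuration. The fragment below is a sketch based on our reading of HDFS-13541 and the port-based SASL resolver; the property names and values should be verified against the documentation for your Hadoop release before use:

```xml
<!-- Sketch only: verify property names against your Hadoop version. -->
<property>
  <!-- Serve NameNode RPC on an additional (e.g. unencrypted) port
       alongside the primary one. -->
  <name>dfs.namenode.rpc-address.auxiliary-ports</name>
  <value>9001</value>
</property>
<property>
  <!-- Resolve the SASL quality-of-protection per ingress port, so
       encryption ("privacy") applies only on the exposed port. -->
  <name>hadoop.security.saslproperties.resolver.class</name>
  <value>org.apache.hadoop.security.IngressPortBasedResolver</value>
</property>
<property>
  <name>ingress.port.sasl.configured.ports</name>
  <value>8020</value>
</property>
<property>
  <name>ingress.port.sasl.prop.8020</name>
  <value>privacy</value>
</property>
```

A firewall rule restricting the auxiliary port to the internal network completes the picture, as the abstract describes.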
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje... – Maginatics
How did Maginatics build a strongly consistent and secure distributed file system? Niraj Tolia, Chief Architect at Maginatics, gave this presentation on the design of MagFS at the Storage Developer Conference on September 16, 2013.
For more information about MagFS—The File System for the Cloud, visit maginatics.com or contact us directly at info@maginatics.com.
Building a Just-in-Time Application Stack for Analysts – Avere Systems
Slide presentation from Webinar on February 17, 2016.
People in analytical roles are demanding more and more compute and storage to get their jobs done. Instead of building out infrastructure for a few employees or a department, systems engineers and IT managers can find value in creating a compute stack in the cloud to meet the fluctuating demand of their clients.
In this 45-minute webinar, you’ll learn:
- How to identify the right analytical workloads
- How to create a scalable compute environment using the cloud for analysts in under 10 minutes
- How to best manage costs associated with the cloud compute stack
- How to create dedicated client stacks with their own scratch space as well as general access to reference data
Health systems departments, research & development departments, and business analyst groups all face silos of these challenging, compute-intensive use cases. By learning how to quickly build this flexible workflow that can be scaled up and down (or off) instantly, you can support business objectives while efficiently managing costs.
The 3.0 release of the Maginatics Cloud Storage Platform (MCSP) includes great improvements in Data Protection, Multi-tier Caching and APIs, as well as other significant new features that make Maginatics the ideal choice for enterprise businesses with demanding storage requirements.
Geek Sync | Infrastructure for the Data Professional: An Introduction – IDERA Software
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: https://www.idera.com/resourcecentral/webcasts/geeksync/infrastructure-for-the-data-professional
It doesn’t matter if you are a DBA, application developer, database developer, or BI pro: the infrastructure your SQL Server environment runs on is important. Perhaps you didn’t “grow up” on the system administration side of IT, or you have been out of the operations world long enough to have fallen out of the loop with what is happening. This session is intended to provide a full-stack infrastructure overview so that you can talk shop with your cohorts in operations to resolve issues and maybe even be proactive. We will discuss, in an introductory fashion, the hardware, network, storage, virtualization, and operating system layers. Additionally, some suggestions as to where to find more information will be provided.
Speaker: Peter Shore is a seasoned IT professional with over 25 years of experience. He took the accidentally intentional DBA plunge in 2013 and has discovered that he loves to find the stories the data has to tell. Peter is comfortable working with both physical and virtual servers, where he tries to apply best practices to attain performance improvements. He is also adept at bridging the gap between technical and business language in order to bring technology solutions to business needs.
Lessons Learned Running Hadoop and Spark in Docker Containers – BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is “all in” on Docker containers—with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and discusses ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor, so Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark using Docker as well as lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
Slides from a workshop on parallel processing using GPU infrastructure
The first national cloud computing workshop of Iran
Vahid Amiri
vahidamiry.ir
Amirkabir University of Technology - 1391 (2012)
Are cloud and NDT a good mix? NDT has its own specificities. Clouds can truly simplify file management, but is every cloud solution adapted to NDT? For example, Dropbox may not work right out of the box for our market. This presentation highlights different avenues for clouds (IaaS, PaaS, and SaaS) and highlights NDT's critical requirements (constraints and needs). A list of different levels of cloud services (component, option, security, ...) will be defined. It is important to remember that private and public servers are two possible avenues; NDT was an early user of private servers even before they were called a cloud. Overall, the main idea is to optimize the operational process to reduce OPEX and to increase the availability and accuracy of data.
See: www.amotus-solutions.com or www.nubitus.com
With media files ballooning in size, moving these files efficiently is a very real problem that broadcasters face today.
This webinar (https://www.youtube.com/watch?v=J5cemDhep4I) discusses the differences between FTP and FileCatalyst technology.
The concept of “cloud computing” has been around since the beginning of the internet. Join Brian Pichman as he goes beyond storing files in the cloud and gets you started with the powerful Google Cloud service. The webinar will demonstrate how to get started with $300 in Google Credits to learn and play. Libraries can use Google Cloud to build websites, host applications, offload processes to save money/resources, or even migrate your environment to theirs. You do not need to be in IT to participate; as we will cover the basics of Cloud Computing and Google Cloud. You will need a Google Account prior to joining the webinar.
IBM Aspera for High Speed Data Migration to Your AWS Cloud - DEM06-S - Anahei... – Amazon Web Services
While the cloud offers many benefits, moving TBs and PBs of data to the cloud can be challenging. Traditional software transfer technologies are slow and unreliable, and shipping physical storage disks is time consuming and exposes data to unnecessary security risks. IBM Aspera offers high-speed data transfer that uses the public internet to securely and reliably migrate large amounts of data from your existing environment to AWS. Learn how IBM Aspera on Cloud can dramatically reduce migration windows, help lower the costs of migration, and eliminate the risks associated with physical disk shipment. This presentation is brought to you by AWS partner, IBM.
Hybrid clouds provide a good balance between the privacy offered by private clouds and the elasticity and reliability of public clouds. The presentation offers an introduction to the decision criteria for switching from a private to a hybrid cloud architecture and where to start.
Dimension Data Cloud Business Unit - Solution Offering – RifaHaryadi
Dimension Data - Cloud Business Unit Solution Offering. This presentation will take you through Dimension Data's solution offering and its roadmap to the future of cloud computing. Dimension Data's cloud computing solutions are fully controlled by the Managed Cloud Platform, Dimension Data's proprietary orchestration and automation tools.
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Key Trends Shaping the Future of Infrastructure – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The keynote covers the key trends across hardware, cloud, and open source, exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality – Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
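Under the hood, the JMeter-to-InfluxDB integration ultimately reduces to writing samples in InfluxDB's line protocol (measurement, tags, fields, timestamp). The following is a minimal sketch of formatting one response-time sample; the measurement and field names here are illustrative, not JMeter's actual Backend Listener schema:

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one metric sample as an InfluxDB line-protocol record:
    measurement,tag=v,... field=v,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items()))
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

# One simulated load-test sample: request label plus response time in ms.
line = to_line_protocol(
    "jmeter", {"label": "login", "status": "ok"},
    {"response_ms": 142}, ts_ns=1700000000000000000)
print(line)  # jmeter,label=login,status=ok response_ms=142 1700000000000000000
```

Records like this are what Grafana then queries from InfluxDB to draw the real-time dashboards shown in the webinar.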
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Benchmarking Personal Cloud Storage
1. Benchmarking Personal Cloud Storage
Spyros Eleftheriadis – itp12401@hua.gr
Harokopio University of Athens
Department of Informatics and Telematics
Postgraduate Program "Informatics and Telematics"
Modern Computer System Architectures
Thursday, 16 January 2014
2. Introduction
• Services like Dropbox, SkyDrive and Google Drive are becoming pervasive in people's daily routines.
• Such applications are data-intensive, and their increasing usage already produces a significant share of Internet traffic.
• Very little is known about how providers implement their services and about the implications of different designs.
3. Goals
• How do different providers tackle the problem of synchronizing people's files?
• Develop a methodology that helps to understand both system architecture and client capabilities.
• What are the consequences of each design on performance?
5. Testbed I
• The testbed is composed of two parts:
• (i) a test computer that runs the application-under-test in the desired operating system;
• (ii) the testing application.
• The complete environment can run either on a single machine or on separate machines, provided that the testing application can intercept traffic from the test computer.
6. Testbed II
• The actual testbed is a single Linux machine that both controls the experiments and hosts a virtual machine running the test computer (Windows 7 Enterprise).
• The testing application receives as input benchmarking parameters describing the sequence of operations to be performed. It acts remotely on the test computer, generating specific workloads in the form of file batches, which are manipulated using an FTP client. Files of different types are created or modified at run-time, e.g., text files composed of random words from a dictionary, images with random pixels, or random binary files.
• Generated files are synchronized to the cloud by the application-under-test, and the exchanged traffic is monitored to compute performance metrics. These include the amount of traffic seen during the experiments, the time before actual synchronization starts, and the time to complete synchronization.
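The run-time workload generation could be sketched as follows (a minimal Python sketch, not the authors' actual tool; the word list here is a stand-in assumption for a real dictionary):

```python
import os
import random

def make_text_file(path, n_words, dictionary=None):
    """Create a text file of random dictionary words (the text workload)."""
    # Stand-in word list; the actual tool draws from a real dictionary.
    dictionary = dictionary or ["cloud", "storage", "sync", "bench", "file"]
    words = (random.choice(dictionary) for _ in range(n_words))
    with open(path, "w") as f:
        f.write(" ".join(words))

def make_binary_file(path, n_bytes):
    """Create a file of random bytes (the incompressible binary workload)."""
    with open(path, "wb") as f:
        f.write(os.urandom(n_bytes))

make_text_file("sample.txt", 100)
make_binary_file("sample.bin", 4096)
```

Text files built from dictionary words compress well, while random binary files do not, which matters later when the compression capability is tested.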
7. Architecture & Data Centers
To identify how the analyzed services operate, they observe the DNS names of contacted servers:
• (i) when starting the application;
• (ii) immediately after files are manipulated;
• (iii) when the application is in idle state.
Contacted DNS name > IP address > hybrid geolocation methodology > data center location estimated with a precision of about one hundred kilometers.
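The first step of that pipeline, mapping a contacted DNS name to its IP addresses, could look like the sketch below (illustrative only; the actual location estimation additionally needs latency measurements and geolocation databases):

```python
import socket

def resolve(hostname):
    """Resolve a contacted DNS name to its IPv4 addresses -- the first step
    of the hybrid methodology; latency measurements and geolocation
    databases are then layered on top to estimate the data center location."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))  # e.g. ['127.0.0.1']
```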
8. Checking Capabilities
Personal cloud storage applications can implement several capabilities to optimize storage usage and to speed up transfers. These capabilities include the adoption of:
• chunking (i.e., splitting content into a maximum-size data unit),
• bundling (i.e., the transmission of multiple small files as a single object),
• deduplication (i.e., avoiding re-transmitting content already available on the servers),
• delta encoding (i.e., transmission of only the modified portions of a file), and
• compression.
For each capability, a specific test has been designed to observe whether it is implemented.
9. Benchmarking Performance
Knowing how the services are designed in terms of both data center locations and system capabilities, they check how such choices influence synchronization performance and the amount of overhead traffic.
They designed 8 benchmarks varying:
• number of files
• file sizes
• file types
• All files in the sets are created at run-time by the testing application.
• Each experiment is repeated 24 times per service, allowing at least 5 min between experiments to avoid creating abnormal workloads on the servers.
• The benchmark of a single storage service lasts about 1 day.
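The repetition schedule above could be driven by a loop like this sketch (the measurement callback is a placeholder, not the authors' code; timings are shortened in the demo call):

```python
import time

def run_benchmark(measure, repetitions=24, pause_s=300):
    """Repeat one measurement `repetitions` times, pausing `pause_s` seconds
    between runs to avoid creating abnormal workloads on the servers."""
    results = []
    for i in range(repetitions):
        results.append(measure())      # placeholder for one sync experiment
        if i < repetitions - 1:
            time.sleep(pause_s)
    return results

# Dummy measurement standing in for an actual capture of sync traffic.
runs = run_benchmark(lambda: {"traffic_kB": 150}, repetitions=3, pause_s=0)
print(len(runs))
```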
10. Tested Storage Services
• Dropbox, Google Drive and SkyDrive are selected because they are among the most popular offerings (according to the volume of search queries containing names of cloud storage services on Google Trends).
• Wuala is considered because it is a system that offers client-side encryption. (They want to verify the impact of such a privacy layer on synchronization performance.)
• Amazon Cloud Drive is included to compare its performance to Dropbox, since both services rely on Amazon Web Services (AWS) data centers.
12. Protocols I
• All clients use HTTPS, except Dropbox's notification protocol, which relies on plain HTTP. Interestingly, some Wuala storage operations also use HTTP, since users' privacy has already been secured by local encryption.
• All services but Wuala use separate servers for control and storage.
13. Protocols II
Firstly, the applications authenticate the user and check if any content has to be
updated.
• SkyDrive requires about 150 kB in total, 4 times more than others. This happens
because the application contacts many Microsoft Live servers during login (13 in this
example).
Secondly, once login is completed, the applications keep exchanging data with the
cloud.
• Wuala is polling servers every 5 min on average (equivalent background traffic of about
60 b/s).
• Google Drive follows close, with a lightweight 40 s polling interval (42 b/s).
• Dropbox and SkyDrive use intervals close to 1 min (82 b/s and 32 b/s, respectively).
• Amazon Cloud Drive is polling servers every 15 s, each time opening a new HTTPS
connection. This notification strategy consumes 6 kb/s – i.e., about 65 MB per day. This
information is relevant to users with bandwidth constraints (e.g., in 3G/4G networks)
and to the system: 1 million users would generate approximately 6 Gb/s of
signaling traffic alone! As the results for other providers demonstrate,
such design is not optimal and seems indeed
possible to be improved.
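The aggregate figures quoted above can be checked with simple arithmetic (assuming kb/s means kilobits per second with decimal prefixes):

```python
# Back-of-the-envelope check of the Amazon Cloud Drive polling figures.
rate_kb_s = 6                                       # per-client signaling rate
bytes_per_day = rate_kb_s * 1000 / 8 * 86400        # kb/s -> bytes over 24 h
per_day_MB = bytes_per_day / 1e6                    # ~65 MB per day per client
aggregate_Gb_s = rate_kb_s * 1000 * 1_000_000 / 1e9 # 1 million clients, Gb/s
print(round(per_day_MB), aggregate_Gb_s)
```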
15. Data Centers
• Dropbox uses its own servers (in the San Jose area) for client management, while storage servers are committed to Amazon in Northern Virginia.
• Cloud Drive uses three AWS data centers: two are used for both storage and control (in Ireland and Northern Virginia); a third is used for storage only (in Oregon).
• SkyDrive relies on Microsoft's data centers in the Seattle area (for storage) and Southern Virginia (for storage and control). They also identified a destination in Singapore (for control only).
• Wuala's data centers are located in Europe: two in the Nuremberg area, one in Zurich and a fourth in Northern France. None is owned by Wuala.
• Google Drive follows a different approach: TCP connections are terminated at the closest Google edge node, from where the traffic is routed to the actual storage/control data center over Google's private network.
18. Chunking
• Only Amazon Cloud Drive does not perform chunking
• Google Drive uses 8 MB chunks
• Dropbox uses 4 MB chunks
• SkyDrive and Wuala use variable chunk sizes
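Fixed-size chunking of the kind Dropbox uses can be sketched in a few lines (illustrative only; real clients also hash and track each chunk individually):

```python
def chunk(data: bytes, chunk_size: int = 4 * 1024 * 1024):
    """Split content into maximum-size data units, as in Dropbox's 4 MB chunking."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

pieces = chunk(b"x" * (9 * 1024 * 1024))   # 9 MB payload
print([len(p) for p in pieces])            # two full 4 MB chunks plus a 1 MB tail
```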
19. Bundling
• Only Dropbox implements a file-bundling strategy.
• Google Drive and Amazon Cloud Drive open one separate TCP (and SSL) connection for each file.
• SkyDrive and Wuala submit files sequentially, waiting for application-layer acknowledgments between uploads.
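The idea behind bundling is to pack many small files into one object so they travel over a single connection. A sketch, using tar purely as an illustrative container (the source does not say what format Dropbox uses on the wire):

```python
import io
import tarfile

def bundle(files: dict) -> bytes:
    """Pack many small files into a single object so one connection and one
    round-trip can carry them all, instead of one transfer per file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

blob = bundle({"a.txt": b"hello", "b.txt": b"world"})
print(len(blob) > 0)
```

This avoids both the per-file connection setup of Google Drive/Amazon Cloud Drive and the per-file acknowledgment wait of SkyDrive/Wuala.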
20. Client-side Deduplication
• Only Dropbox and Wuala implement deduplication.
• All other services have to upload the same data even if it is already available at the storage server. Interestingly, Dropbox and Wuala can identify copies of users' files even after they are deleted and later restored.
• In the case of Wuala, deduplication is compatible with local encryption, i.e., two identical files generate two identical encrypted versions.
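Client-side deduplication boils down to asking the server, by content hash, whether a chunk is already stored. A minimal sketch (SHA-256 and the in-memory index are assumptions for illustration, not the protocols these services actually use):

```python
import hashlib

class DedupUploader:
    """Upload a chunk only if the server does not already know its content hash."""
    def __init__(self):
        self.server_hashes = set()   # stand-in for the server-side chunk index
        self.uploads = 0

    def put(self, chunk: bytes):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.server_hashes:
            self.server_hashes.add(digest)
            self.uploads += 1        # only new content crosses the network
        return digest

up = DedupUploader()
up.put(b"same content")
up.put(b"same content")              # deduplicated: no second upload
print(up.uploads)
```

Because the index is keyed by content rather than by file name, deleted-and-restored files are still recognized, matching the behavior observed for Dropbox and Wuala.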
21. Delta Encoding
• Delta encoding is a specialized compression technique that calculates the differences between two copies of a file, allowing the transmission of only the modifications between revisions.
• Only Dropbox fully implements delta encoding.
• Wuala does not implement delta encoding. However, deduplication prevents the client from uploading those chunks not affected by the change.
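A heavily simplified fixed-block delta illustrates the principle, i.e., sending only the blocks that differ between two revisions (Dropbox's real mechanism is closer to rsync-style rolling checksums; this sketch is not it):

```python
def delta(old: bytes, new: bytes, block: int = 4):
    """Return (offset, data) pairs for the fixed-size blocks that changed."""
    changes = []
    for i in range(0, len(new), block):
        if new[i:i + block] != old[i:i + block]:
            changes.append((i, new[i:i + block]))
    return changes

def apply_delta(old: bytes, changes, new_len: int):
    """Rebuild the new revision from the old copy plus the changed blocks."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, data in changes:
        out[offset:offset + len(data)] = data
    return bytes(out)

old = b"aaaabbbbcccc"
new = b"aaaaXXXXcccc"
patch = delta(old, new)
print(patch)   # only the middle block is transmitted
```

Note how Wuala's chunk-level deduplication achieves a coarser version of the same saving: unchanged chunks are simply never re-uploaded.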
23. Compression
• Compression reduces traffic and storage requirements at the expense of processing time.
• Dropbox and Google Drive compress data before transmission.
• Google Drive implements smart policies that check the file format before compressing.
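A crude stand-in for such a policy is to compress only when it actually shrinks the payload (the source does not describe Google Drive's exact checks; this sketch just illustrates why format awareness pays off):

```python
import os
import zlib

def maybe_compress(data: bytes):
    """Compress before transmission, but fall back to the raw payload when
    compression does not shrink it -- already-compressed formats gain
    nothing and would only waste CPU time."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return "deflate", packed
    return "identity", data

enc_text, _ = maybe_compress(b"all work and no play " * 20)  # repetitive text
enc_rand, _ = maybe_compress(os.urandom(256))                # incompressible
print(enc_text, enc_rand)
```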
25. Synchronization Startup
• Dropbox is the fastest service to start synchronizing single files. Its bundling strategy, however, slightly delays startup with multiple files. As shown next, this strategy pays back in total upload time.
• Wuala also increases its startup time when multiple files are submitted.
• SkyDrive is by far the slowest.
26. Completion Time
• Google Drive (26.49 Mb/s) and Wuala (33.34 Mb/s) are the fastest, since each TCP connection is terminated at a data center near the testbed.
• Dropbox and SkyDrive, on the other hand, are the most impacted services.
27. Protocol Overhead
• Protocol overhead is the total storage and control traffic divided by the benchmark size.
• Amazon Cloud Drive presents a very high overhead because of the high number of control flows opened for every file transfer.
• Dropbox exhibits the highest overhead among the remaining services, possibly owing to the signaling cost of implementing its advanced capabilities.
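The metric itself is a simple ratio; the numbers below are hypothetical, chosen only to show the calculation:

```python
def protocol_overhead(total_traffic_bytes: int, benchmark_bytes: int) -> float:
    """Overhead as defined above: all storage plus control traffic observed,
    divided by the nominal size of the benchmark's files."""
    return total_traffic_bytes / benchmark_bytes

# Hypothetical example: 1 MB of files causing 1.3 MB on the wire.
print(protocol_overhead(1_300_000, 1_000_000))
```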
28. Conclusions
• Dropbox implements most of the checked capabilities, and its sophisticated client clearly boosts performance.
• Wuala deploys client-side encryption, and this feature does not seem to affect its synchronization performance.
• The results confirm the role played by data center placement in a centralized approach: from the perspective of European users, network latency is still an important limitation for U.S.-centric services such as Dropbox and SkyDrive, while services deploying data centers near the test location, such as Wuala and Google Drive, achieve the best completion times.