Hierarchical Cluster Engine (HCE) project
The main idea of this new project is to implement a solution that can be used to: construct a custom network mesh or distributed network cluster structure with several relation types between nodes; formalize data flow processing from an upper-level central source node down to lower nodes and back; formalize the handling of management requests from multiple source points; natively reduce results from multiple nodes (aggregation, duplicate elimination, sorting, and so on); internally support a powerful full-text search engine and data storage; provide both transaction-less and transactional request processing; support flexible run-time changes of the cluster infrastructure; and offer many language bindings for client-side integration APIs, all in one product built in C++.
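The "native reducing" of multiple nodes' results mentioned above can be pictured with a small sketch. This is illustrative only, not HCE's actual API: the function name and the `id`/`score` item fields are assumptions, but the steps (merge, eliminate duplicates, sort) are the ones the text names.

```python
# Hypothetical sketch of reducing results gathered from several cluster
# nodes: merge the per-node lists, drop duplicates by item id (keeping
# the best-scoring copy), and sort the merged set by score.
def reduce_node_results(node_results):
    """Merge per-node result lists into one deduplicated, sorted list."""
    merged = {}
    for results in node_results:
        for item in results:
            key = item["id"]
            # Keep the highest-scoring copy of a duplicated item.
            if key not in merged or item["score"] > merged[key]["score"]:
                merged[key] = item
    return sorted(merged.values(), key=lambda it: it["score"], reverse=True)

node_a = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.4}]
node_b = [{"id": 2, "score": 0.7}, {"id": 3, "score": 0.5}]
combined = reduce_node_results([node_a, node_b])
```

In the real engine this step runs inside the cluster, so the client receives a single reduced response rather than one result set per node.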
2. Introduction
The HCE-DC is a multipurpose, high-productivity, scalable, and extensible engine for web data collection and processing.
It is built on several HCE project products and technologies:
● The Distributed Tasks Manager (DTM) service.
● The hce-node network cluster application.
● The API bindings for the Python and PHP languages.
● The tools and library of crawling algorithms.
● The tools and library of scraping algorithms.
● The web administration console.
● The real-time REST API.
It provides flexible configuration and deployment automation, so an installation can be fitted closely to a target project and integrated easily.
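To make the real-time REST API concrete, here is a sketch of building a batched crawl request. The endpoint shape and every field name (`action`, `items`, `depth`, `timeout`) are assumptions for illustration, not the documented HCE-DC protocol; only the idea of submitting many URLs in one batched request comes from the text.

```python
import json

# Illustrative only: construct a JSON payload for a hypothetical
# batched "crawl" request to the real-time REST API.
def make_crawl_batch(urls, depth=1, timeout_sec=30):
    """Build a JSON batch request covering several URLs at once."""
    return json.dumps({
        "action": "crawl",
        "items": [{"url": u, "depth": depth, "timeout": timeout_sec}
                  for u in urls],
    })

batch = make_crawl_batch(["http://example.com/", "http://example.org/"])
```

A client would POST such a payload to the gateway and receive the batch results synchronously or asynchronously, as described in the features list later in the deck.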
3. The main functional purposes
● Crawling – scan web sites, analyze and parse pages, and detect and collect URL links and web resources' data. Download resources from web servers using collected URLs, or URLs provided with a request, and store them in local raw-data file storage. Everything runs on a multi-host, multi-process architecture.
● Process web page contents with several customizable applied algorithms, such as unstructured textual content scraping, and store the results in local SQL database storage, again on a multi-host, multi-process architecture.
● Manage crawling and processing tasks with scheduling and balancing, using either the multi-host tasks management service or a real-time, multi-threaded, load-balancing client-server architecture with a multi-host backend engine.
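The URL-collection step of crawling can be illustrated with the standard library alone. This is a minimal single-page sketch, not the engine's multi-host, multi-process crawler: it only shows the "detect and collect URL links" part of the first bullet.

```python
from html.parser import HTMLParser

# Minimal link extractor: walk a page's tags and collect href values
# from <a> elements, the raw material for a crawl frontier.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>'
collector = LinkCollector()
collector.feed(page)
```

In the full system the collected links would be filtered, prioritized, and fanned out to crawler processes across hosts.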
4. The extended functionality
● Developer's API – full access to configuration, deployment, monitoring, and management of processes and data.
● Applied API – a full-featured, multi-threaded, multi-host REST HTTP-based protocol for performing crawling and scraping batch requests.
● Web administration console – manages the DC and DTM services; user accounts, roles, and permissions; crawling and scraping; results collection, aggregation, and conversion; statistical data collection and visualization; notifications, triggering, and other utility tools.
● Helper tools and libraries – several supporting applied utilities.
5. Distributed asynchronous nature
The HCE-DC engine is architecturally a fully distributed system. It can be deployed and configured as a single- or multi-host installation. Key features and properties of the distributed architecture:
● No central database or data storage for crawling and processing. Each physical host unit shards data using the same structures, but the whole is represented as a single service.
● Crawling and processing run on several physical hosts in parallel, multi-process fashion, covering downloading, fetching, DOM parsing, URL collection, field extraction, post-processing, metric calculation, and similar tasks.
● Customizable strategies for data sharding and request balancing that minimize data redundancy and optimize resource usage.
● Internal reducing of pages and scraped contents, with smart merging that avoids resource duplicates in the fetched-data client response.
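One simple way to shard without a central database, consistent with the first bullet above, is to derive a resource's owning host from a stable hash of its URL: any node can then locate the owner by recomputing the hash. This is only the simplest possible strategy (the deck says the strategies are customizable), and the function and host names are illustrative.

```python
import hashlib

# Sketch of hash-based sharding: each URL maps deterministically to a
# host, so no central lookup table is needed to find where it lives.
def shard_for(url, hosts):
    """Pick an owning host from a stable hash of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

hosts = ["host-1", "host-2", "host-3"]
owner = shard_for("http://example.com/page", hosts)
```

Note that plain modulo hashing reshuffles most keys when a host is added; schemes like consistent hashing reduce that movement, which matters for the run-time host addition described on the next slide.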
6. Flexible balancing and scalability
The HCE-DC service can be deployed on a set of physical hosts. The number of hosts depends on their hardware productivity (CPU core count, disk space, network interface speed, and so on) and can range from one to tens or more. Key scalability principles:
– The hardware computational unit is a physical or logical host (any kind of virtualization or container is supported).
– Hardware units can be added to the system at run-time and are gradually filled with data during regular crawling iterations. No dedicated data migration is required.
– Computational task balancing is resource-usage optimized. The task scheduler selects the computational unit with the most free system resources using a customizable estimation formula. Different system resource usage indicators are available: CPU, RAM, disk, I/O wait, process count, thread count, and so on.
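The "customizable estimation formula" in the last principle can be sketched as a weighted score over resource indicators, with the scheduler picking the unit whose score is lowest. The weights and indicator names below are assumptions for illustration; the real formula is configurable.

```python
# Sketch of resource-optimized task balancing: score each computational
# unit by a weighted sum of its load indicators (lower = more free
# resources) and dispatch the next task to the least-loaded unit.
DEFAULT_WEIGHTS = {"cpu": 0.5, "ram": 0.3, "io_wait": 0.2}

def load_score(indicators, weights=DEFAULT_WEIGHTS):
    """Weighted load estimate; lower means more free resources."""
    return sum(weights[k] * indicators[k] for k in weights)

def pick_unit(units, weights=DEFAULT_WEIGHTS):
    """Select the host whose indicators give the lowest load score."""
    return min(units, key=lambda name: load_score(units[name], weights))

units = {
    "host-1": {"cpu": 0.80, "ram": 0.60, "io_wait": 0.10},
    "host-2": {"cpu": 0.20, "ram": 0.40, "io_wait": 0.05},
}
chosen = pick_unit(units)
```

Swapping the weights, or adding indicators such as disk usage or thread count, changes the balancing policy without touching the dispatch logic.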
7. Extensible software and algorithms
The HCE-DC service for the Linux OS platform has the following main parts:
● The core service daemon modules, covering the crawling task scheduler, the task queue manager, the periodical process managers, the computational unit data manager, the storage resource aging manager, the real-time API request manager, and so on. Typically the core daemon process runs on a dedicated host and represents the service itself.
● The computational unit module set, including the crawler-task crawling module, the processor-task scraping algorithms module with several scraper modules, the db-task storage management module, the pre-processor and finalizer modules, and several additional helper utilities. These act on session-based principles and exit after the input batch data set is processed.
8. Open processing architecture
The computational module set can be extended with any kind of algorithm, library, or framework, for any platform and programming language. The only limitation is the API interaction translation, which typically needs some adapters. The key principles:
● Data processing modules are invoked as native OS processes or via an API, including REST and CLI.
● Process instances are isolated.
● A POSIX CLI API is the default for inter-process data exchange, or is simulated by converter utilities.
● An open input/output protocol is used to process a batch sequentially, step by step, through each link of the processing chain.
● Streaming open data formats such as JSON and XML can be easily serialized.
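The open input/output protocol above can be sketched as one chain step that accepts a serialized batch, adds its own results, and emits the batch for the next step. The batch schema and field names here are assumptions; in a real module the JSON would arrive on stdin and leave on stdout per the POSIX CLI convention.

```python
import json

# Sketch of one processing-chain step: deserialize the incoming batch,
# annotate each item with this step's result (a word count here), and
# re-serialize it for the next module in the chain.
def run_chain_step(batch_json):
    """One chain step: annotate each item with a word count."""
    batch = json.loads(batch_json)
    for item in batch["items"]:
        item["word_count"] = len(item.get("content", "").split())
    return json.dumps(batch)

incoming = json.dumps({"items": [{"url": "http://example.com/",
                                  "content": "three short words"}]})
outgoing = run_chain_step(incoming)
```

Because each step only reads and writes the open batch format, a module written in any language can be dropped into the chain as an isolated OS process.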
12. Brief list of main DC features
Fully automated distributed web site crawling: sets of root URLs, periodic re-crawling, HTTP and HTML redirects, HTTP timeouts, dynamic HTML rendering, prioritization, limits (size, pages, contents, errors, URLs, redirects, content types), request delaying, robots.txt, rotating proxies, RSS (1, 2, RDF, Atom), scan depth, filters, page chains, and batching.
Fully automated distributed data processing: the News™ article engine (pre-defined sequential scraping using the Goose, Newspaper, and Scrapy extractors) and the Template™ universal engine (definitions of tags and rules to extract data from pages based on XPath and CSSPath; content part joining, merging, and best-result selection; regular-expression post-processing; multi-item pages such as product or search-result listings; multi-rule and multi-template scraping), a WYSIWYG template editor, processed content selection and merging, and an extensible processing modules architecture.
Fully automated resource data management: periodic operations, data aging, updates, re-crawling, and re-processing.
Web administration console: full CRUD of data collection and processing projects with per-project parameter sets, users with role and permission ACLs, DC and DTM service statistics, and per-project crawling and processing statistics.
Web REST API gateway: synchronous HTTP REST requests with batching for crawling and processing, and a full-featured parameter set with additional limitations per user account and authorization state.
Real-time requests API: native CLI client, asynchronous and synchronous REST requests.
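The gateway's "additional limitations per user account" can be sketched as a quota check applied before a batch is accepted. The quota table, limit value, and function shape are all assumptions for illustration; only the idea of per-account limits comes from the feature list.

```python
# Sketch of a per-account batch quota check at the REST gateway: a
# batch that exceeds the account's URL limit is rejected before any
# crawling work is scheduled.
ACCOUNT_LIMITS = {"demo": {"max_urls_per_batch": 2}}

def validate_batch(account, urls):
    """Return (accepted, reason) for a proposed crawl batch."""
    limit = ACCOUNT_LIMITS[account]["max_urls_per_batch"]
    if len(urls) > limit:
        return False, "batch exceeds per-account URL limit"
    return True, "ok"

ok, _ = validate_batch("demo", ["http://a.example/", "http://b.example/"])
rejected, reason = validate_batch("demo", ["u1", "u2", "u3"])
```

A real gateway would also factor in the caller's authorization state, as the slide notes, before admitting the batch.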
13. Statistics from a three-physical-host installation over one month
● Projects: 8
● Pages crawled: 6.2M
● Crawling batches: 60K
● Processing batches: 90K
● Purging batches: 16K
● Aging batches: 16K
● Project re-crawlings: 30K
● CPU load average: 0.45 avg / 3.5 max
● CPU utilization: 3% avg / 30% max
● I/O wait time: 0.31 avg / 6.4 max
● Network connections: 250 avg / 747 max
● Network traffic: 152 Kbps avg / 5.5 Mbps max
● Data hosts: 2
Observations:
● Load balancing of OS system resources is managed linearly: CPU load average, I/O wait, and RAM usage show no spikes or overloads.
● Linear scalability of real-time requests per physical host.
● Linear scalability of automated crawling, processing, and aging per physical host.