Expert Roundtable: The Future of Metadata After Hive Metastore - lakeFS
The expert roundtable "The Future of Metadata After Hive Metastore" took place on November 10, 2021.
Panelists:
- Ryan Blue
- Seshu Adunuthula
- Lior Ebel
- Oz Katz
Globus: Research Data Management as Service and Platform - pearc17 - Mary Bass
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services. Presentation by Steve Tuecke from The University of Chicago. Steve is Globus Founder and Project Lead.
RGW S3: Features vs deep compatibility - Robin Johnson - Ceph Community
This document discusses the differences between the S3 API specification, Amazon S3 implementation, and the RGW S3 implementation in Ceph. It notes that while RGW aims to be compatible with S3, there are subtle differences in features and behaviors between the three. The document analyzes specific features like Content-Length handling and regions to demonstrate differences. It also discusses challenges around compatibility, testing, and impacts of missing features in RGW.
In this presentation from June 7, 2018, Globus Head of Products Rachana Ananthakrishnan discussed new Globus Connect Server features including support for HTTP/S, multiple storage connector support, new terminology used in the 5.0 product release series, and more. Listen to the live recording here: https://youtu.be/Ubu0KhIbIA0
What's new in Silverstripe 4? (StripeCon APAC 2016) - Ingo Schommer
SilverStripe 4 introduces updates to assets, campaigns, and technologies like React. It aims to modernize content publishing, improve the authoring experience, and integrate better with the PHP ecosystem. Key changes include an upgraded file system, support for campaigns and versioned content, and redesigned areas like files and forms using technologies like React. The alpha releases are underway with a beta planned for early 2017 and a stable release in the second quarter of 2017.
Ceph Management and Monitoring with Dashboard v2 - Lenz Grimmer - Ceph Community
This document discusses the history and development of the Ceph Dashboard tool. It describes the limitations of the original Dashboard v1 and the goals for the new Dashboard v2, which uses an Angular frontend and modular Python backend. Dashboard v2 aims to provide full management and monitoring capabilities for Ceph clusters in a web UI, addressing the limitations of the previous version. The document demonstrates Dashboard v2 and outlines next steps to add additional management features.
Everything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin - Ceph Community
This document provides an overview of object storage concepts and features of Ceph's RADOS Gateway (RGW). Key points include:
- RGW provides a RESTful API for object storage that is compatible with AWS S3 and OpenStack Swift.
- Objects stored in RGW can be large in size and immutable, with permissions set at the object level.
- Multipart uploads allow efficient transfer of large objects by splitting them into parts.
- Versioning and lifecycle policies allow automatic management of object versions and transitions.
- RGW can be used to provide NFS access to the object storage namespace.
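The multipart idea mentioned in the list above can be illustrated with a minimal sketch. This is not the RGW or S3 API: the function names `split_into_parts` and `assemble` are hypothetical, and the sketch only shows the general technique of cutting a large object into numbered parts that can be transferred independently and reassembled in order.

```python
from typing import Iterator, List, Tuple


def split_into_parts(data: bytes, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


def assemble(parts: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the object from its parts, regardless of arrival order."""
    ordered = sorted(parts, key=lambda part: part[0])
    return b"".join(chunk for _, chunk in ordered)


if __name__ == "__main__":
    parts = list(split_into_parts(b"abcdefghij", 4))
    # Parts may arrive out of order; the part number restores the sequence.
    print(assemble(list(reversed(parts))))
```

Because each part carries its own number, a failed part can be retried on its own without resending the whole object, which is the efficiency the bullet refers to.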
Why Software Defined Storage is Critical for Your IT Strategy - andreas kuncoro
The document discusses how software-defined storage is critical for IT strategies as data centers evolve. It describes how development models, application architectures, deployment methods, and infrastructure like storage have changed over time. These changes include trends like virtualization, containers, hybrid cloud, and software-defined storage. The document then provides examples of Red Hat's storage portfolio and solutions to address different workload types and industry use cases.
This is the presentation at Percona Live 2015 on MySQL, MariaDB and Percona Orchestration on bare metal, virtualised environments and clouds (AWS and OpenStack).
This document discusses deploying SharePoint 2013 on Microsoft Azure infrastructure as a service (IaaS). It covers key Azure concepts like virtual networks, availability, disks, and virtual machines. Virtual networks allow grouping of virtual machines and enabling Active Directory. High availability is achieved through location, regions, affinity groups, and availability sets. Disk storage and performance considerations for databases and content are provided. Sample virtual machine configurations show optimal disk layout and sizing for SharePoint and SQL Server.
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ... - Ceph Community
This document discusses Flipkart's use of Ceph object storage to provide a petabyte-scale object storage service. Key points:
- Flipkart runs two large data centers hosting over 20,000 servers and 60,000 VMs to power its e-commerce marketplace.
- It developed a highly scalable object storage service using Ceph to store over 1.5 billion objects totaling around 2PB of data. This service provides APIs compatible with AWS S3.
- The Ceph clusters are deployed across SSDs and HDDs to provide different performance and cost tiers for shared active, backup, and archival workloads with different SLAs around latency, throughput, and durability.
Approaching a platform transition requires proper planning and execution on top of a serious technical architecture. In this webinar, we’ll discuss the migration of NYSenate.gov from three unique perspectives: the lead developer, the project manager and the platform provider.
Join Brad MacDonald and Derek Reese of Mediacurrent and Erik Mathy of Pantheon for the in-depth technical notes of this project and a discussion on the future of democratic development.
A First Look at HPCC Systems 7.0, Innovation in Action - HPCC Systems
As part of the 2018 HPCC Systems Summit Community Day event:
The latest version of the platform contains improvements to functionality, usability and interoperability. This talk gives an overview of the changes and explains how you might find them useful.
Gavin Halliday's primary focus is on the code generator, which converts ECL into the queries which run on the platform. Gavin enjoys working on problems together with the development team and the varied nature of the work keeps him engaged. Gavin shares how the platform compares with competitive platforms, including scalability and coding simplicity. He enjoys working on the platform and the elegant solutions the development team is able to implement. Gavin encourages people to give it a try!
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph... - OpenNebula Project
Hyperconvergence is one of the big topics in datacenters at the moment. But is it more than old wine in new bottles? This talk covers why we at Runtastic built a hyperconverged datacenter based on OpenNebula with Ceph, and what we learned.
YouTube: https://youtu.be/50Z4bmevTpg
The Evolution of Open Source Databases - Ivan Zoratti
The document provides an overview of the evolution of open source databases from the past to present and future. It discusses the early days of navigational and hierarchical databases. It then covers the development of relational databases and SQL. It outlines the rise of open source databases like MySQL, PostgreSQL, and SQLite. It also summarizes the emergence of NoSQL databases and NewSQL systems to handle big data and cloud computing. The document predicts continued development and blending of features between SQL, NoSQL, and NewSQL databases.
Brisbane Drupal meetup - 2016 Jan - Drupal hostings - Vladimir Roudakov
The document discusses various Drupal hosting options including Acquia, Pantheon, and Platform.sh. It notes the benefits of using Drupal hosting include quick staging environments, Drupal-specific applications and configurations, and hosting and support from those with Drupal expertise. Cons can include higher costs compared to shared hosting and requiring development knowledge. The document also provides updates on upcoming Drupal events in Australia and Asia/Pacific including DrupalSouth, DrupalCon Asia and DrupalCon US.
This document compares traditional monolithic applications to microservices applications. Traditional applications have most functionality within a few processes separated by layers and libraries, while microservices segregate functionality into separate, independently deployable services. Traditional applications scale by cloning the entire app, while microservices can scale services individually. Microservices use a graph of interconnected services with distributed data ownership, while traditional apps typically use a single shared database.
IPFS is a distribution protocol that enables the creation of completely distributed applications through content addressing. A very ambitious open source project in Go, IPFS adopts a peer-to-peer hypermedia protocol to protect against a single point of failure. This presentation aims to highlight the design and ideas of IPFS and also touches upon a real world use case.
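Content addressing, as mentioned above, can be sketched in a few lines. This is a simplified illustration and not the IPFS implementation: real IPFS uses multihash-based content identifiers and splits files into linked blocks, whereas the hypothetical `ContentStore` class below simply uses a SHA-256 hex digest as the address of an in-memory block.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy in-memory store where a block's address is the hash of its bytes."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any holder of the data can be verified against the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("block does not match its address")
        return data


if __name__ == "__main__":
    store = ContentStore()
    address = store.put(b"hello")
    print(address == store.put(b"hello"))  # identical content, identical address
```

Since the address depends only on the bytes, the same content can be fetched from any peer and checked locally, which is what removes the dependence on a single server.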
SQLite is an open source, public domain embedded SQL database that is zero-configuration, self-contained, and supports transactions that remain intact after system crashes or power failures. It implements most of SQL92 and stores the entire database in a single disk file, supporting databases up to terabytes in size. SQLite is faster than popular client/server databases for most common operations and has a simple API with bindings available for dozens of languages.
This document provides an overview of Kubernetes and how Nirmata can help enterprises manage Kubernetes clusters and workloads. It begins with basic Kubernetes concepts like pods, deployments, services, and networking. It then discusses how Nirmata provides centralized management of Kubernetes infrastructure and applications across public and private clouds through its policy engine and integration with DevOps tools. The document concludes by stating that Kubernetes enables enterprise agility when managed with solutions like Nirmata.
Discover content tokens, variant rules, data source advanced syntax and scaffolding. This deck was presented during the SUGPL meeting at the Cognifide office in Poznań on October 6th, 2017.
Presentation from OpenCms Days 2014.
Running OpenCms within a private cloud-environment using the features of dynamic up- and downscaling is a challenge.
Our expandable service architecture around OpenCms allows us to run the whole system without noticeable outages.
We demonstrate how we have overcome the tight coupling of OCEE to achieve this goal and show which approaches we pursue to get OpenCms enterprise- & cloud-ready.
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich... - OpenNebula Project
Virtion provides managed hosting and IT services using an infrastructure as a service platform based on OpenNebula. The platform utilizes multiple server clusters across two datacenter locations plus external clouds for scaling. OpenNebula has been in use since 2011 to manage all IaaS resources, both for Virtion's hosting customers and for managing customer on-premise locations. Block storage is provided by Storpool which has been in production since 2016 and provides high performance storage across two clusters located in separate fire zones. Future plans include upgrading OpenNebula, taking new Storpool features into production like cross-zone replication, expanding container services, and automating monitoring.
Directories for the REST of Us: REST to LDAP in OpenDJ 2.6 - ForgeRock
A Hands-On Workshop session with OpenDJ Product Manager Ludovic Poitou, and OpenDJ Architect Matt Swift.
Learn more about ForgeRock Access Management:
https://www.forgerock.com/platform/access-management/
Learn more about ForgeRock Identity Management:
https://www.forgerock.com/platform/identity-management/
This talk gives a quick intro to what’s come to be known as the software-defined data center. Its enablers are recent hardware trends combined with advances in software technology that together allow creating an infrastructure that makes life a lot easier for operations.
"What's New With Globus" Webinar: Spring 2018 - Globus
In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
John Readey presented on HDF5 in the cloud using HDFCloud. HDF5 can provide a cost-effective cloud infrastructure by paying for what is used rather than what may be needed. HDFCloud uses an HDF5 server to enable accessing HDF5 data through a REST API, allowing users to access large datasets without downloading entire files. It maps HDF5 objects to cloud object storage for scalable performance and uses Docker containers for elastic scaling.
Building Data Portals and Science Gateways with Globus - Globus
Presented at GlobusWorld 2022 by the Globus professional services team. Describes the Modern Research Data Portal design pattern and an implementation using the Django framework.
HDF Cloud Services aims to bring HDF5 to the cloud by defining a REST API for HDF5 and implementing related services. The HDF REST API allows HDF5 data to be accessed via HTTP requests and responses. H5serv is an open source reference implementation of the HDF REST API. The HDF Scalable Data Service (HSDS) is being developed to support large HDF5 repositories in a scalable, cost effective manner using object storage like AWS S3.
Managing Protected and Controlled Data with Globus - Globus
This document discusses how Globus can be used to manage protected and controlled data with high assurance. It describes features for restricting data handling according to standards like NIST 800-53 and 800-171. Compliance focuses on access control, configuration management, maintenance, and accountability. Restricted data passed to Globus does not include file contents. The initial release includes a new web app, Globus Connect Server v5.2, and Connect Personal. High assurance capabilities include additional authentication, application instance isolation, encryption, and detailed auditing. Subscription levels like High Assurance and BAA provide these features.
Kubernetes – An open platform for container orchestration - inovex GmbH
Date: 30.08.2017
Event: GridKA School 2017
Speaker: Johannes M. Scheuermann
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
More tech articles: https://www.inovex.de/blog/
The document discusses adding search capabilities to the Hadoop ecosystem through Cloudera Search. It provides an overview of Cloudera Search's architecture and components, which integrate Apache Solr with Cloudera Distribution of Hadoop to enable distributed, full-text search across data stored in HDFS. Key components described include HDFSDirectory, which allows Solr to read and write indexes and transaction logs to and from HDFS, and BlockDirectoryCache, which caches index file blocks in memory for performance.
Spectrum Scale Unified File and Object with WAN Caching - Sandeep Patil
This document provides an overview of IBM Spectrum Scale's Active File Management (AFM) capabilities and use cases. AFM uses a home-and-cache model to cache data from a home site at local clusters for low-latency access. It expands GPFS' global namespace across geographical distances and provides automated namespace management. The document discusses AFM caching basics, global sharing, use cases like content distribution and disaster recovery. It also provides details on Spectrum Scale's protocol support, unified file and object access, using AFM with object storage, and configuration.
Software Defined Analytics with File and Object Access Plus Geographically Di... - Trishali Nayar
Introduction to Spectrum Scale Active File Management (AFM) and its use cases.
Spectrum Scale Protocols: Unified File & Object Access (UFO) feature details.
AFM + Object: unique WAN caching for object store.
1. Cloudera Search provides full-text search capabilities for Hadoop ecosystems by integrating Apache Solr. It allows batch, near real-time, and on-demand indexing of data in HDFS, HBase, and other data sources.
2. Indexing can be done through various methods like Flume for near real-time indexing, HBase indexer for indexing HBase data, and MapReduce jobs for scalable batch indexing. Extraction and mapping of data is done through the Cloudera Morphlines framework.
3. Queries can be done through the built-in Solr web UI, custom UIs like Hue, or Solr APIs. Security features include Kerberos authentication and
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S... - Cloudian
This document discusses implementing Hadoop and Elastic MapReduce on Cloudian's scale-out object storage platform. It describes Cloudian's hybrid cloud storage capabilities and how their approach reduces costs and provides faster analytics by analyzing log and event data directly on their storage platform without needing to transform the data for HDFS. Key benefits highlighted include no redundant storage, scaling analytics with storage capacity by adding nodes, and taking advantage of multi-core CPUs for MapReduce tasks.
GlusterFS is an open source scale-out NAS solution. The software is a powerful and flexible solution that simplifies the task of managing unstructured file data, whether you have a few terabytes of storage or multiple petabytes. It’s no secret that unstructured data is growing rapidly. Gluster provides a solution that scales capacity and performance as you need it and is an ideal fit for an IT environment that is increasingly virtualized and moving to the cloud.
There are two key ways that GlusterFS is beneficial for cloud builders:
1. Storage layer for VMs. If you're deploying Xen or KVM VMs on a private cloud, storing them on GlusterFS gives you the ability to migrate to different hypervisors, suspend and resume quickly (even on another hypervisor), scale out far beyond what other filesystems will allow, and utilize N-way replication for DR and HA.
2. Unified storage layer for applications. With GlusterFS 3.3, you will be able to access your application data stores from an object (S3, Swift-style) interface, as well as a traditional POSIX-compatible NAS interface. This unified approach gives developers and admins the ability to access the same data store using a variety of different methods.
In this session, attendees will learn steps for deployment and some common use cases.
Speaker Bio
John Mark is an experienced veteran of all things open source and a self-described agitprop, agitator and advocate for those who volunteer countless, unpaid hours for a particular project or community. He first fell down the slippery slope of open source as a web developer at VA Linux Systems and eventually switched to the community team, beginning a career that has now lasted over ten years. Along the way, John Mark made stops at young, up-and-coming startups, such as Groundwork, Hyperic and then Gluster (later acquired by Red Hat). In between, there was a brief interlude at IDG World Expo, where he was the conference director for LinuxWorld, GridWorld and OSBC. His advice for companies who want to "do community" is to trust your community and give them the space to "just try s***." John Mark loves to perform community karaoke, and is available for weddings, funerals and Bar/Bat Mitzvahs
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
In a world of serverless computing, users tend to be frugal when it comes to spending on compute, storage, and other resources, and paying for resources that are not in use becomes a significant factor. Offering Spark as a service in the cloud presents unique challenges, and running Spark on Kubernetes adds more, especially around storage and persistence. Spark workloads have distinct storage requirements for intermediate data, long-term persistence, and shared file systems, and those requirements become tighter when the service must be offered to enterprises that need to manage GDPR and other compliance regimes such as ISO 27001 and HIPAA certifications.
This talk covers the challenges involved in providing serverless Spark clusters and shares the specific issues one can encounter when running large Kubernetes clusters in production, especially in scenarios related to persistence.
This talk will help people who use Kubernetes or the Docker runtime in production understand the storage options available, which are more suitable for running Spark workloads on Kubernetes, and what more can be done.
This document discusses storage requirements for running Spark workloads on Kubernetes. It recommends using a distributed file system like HDFS or DBFS for distributed storage and emptyDir or NFS for local temp scratch space. Logs can be stored in emptyDir or pushed to object storage. Features that would improve Spark on Kubernetes include image volumes, flexible PV to PVC mappings, encrypted volumes, and clean deletion for compliance. The document provides an overview of Spark, Kubernetes benefits, and typical Spark deployments.
A new model for Docker image distributionDocker, Inc.
This document provides an overview of Docker Registry 2.0, which implements a new API (V2) for distributing Docker images. The key points are:
1) Registry API V1 had problems with performance, security, and implementation; API V2 addresses these with a content-addressable, cryptographically verifiable model.
2) API V2 treats image layers as content-addressable blobs identified by cryptographic digests, allowing for independent verification.
3) Manifests describe the components of an image in a single object, allowing layers to be fetched in parallel for better performance.
4) The implementation of API V2 in Docker Registry 2.0 has improved pull performance and security while
The document discusses GlusterD 2.0, a redesign of the Gluster distributed file system management daemon. Some key points:
- GlusterD 1.0 had scalability and consistency issues that limited it to hundreds of nodes. GlusterD 2.0 was rewritten from scratch in Go for better performance.
- GlusterD 2.0 uses etcd for centralized management and configuration storage. It has REST APIs and plugins for modularity.
- Components include REST interfaces, etcd backend, RPC framework, transaction system, and a flexible volume generator.
- Upgrades from Gluster 3.x to 4.x will be disruptive but provide a migration path. Gluster
This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
Introduction to Globus (GlobusWorld Tour West)Globus
This document introduces Globus, which provides fast and reliable data transfer, sharing, and platform services across different storage systems and resources. It does this through software-as-a-service that uses existing user identities, with the goal of unifying access to data across different tiers like HPC, storage, cloud, and personal resources. Key features include secure data transfers without moving files, access control and sharing capabilities, and tools for building automations and integrating with science gateways. It also discusses options for handling protected data like health information with additional security controls and business agreements.
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Globus
This document summarizes a presentation about the Globus data management platform. It includes an agenda covering an introduction to the Globus Software as a Service and Platform as a Service, automating research data workflows, facilitating collaboration, and building services. There are demonstrations of file transfers, data sharing, publication, and high assurance endpoints. The sustainability model is discussed, with standard and high assurance subscriptions, branded websites, premium storage connectors, and identity providers. Support resources like documentation, email lists, and professional services are also mentioned.
Solr + Hadoop: Interactive Search for Hadoopgregchanan
This document discusses Cloudera Search, which integrates Apache Solr with Cloudera's distribution of Apache Hadoop (CDH) to provide interactive search capabilities. It describes the architecture of Cloudera Search, including components like Solr, SolrCloud, and Morphlines for extraction and transformation. Methods for indexing data in real-time using Flume or batch using MapReduce are presented. The document also covers querying, security features like Kerberos authentication and collection-level authorization using Sentry, and concludes by describing how to obtain Cloudera Search.
Similar to Updating the Globus Connect Architecture - ARCC Workshop at PEARC17 (20)
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long-running systems, adding new cryptographic algorithms, certificate revocation, and hardening against DoS attacks.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended to suit your needs. This session will showcase various tooling extensions that can considerably boost your development experience, so that you can really work offline, transpile the code in your project to use even newer versions of ECMAScript (than 2022, which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, use different kinds of proxies, and even stitch UI5 projects together during development to mimic your target environment.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest Neo4j innovations, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
1. The new and improved
Globus Connect architecture
Steve Tuecke
tuecke@globus.org
2. Globus Connect Server
• Enable your storage as part of Globus ecosystem
• Multi-user server, installed and managed by administrator
• Key capabilities:
– Data movement using GridFTP
– Security using MyProxy and OAuth
– Storage connectors to interface with various file systems
docs.globus.org/globus-connect-server-installation-guide/
3. Storage connectors
• Standard storage connectors (POSIX)
– Linux, Windows, macOS
– Lustre, GPFS, OrangeFS, etc.
• Premium storage connectors
– AWS S3
– Ceph RadosGW (S3 API)
– Spectra Logic BlackPearl
– Google Drive
– HPSS
– HDFS (in progress)
– iRODS (in progress)
– HGST Active Archive (in progress)
docs.globus.org/premium-storage-connectors
4. Motivations for Globus Connect Server v5
• Facilitate automation of installation and upgrades
• Allow scale out deployment
– Across DTNs
– Across multiple file system connectors
• Reduce number of ports required
• Streamline user experience with use of Globus sharing
• Enhance user registration of credentials for cloud storage connectors
• Prepare foundation for next set of enhanced capabilities
6. Collection properties
• OAuth2 authentication and authorization via Globus Auth
• Collection-specific access policies
• Data is stored on a storage system, which determines storage policies such as durability and availability
• File change events
• Set of blobs (files), hierarchically named (folders)
• Rooted at a unique DNS name
• URL referenceable files, folders
• Accessible and manageable via:
– HTTPS: client/server file access
– GridFTP: async bulk transfer
– REST API: advanced operations
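As a rough illustration of the HTTPS access path listed above, the sketch below builds (but does not send) an authenticated request for a file in a collection. The hostname, file path, and token are placeholders, and the bearer-token header follows the general OAuth2 convention rather than anything the slides spell out.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values for illustration only; a real collection has its own
# DNS name, and a real token would be issued by Globus Auth.
COLLECTION_HOST = "collection.example.org"
ACCESS_TOKEN = "example-access-token"


def build_file_request(host: str, path: str, token: str) -> Request:
    """Build an HTTPS GET request for a file, carrying an OAuth2 bearer token."""
    # Percent-encode the path so spaces and similar characters are URL-safe.
    url = f"https://{host}/{quote(path.lstrip('/'))}"
    request = Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


request = build_file_request(COLLECTION_HOST, "/project/data/run 01.csv", ACCESS_TOKEN)
print(request.full_url)
```

Because each file is URL referenceable, the same request shape would apply to any file or folder under the collection's DNS name; only the path changes.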
7. New features with v5
• Collection model
• HTTPS access to storage
• Security improvements
– OAuth2 in GridFTP (no more X.509 user certificates or MyProxy!)
– OpenID Connect identity provider
– Credential expiration LoA policies
– User credential management (e.g., for Google Drive, S3, Kerberos)
• Kerberos protected file systems
• Directory listing with path expressions
8. Installation & configuration enhancements for v5
• Setup with any identity (GlobusID not required)
• Automatable installation and configuration
• Configuration API, CLI, GUI
• Scale-out deployment without shared file system
• Backup / restore configuration to / from the cloud
• Multiple storage systems simultaneously
• Single port GridFTP (no ephemeral ports)
• Distributed as Docker containers
9. Streamlined data sharing with v5
• Remove friction of sharing
– Guest collections where possible, e.g., Google Drive
– Hybrid collections: Mapped access to home & project folders, else guest access
• Enhanced sharing permissions
– permission expiration
– permissions on files (not just folders)
– sharing via URL possession
• Storage connectors: share from anywhere
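One way to picture the enhanced sharing permissions above is as a record that names a single file or folder and carries an optional expiry time. The sketch below is a hypothetical model only: the class and field names are assumptions for illustration and do not reflect the actual Globus data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SharingPermission:
    """Hypothetical permission record; field names are illustrative."""
    path: str                       # file or folder the permission applies to
    principal: str                  # identity the permission is granted to
    expires_at: Optional[datetime]  # None means the permission never expires

    def is_active(self, now: datetime) -> bool:
        """Return True while the permission has not expired."""
        return self.expires_at is None or now < self.expires_at


now = datetime(2017, 7, 10, tzinfo=timezone.utc)
perm = SharingPermission(
    path="/project/results/summary.csv",  # a single file, not just a folder
    principal="collaborator@example.org",
    expires_at=now + timedelta(days=30),
)
print(perm.is_active(now))                       # checked before expiry
print(perm.is_active(now + timedelta(days=31)))  # checked after expiry
```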
10. New capabilities built on collections and v5
• Data search
– With access control
– Schema agnostic
– Custom, domain-specific indexes
• Event driven actions for automation
– Replication of data (across storage tiers)
– Metadata extraction and ingest to search
– Run analysis pipelines
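The event-driven actions above can be pictured as handlers registered against file change events. The sketch below is a minimal in-process dispatcher under assumed names: the event type `file_created`, the event fields, and the handler functions are all hypothetical, and the handlers only describe the action they would take.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], str]

# Hypothetical registry mapping an event type to the actions it triggers.
_handlers: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one event type."""
    def register(handler: Handler) -> Handler:
        _handlers[event_type].append(handler)
        return handler
    return register


def dispatch(event: Event) -> List[str]:
    """Run every handler registered for the event's type, in registration order."""
    return [handler(event) for handler in _handlers[event["type"]]]


@on("file_created")
def replicate(event: Event) -> str:
    return f"replicate {event['path']} to archive tier"


@on("file_created")
def extract_metadata(event: Event) -> str:
    return f"extract metadata from {event['path']} and ingest to search"


actions = dispatch({"type": "file_created", "path": "/project/run42/output.h5"})
for action in actions:
    print(action)
```

Keeping the handlers independent of one another means replication, metadata extraction, and analysis pipelines can be added or removed without touching the code that emits events.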
11. Release plans for v5
• Series of point releases with added capabilities
– v5.0 released in April
o Google Drive connector support
o Federated identity for install (no Globus ID required)
• Separate installation from the current Globus Connect v4
• Migration tools for v4 to v5 will be provided
12. Globus Connect Server v5.1 (planned)
• Support multiple connectors in single installation
– POSIX and Google Drive connector
• HTTP/S access
– To data on any connected storage system
• Globus Connect Server Manager service
– Some capabilities towards automation of installation
• Single port for control channel (443)
– Ephemeral ports for data required
Editor's Notes
Need to make the point here that the friction is the creation of another endpoint - so get to a point where they can just share from a folder.