Using cassandra as a distributed logging to store pb data (Ramesh Veeramani)
This document discusses using Cassandra for big data event logging. It notes that Cassandra scales incrementally, is highly available, and is well suited for OLTP workloads where write throughput is prioritized over reads. It covers Cassandra's internal workings including token assignment, replication, and compaction strategies. Setup instructions are provided along with benchmarking results. Maintenance tools like Nodetool and stress testing tools are also mentioned. The document concludes that Cassandra is a good candidate for logging systems due to its scalability and ease of adding nodes.
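The token assignment and replication mentioned above can be pictured with a toy ring. The following is a minimal sketch, not Cassandra's actual implementation: the `TokenRing` class, the `token_for` helper, the node names, and the use of MD5 as the hash are all assumptions made for illustration. Real clusters use a configurable partitioner and may assign many tokens per node.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a ring position (MD5 is an illustrative choice only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy ring: one token per node; replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replication_factor = min(replication_factor, len(nodes))
        # Keep (token, node) pairs sorted so lookups can use binary search.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes that would hold a copy of `key` in this model."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        return [
            self._ring[(start + offset) % len(self._ring)][1]
            for offset in range(self.replication_factor)
        ]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("event-2017-10-04-0001")
```

In this model, adding a node only changes ownership for keys that fall in the new node's token range, which is the intuition behind the incremental scaling the summary describes.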
HaaS: HPCC Systems as a Service – BYOD to the Cloud Party (HPCC Systems)
From the 2017 HPCC Systems Community Day:
Amazon Web Services (AWS) is the premier IaaS provider. It leads the pack by offering more and better services at lower prices. Furthermore, AWS continuously improves and innovates to stay in front. There are numerous reasons to use an IaaS for HPCC Systems instead of dedicated hardware, especially if the workload does not execute 24/7.
AWS has developed several features and tools for launching clusters. First, CloudFormation gives users a tool that makes creating and managing AWS resources much easier. It is built around a template (CFT) that defines the resources required. The template is parameterizable and flexible, so a single CFT can launch an HPCC Systems cluster with an arbitrary number of nodes, varying amounts of memory per node, and other configuration options. Second, an Amazon Machine Image (AMI) contains the information needed to launch a compute node with the appropriate software and configure it for a specific operation. We developed a CFT and an AMI for HPCC Systems.
Additionally, we developed a reference architecture for HPCC Systems in AWS. It is a typical N+1 cluster: N worker nodes and one node (or more) for cluster-wide services such as Dali. The architecture also has storage (i.e., EBS volumes) and networking (i.e., VPN) resources. Significant effort was expended to determine the best set of resources for HPCC Systems clusters.
Furthermore, we created a program to create and manage HPCC Systems clusters in AWS from the command line. This talk will present the tools we created. It also explains and justifies the reference architecture and many of the configuration options.
Vince Freeh
Associate Professor, North Carolina State University
Vincent W. Freeh is an associate professor of computer science at North Carolina State University. He received his Ph.D. in 1996 from the University of Arizona. His research focus is high-performance system software, with emphasis on filesystems, parallel and distributed systems, power-aware computing, and storage systems. Prof. Freeh teaches courses in the above research areas as well as in compilers. He has more than 55 refereed publications in numerous computer science conferences and scientific journals. He received an NSF CAREER Award and several IBM Faculty Development Awards. He was a captain in the US Army Corps of Engineers before entering graduate school for his MS.
Chin-Jung Hsu
PhD Student, North Carolina State University
Chin-Jung Hsu is a Ph.D. candidate in Computer Science at North Carolina State University. His primary research interests include distributed systems, storage systems, and performance optimization. He interned at NetApp and AT&T Research Lab, where he applied machine learning techniques to distributed storage systems for ensuring performance guarantees. Chin-Jung is currently working on how to efficiently run HPCC Systems applications on public clouds such as AWS and Azure.
The document provides an introduction to Azure DocumentDB, a fully managed NoSQL database service. It discusses key features like schema-free JSON documents, automatic indexing, and the ability to run JavaScript code directly in the database using stored procedures. It also covers how to configure a DocumentDB account, create databases and collections, perform CRUD operations on documents, and write simple stored procedures. The presentation aims to explain the basics of DocumentDB and demonstrates how to interact with it programmatically.
The document discusses options for analyzing semi-structured event data at Coursera. It considers Hive, Pig, and Scalding. Scalding uses Scala and allows joining different data sources and expressing multiple map-reduce jobs in a succinct way. However, it requires learning Scala. An example shows loading event, course, and topic data and joining them to analyze relationships between the data.
FITC presents: Mobile & offline data synchronization in Angular JS (FITC)
Save 10% off ANY FITC event with discount code 'slideshare'
See our upcoming events at www.fitc.ca
OVERVIEW
Are you building mobile or web applications with AngularJS and wish they would work when you are offline? You can read, send and delete mail from your mobile email client when you are offline, so why not from your AngularJS app? AngularJS is completely agnostic when it comes to creating your data models. Let’s explore what is required to allow your application to be useful to your users even without an internet connection.
INTENDED AUDIENCE - BEGINNER - INTERMEDIATE
This presentation is for developers who know they are looking for offline and data synchronization capabilities, and possibly for managers who want a better understanding of their options for creating such functionality in AngularJS.
Daniel Zen, CEO, Zen Digital
Daniel Zen is the CEO of Zen Digital, founder of the New York AngularJS Meetup, a frequent lecturer, and a former consultant for Google, Pivotal Labs and various Fortune 500 companies. Zen Digital uses Agile techniques to move projects forward while continuously integrating new code and ideas, producing elegant frontend experiences and efficient backend systems for web and mobile applications.
Elasticsearch Architecture & What's New in Version 5 (Burak TUNGUT)
General architectural concepts of Elasticsearch and what's new in version 5. The examples were prepared around our company's business and are therefore excluded from the presentation.
Rich storytelling with Drupal, Paragraphs and Islandora DAMS (alxbrdg)
Drupal's Paragraphs module, combined with a DAMS (Digital Asset Management System) can deliver powerful, rich stories on the web.
This session shows how, using the inner workings of the Baseball Hall of Fame (baseballhall.org) website as a case study. This site uses Drupal with the Islandora DAMS to leverage the Baseball Hall of Fame's huge archive of images.
Topics covered:
- Building flexible content types using the Paragraphs module
- Multifaceted display of content using view modes
- DAMS & integrating Islandora assets with Drupal content
First presented at DrupalCamp Brighton in January 2015, by Alex Bridge and Tassos Koutlas.
The document discusses Code First, an approach in Entity Framework for modeling databases. It covers creating classes to represent database tables, adding attributes for additional database control, creating a DbContext class, initializing the database using an initializer, and provides demos of these concepts.
The document discusses the Drupal 7 Entity Cache module. It summarizes that the module caches entity data, including fields, after the first load to improve performance. It caches the full entity to serve from cache until expiration rather than reloading the entity and fields on each request. The module already supports core Drupal entities and makes it easy to cache other entity types as well. Installing and enabling the module provides these caching benefits without additional configuration.
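The load-once, serve-from-cache-until-expiration behaviour described above follows a general read-through caching pattern. The sketch below illustrates that pattern only; it is not the Drupal module's code (which is PHP), and the `EntityCache` class, its `loader` callback, and the `ttl_seconds` parameter are hypothetical names chosen for this example.

```python
import time


class EntityCache:
    """Read-through cache: load an entity once, then serve it until it expires."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable(entity_type, entity_id) -> entity
        self._ttl = ttl_seconds    # lifetime of a cached entry, in seconds
        self._clock = clock        # injectable so expiry can be simulated
        self._store = {}           # (entity_type, entity_id) -> (expires_at, entity)

    def load(self, entity_type, entity_id):
        key = (entity_type, entity_id)
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            # Cache hit: skip the expensive entity and field load entirely.
            return cached[1]
        # Cache miss or expired entry: load the full entity and remember it.
        entity = self._loader(entity_type, entity_id)
        self._store[key] = (now + self._ttl, entity)
        return entity
```

Passing the clock in as a parameter is a design choice for this sketch: it lets expiry be exercised without sleeping.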
Kubernetes at Spreadshirt - First steps to production (Jens Hadlich)
This presentation describes how we at Spreadshirt got started on our adventure into Docker-land which finally led to introducing Kubernetes for container orchestration.
This document discusses deploying SharePoint 2013 on Microsoft Azure infrastructure as a service (IaaS). It covers key Azure concepts like virtual networks, availability, disks, and virtual machines. Virtual networks allow grouping of virtual machines and enabling Active Directory. High availability is achieved through location, regions, affinity groups, and availability sets. Disk storage and performance considerations for databases and content are provided. Sample virtual machine configurations show optimal disk layout and sizing for SharePoint and SQL Server.
This document provides summaries of the NoSQL databases MongoDB, ElasticSearch, and Couchbase. It discusses their key features and use cases. MongoDB is a document-oriented database that stores data in JSON-like documents. ElasticSearch is a search engine that stores data in JSON documents for real-time search and analytics. Couchbase is a key-value store that provides high-performance access to data through caching and supports high concurrency.
MongoDB is a document database that stores data in BSON format, which is similar to JSON. It is a non-relational, schema-free database that scales easily and supports massive amounts of data and high availability. MongoDB can replace traditional relational databases for certain applications, as it offers dynamic schemas, horizontal scaling, and high performance. Key features include indexing, replication, MapReduce and rich querying of embedded documents.
Crate Data is a distributed SQL database that allows querying large datasets using SQL. It uses Elasticsearch for storage but adds features like SQL support, accurate aggregations, and import/export capabilities. Crate Data can start a cluster in 1 minute, is open source, and provides a simple, reliable, and scalable way to store and query distributed data using SQL.
Search matters more than ever for modern applications. Many projects use search forms as the primary interface for communicating with the user. Implementing intelligent search functionality is still a challenge, though, and we need a good set of tools.
In this presentation, I will talk through the high-level architecture and benefits of Elasticsearch with some examples. Aside from that, we will also take a look at its existing competitors, their similarities, and differences.
The document discusses the OpenNTF Domino API (ODA), an open source project that provides additional capabilities for working with Java and Domino. It was started in 2013 and fills gaps for Java developers working with Domino. The ODA makes common tasks like session handling, view handling, document handling and transactions easier. It also introduces new capabilities like improved date/time functions and Xots for executing multi-threaded tasks. The document provides an overview of the ODA and examples of how it can simplify and enhance Java code that interacts with Domino.
PlovDev 2016: Container Orchestration with Kubernetes - Martin Vladev (PlovDev Conference)
This document discusses Kubernetes and container orchestration. It provides an overview of Kubernetes, including its key features like horizontal scaling, automated rollouts and rollbacks, storage orchestration, self-healing capabilities, service discovery and load balancing. The document also discusses Kubernetes concepts like pods, labels, selectors, controllers and services. It outlines Kubernetes' architecture and control loops that drive the current state towards the desired state.
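The idea of a control loop driving the current state towards the desired state can be shown in a few lines. This is a simplified sketch of the pattern, not Kubernetes code: the `reconcile` function, the `start` and `stop` callbacks, and the `make_pod_factory` helper are hypothetical names used only for illustration.

```python
import itertools


def reconcile(desired_replicas, running, start, stop):
    """Run one pass of a control loop and return the new list of running pods."""
    observed = list(running)  # copy so the caller's list is left untouched
    # Too few pods: start new ones until the desired count is reached.
    while len(observed) < desired_replicas:
        observed.append(start())
    # Too many pods: stop the most recently added ones first.
    while len(observed) > desired_replicas:
        stop(observed.pop())
    return observed


def make_pod_factory(prefix="pod"):
    """Return a callable that hands out sequentially numbered pod names."""
    counter = itertools.count(1)
    return lambda: f"{prefix}-{next(counter)}"
```

Calling `reconcile` again after a pod disappears replaces it, which is the self-healing behaviour in miniature; lowering `desired_replicas` scales the set down the same way.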
Rapid prototyping using azure functions - A walk on the wild side (Samrat Saha)
My presentation at the 2016 Milwaukee .NET conference.
I talked about how I leveraged Azure Functions to rapidly prototype a product.
http://www.mkedotnet.com/sessions/azure-functions/
An introduction to Elasticsearch with a short demonstration on Kibana to present the search API. The slides cover:
- Quick overview of the Elastic stack
- Indexation
- Analysers
- Relevance score
- One use case of Elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
Selecting the right persistent storage options for apps in containers Open So... (bipin kunal)
No matter where an application runs, it will most likely need some form of storage. When applications run in a container environment, persistent storage is needed. Plenty of storage plugins are available that can provide persistent storage for application containers. With so many choices, it is important to understand the different persistent storage options, their access modes, and how they work, so that applications can make better use of persistent storage. Join us to learn how to choose the right persistent storage for your applications. We will take you through the various persistent storage options and access methods available, and how each access mode suits your workload.
Cassandra is a highly scalable, open-source distributed database designed to handle large amounts of structured data across many servers. It provides high availability with no single point of failure and was created by Facebook to power search on their messaging platform. Cassandra uses a decentralized peer-to-peer architecture and replicates data across multiple data centers for fault tolerance. It emphasizes performance and scalability over more complex query options and does not support features like joins typically found in relational databases. Companies like Netflix and Hulu use Cassandra for its availability, scalability, and ability to span large clusters with minimal maintenance.
AWS Fargate in practice. How to run containers without managing EC2 instances (Max Borysov)
Fargate allows running containers on AWS without managing servers. Key concepts include repositories for images, clusters for grouping resources, task definitions for configuring containers, and scheduled tasks for automating them. Backup tasks can restore databases, generate files, store in S3/Glacier, and delete resources to save on costs compared to reserved RDS storage. Monitoring includes CloudWatch and custom scripts.
OpenStack Cinder, Implementation Today and New Trends for Tomorrow (Ed Balduf)
This document discusses OpenStack Block Storage (Cinder) implementations, trends, and the future direction of Cinder. It provides an overview of Cinder's mission to provide on-demand, self-service block storage and its plugin architecture that supports various backend storage devices. It also discusses some common storage types in OpenStack and looks at specific Cinder features, configurations, and the user experience. The document concludes by exploring how Cinder may evolve to better support enterprise applications and looks at upcoming changes in the Liberty release.
This document discusses Linux server provisioning using Stacki. Stacki is a tool that automates the provisioning of Linux servers at scale from bare metal to a fully configured system. It addresses the exponential complexity of managing large clusters as more servers are added. Stacki handles all aspects of server provisioning from OS installation to configuration of networking, storage, software and more. It provides a fully automated, repeatable process to quickly deploy and manage servers.
The document summarizes an event called UKLUG 2012 that was held from September 2-4, 2012 at Cardiff University in Wales. It focused on XPages topics beyond the basics. The agenda included sessions on JavaScript/CSS aggregation, enabling pre-load for XPages, Java design elements, JAR design elements, Faces-config design elements, themes, and the XPages Extension Library.
This is a summary of the sessions I attended at PASS Summit 2017. After the week-long conference, I put together these slides to summarize it and present it at my company. The slides cover my favorite sessions, the ones I found most valuable. They include screenshots of demos that I personally developed and tested, much like the speakers did at the conference.
Sanger, upcoming Openstack for Bio-informaticians (Peter Clapham)
Delivery of a new Bio-informatics infrastructure at the Wellcome Trust Sanger Center. We include how to programmatically create, manage and provide provenance for images used both at Sanger and elsewhere using open source tools and continuous integration.
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
I presented to the Georgia Southern Computer Science ACM group. Rather than one topic for 90 minutes, I decided to do an UnConference. I presented them a list of 8-9 topics, let them vote on what to talk about, then repeated.
Each presentation was ~8 minutes (except Career) and was by no means an attempt to explain the full concept or technology, only to wake up their interest.
Introduction to Stacki - World's fastest Linux server provisioning ToolSuresh Paulraj
Stacki is an open source tool for provisioning and managing Linux servers at scale. It provides fast, reliable provisioning of servers from bare metal to a fully configured system. PayPal uses Stacki to manage their Hadoop infrastructure, which includes over 3,000 nodes spread across multiple datacenters. Stacki automates tasks like disk formatting, partitioning, OS installation, and integration with other tools to quickly provision new servers. It helped PayPal reduce provisioning time from hours to just 14 minutes for 288 servers.
This document provides an agenda for a conference on XPages Beyond the Basics held from February 2-3, 2012 in Denmark. The agenda includes topics like JavaScript/CSS aggregation, pre-loading for XPages, Java design elements, themes, the XPages Extension Library, relational database support using JDBC, exporting data to Excel/PDF, and more. The document also introduces the speaker, Ulrich Krause, an IBM Champion and experienced Notes/Domino developer.
This talk (delivered at QConLondon 2016) covers the evolution of Coursera's nearline architecture, delves into our latest generation system, and then covers the flagship application of the architecture (evaluating programming assignments).
The DrupalCampLA 2011 presentation on backend performance. The slides go over optimizations that can be done through the LAMP (or now VAN LAMMP stack for even more performance) to get everything up and running.
12 core technologies you should learn, love, and hate to be a 'real' technocratlinoj
Presentation at PodCamp New Hampshire 2009
A "dim sum" (light sampling) of core technologies which everyone who considers themselves a "technocrat" should have some understanding and appreciation. Since there's a lot to cover, each topic will move pretty quickly, keeping the descriptions at a conceptual level.
Scala is widely used at Treasure Data for data analytics workflows, management of the Presto query engine, and open-source libraries. Some key uses of Scala include analyzing query logs to optimize Presto performance, developing Prestobase using Scala macros and libraries like Airframe, and integrating Spark with Treasure Data. Treasure Data engineers have also created several open-source Scala libraries, such as wvlet-log for logging and Airframe for dependency injection, and sbt plugins to facilitate packaging, testing, and deployment.
A presentation at Twitter's official developer conference, Chirp, about why we use the Scala programming language and how we build services in it. Provides a tour of a number of libraries and tools, both developed at Twitter and otherwise.
The document discusses the benefits of using Google Web Toolkit (GWT) for building AJAX applications. It summarizes key features of GWT like cross-compiling Java to JavaScript, deferred binding, compiler optimizations, and UI improvements in GWT 2.x like CSS resources, image inlining, and UiBinder. It encourages adopting GWT for its speed advantages like file caching and reduced payload sizes.
This document provides an overview of the CGSpace technical stack, which is based on DSpace. It discusses the key components including Nginx as the web server, Tomcat as the application server, PostgreSQL for metadata storage, and the Ubuntu operating system. It describes the Git-based development workflow using branches on GitHub for testing and deploying changes to the production and development instances.
The document discusses various hosting solutions for Drupal including web hosting, virtual private servers, dedicated servers, and Amazon EC2. It provides details on the costs, reliability, customization options, and maintenance requirements for each solution. Additionally, it covers some key terms and tools related to using Amazon EC2, such as instances, AMIs, EBS, S3 storage, the command line interface, and the ElasticFox browser plugin.
This document provides an agenda for the BLUG 2012 conference on XPages Beyond the Basics taking place March 22-23, 2012 in Antwerp. The agenda covers topics like JavaScript/CSS aggregation, pre-loading for XPages, Java design elements, themes, the XPages Extension Library, relational database support, and recommended resources. It also includes background information on the presenter Ulrich Krause and his experience with Lotus Notes, Domino, and XPages development.
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoAlluxio, Inc.
Alluxio Global Online Meetup
August 25, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Abner Ferreira, Simbiose Ventures
Caio Pavanelli, Simbiose Ventures
Bin Fan, Alluxio
Over the last few years, organizations have worked towards the separation of storage and compute for a number of benefits in the areas of cost, data duplication and data latency. Cloud resolves most of these issues but comes at the expense of needing a way to query data on remote storage. Alluxio and Presto are a powerful combination to address the compute problem, which is part of the strategy used by Simbiose Ventures to create a product called StorageQuery, a platform to query files in cloud storage with SQL.
This talk will focus on:
- How Alluxio fits StorageQuery's tech stack;
- Advantages of using Alluxio as a cache layer and its unified filesystem;
- Development of new under file system for Backblaze B2 and fine-grained code documentation;
- ShannonDB remote storage mode.
Similar to Running DSpace: Technical overview, lessons learned, workflows and essential skills (20)
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...ILRI
Presentation by Guy Ilboudo, Abel Sènabgè Biguezoton, Cheick Abou Kounta Sidibé, Modou Moustapha Lo, Zoë Campbell and Michel Dione at the 6th Peste des Petits Ruminants Global Research and Expertise Networks (PPR-GREN) annual meeting, Bengaluru, India, 28–30 November 2023.
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...ILRI
Poster by Guy Ilboudo, Abel Sènabgè Biguezoton, Cheick Abou Kounta Sidibé, Modou Moustapha Lo, Zoë Campbell and Michel Dione presented at the 6th Peste des Petits Ruminants Global Research and Expertise Networks (PPR-GREN) annual meeting, Bengaluru, India, 29 November 2023.
A training, certification and marketing scheme for informal dairy vendors in ...ILRI
Presentation by Silvia Alonso, Jef L. Leroy, Emmanuel Muunda, Moira Donahue Angel, Emily Kilonzi, Giordano Palloni, Gideon Kiarie, Paula Dominguez-Salas and Delia Grace at the Micronutrient Forum 6th Global Conference, The Hague, Netherlands, 16 October 2023.
Milk safety and child nutrition impacts of the MoreMilk training, certificati...ILRI
Poster by Silvia Alonso, Emmanuel Muunda, Moira Donahue Angel, Emily Kilonzi, Giordano Palloni, Gideon Kiarie, Paula Dominguez-Salas, Delia Grace and Jef L. Leroy presented at the Micronutrient Forum 6th Global Conference, The Hague, Netherlands, 16 October 2023.
Preventing the next pandemic: a 12-slide primer on emerging zoonotic diseasesILRI
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
Preventing preventable diseases: a 12-slide primer on foodborne diseaseILRI
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like anxiety and depression.
Preventing a post-antibiotic era: a 12-slide primer on antimicrobial resistanceILRI
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help enhance one's emotional well-being and mental clarity.
Food safety research in low- and middle-income countriesILRI
Presentation by Hung Nguyen-Viet at the first technical meeting to launch the Food Safety Working Group under the One Health Partnership framework, Hanoi, Vietnam, 28 September 2023
The Food Safety Working Group (FSWG) in Vietnam was created in 2015 at the request of the Deputy Prime Minister to address food safety issues in the country. It brings together government agencies, ministries, and development partners to facilitate joint policy dialogue and improve food safety. Over eight years of operations led by different organizations, the FSWG has contributed to various initiatives. However, it faces challenges of diminished government participation over time and dependence on active members. Going forward, it will strengthen its operations by integrating under Vietnam's One Health Partnership framework to better engage stakeholders and achieve policy impacts.
Reservoirs of pathogenic Leptospira species in UgandaILRI
Presentation by Lordrick Alinaitwe, Martin Wainaina, Salome Dürr, Clovice Kankya, Velma Kivali, James Bugeza, Martin Richter, Kristina Roesel, Annie Cook and Anne Mayer-Scholl at the University of Bern Graduate School for Cellular and Biomedical Sciences Symposium, Bern, Switzerland, 29 June 2023.
Assessing meat microbiological safety and associated handling practices in bu...ILRI
Presentation by Patricia Koech, Winnie Ogutu, Linnet Ochieng, Delia Grace, George Gitao, Lily Bebora, Max Korir, Florence Mutua and Arshnee Moodley at the 8th All Africa Conference on Animal Agriculture, Gaborone, Botswana, 26–29 September 2023.
Ecological factors associated with abundance and distribution of mosquito vec...ILRI
Poster by Max Korir, Joel Lutomiah and Bernard Bett presented at the 8th All Africa Conference on Animal Agriculture, Gaborone, Botswana, 26–29 September 2023.
Practices and drivers of antibiotic use in Kenyan smallholder dairy farmsILRI
Poster by Lydiah Kisoo, Dishon M. Muloi, Walter Oguta, Daisy Ronoh, Lynn Kirwa, James Akoko, Eric Fèvre, Arshnee Moodley and Lillian Wambua presented at Tropentag 2023, Berlin, Germany, 20–22 September 2023.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party and will share these foundational concepts to build on:
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations and training activities related to LibreOffice. She previously worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and Geeko she cultivates her curiosity for astronomy (hence her nickname, deneb_alpha).
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
Van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
3. Instance Overview
CGSpace (cgspace.cgiar.org)
● “Production”
● Should always be up & stable
● Is the “reference” implementation
DSpace Test (dspacetest.cgiar.org)
● “Development”
● Changes to style, functionality, DSpace etc are tested here first
● Sometimes wiped clean
5. Living With Legacy Decisions...
CGSpace and DSpace Test on the same machine…
● In 2010 CGSpace had a fraction of the content, users, etc, so it didn’t affect the running of the system
● Not true anymore!
● 100s of 1000s of monthly views...
● Large assetstore, log files, RAM / CPU usage, etc
7. CGSpace Code Is 100% Open
Source code is on GitHub: github.com/ilri/DSpace
8. How The Code Is Organized
Production code lives in the 3_x-prod branch; this is stable, tested code. Updates (if any) come from the development branch on Monday.
Development code lives in the 3_x-dev branch; this is semi-tested code! Changes throughout the week.
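The weekly promotion described above can be sketched in Python. This is only an illustration: the branch names come from the slide, while the remote name `origin`, the `--no-ff` merge style, and the helper names `promotion_commands` and `is_promotion_day` are assumptions. The commands are returned as lists and are not executed.

```python
import datetime
from typing import List

# Branch names as given on the slide.
PROD_BRANCH = "3_x-prod"
DEV_BRANCH = "3_x-dev"


def is_promotion_day(day: datetime.date) -> bool:
    """Return True on Mondays, when dev changes (if any) move to prod."""
    return day.weekday() == 0  # Monday is 0 in Python's date.weekday()


def promotion_commands(remote: str = "origin") -> List[List[str]]:
    """Build (but do not run) git commands that merge dev into prod.

    The remote name and the --no-ff merge are assumptions of this sketch,
    not something the slides specify.
    """
    return [
        ["git", "fetch", remote],
        ["git", "checkout", PROD_BRANCH],
        ["git", "merge", "--no-ff", f"{remote}/{DEV_BRANCH}"],
        ["git", "push", remote, PROD_BRANCH],
    ]


if __name__ == "__main__":
    if is_promotion_day(datetime.date.today()):
        for command in promotion_commands():
            print(" ".join(command))
```

Printing the commands instead of running them keeps the sketch safe to try; each list could be passed to `subprocess.run` once the steps have been reviewed.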
9. “Social Coding” on GitHub
● Anyone can “fork” the code repository to their own GitHub account
● Source code repositories can share code via “pull requests”
● Developers can comment on changes and discuss issues
GitHub “OctoCat”
12. Workflow Lessons Learned
● Sending changes is good, but leaves the burden of merging to me
● Sending patches is better, but requires the sender to know how to generate them
● Sending a pull request is best, but requires the sender to know how to use git, branches, etc
14. Scenario: Create A New Theme
Creating an XMLUI theme for a new community
1. Create community in DSpace (e.g., 10568/38440)
2. Add custom metadata (e.g., cg.subject.bioversity)
3. Add custom submission template (input-forms.xml)
4. Copy existing XMLUI theme (e.g., ILRI) as a reference, and customize for center-specific metadata, look & feel, etc
5. Update search & browse indexes (dspace.cfg)
6. Update XMLUI config for new theme (xmlui.xconf)
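Several of the steps above touch specific configuration files. As a minimal sketch, a small Python check can confirm those files are present before editing begins. The three file names come from the slide; the assumption that they all sit in one configuration directory, and the helper name `missing_theme_configs`, are illustrative only.

```python
from pathlib import Path
from typing import List

# File names as listed on the slide. Their location in a single config
# directory is an assumption of this sketch.
THEME_CONFIG_FILES = ["input-forms.xml", "dspace.cfg", "xmlui.xconf"]


def missing_theme_configs(config_dir: Path) -> List[str]:
    """Return the expected config files that are not present in config_dir."""
    return [
        name
        for name in THEME_CONFIG_FILES
        if not (config_dir / name).is_file()
    ]


if __name__ == "__main__":
    # "config" is a placeholder path; point it at the real config directory.
    missing = missing_theme_configs(Path("config"))
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All theme-related config files found.")
```

The check only confirms that the files exist; it does not validate their contents.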
15. DSpace Sysadmin Crashcourse
DSpace...
● is a Java application
● builds using Maven and Ant
● uses PostgreSQL as a database backend
● stores PDFs and other blobs in the filesystem (“assetstore”)
● runs best on Linux
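Because the assetstore lives on the filesystem and can grow large, it helps to know how much disk space it uses. The following is a minimal sketch using only the Python standard library; the function name `directory_size_bytes` and the example path are hypothetical, and the real assetstore location depends on the installation.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Placeholder path: replace with the actual assetstore directory.
    size = directory_size_bytes("assetstore")
    print(f"assetstore uses {size / 1024 ** 3:.2f} GiB")
```

The same function could be pointed at the log directory to track the other disk consumers the slides mention.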
17. Why Not Use Tomcat Directly?
Any sysadmin will tell you that working with Tomcat is a joy*.
Surprisingly**, these things are annoying in Tomcat:
● Virtual hosting
● SSL
● redirects
● caching and manipulating headers
*for some definitions of “joy”
**not surprising, actually
18. Essential Technical Skills
Managing a DSpace instance doesn’t require “programmers” or “developers” (but it doesn’t hurt).
Mainly, you’ll need:
● Linux experience (Debian, CentOS, Ubuntu)
● Administration experience (web servers, log files, cron jobs, security)
● Software development concepts (git, patches, branching/merging)
19. Better lives through livestock
ilri.org
The presentation has a Creative Commons licence. You are free to re-use or distribute this work, provided credit is given to ILRI.