The document discusses a two-day training on the digital repository system DSpace that was organized by BALID Institution of Information Management in Bangladesh. It provides an overview of DSpace, including what it is, its architecture and technology, software requirements, and comparisons to other repository systems. It also outlines the organizational hierarchy of communities, sub-communities, collections, and items in DSpace.
Two day-long training on "DSpace" Institutional Repository
1. Two Day-Long Training on “DSpace” Institutional Repository
Organized by
BALID Institution of Information Management (BIIM)
1-2 May 2014
Venue: CIRDAP
DSpace Overview
Why DSpace Is Better than Others
Edited By
Nur Ahammad
Junior Assistant Librarian
Independent University, Bangladesh
2. Institutional Repository
Institutional repositories collect, preserve, and disseminate the
intellectual output of an institution in digital form.
Increasingly, institutional repositories include other items
unique to the university as well, such as digitized historic
documents and archival materials (Nykanen, 2011).
An IR is a set of services and technologies that provide the
means to collect, manage, provide access to, disseminate, and
preserve digital materials produced at an institution. While
most institutional repositories are based at colleges and
universities, they also exist in governmental agencies,
museums, corporations, and other organizations. Within
colleges and universities, most IRs are managed by the library
(Markey, Rieh, St. Jean, Kim, & Yakel, 2007).
3. What is DSpace?
• A groundbreaking digital repository system, DSpace captures, stores, indexes,
preserves and redistributes an organization's research material in digital formats.
Research institutions worldwide use DSpace for a variety of digital archiving needs -
from institutional repositories (IRs) to learning object repositories or electronic
records management, and more. DSpace is freely available as open source software
you can customize and extend. An active community of developers, researchers and
users worldwide contribute their expertise to the DSpace Community.
• The first public version of DSpace was released in November 2002, as a joint effort
between developers from MIT and HP Labs. Following the first user group meeting
in March 2004, a group of interested institutions formed the DSpace Federation,
which determined the governance of future software development by adopting the
Apache Foundation's community development model as well as establishing the
DSpace Committer Group. In July 2007, as the DSpace user community grew larger,
HP and MIT jointly formed the DSpace Foundation, a not-for-profit organization that
provided leadership and support. In May 2009, collaboration on related projects and
growing synergies between the DSpace Foundation and the Fedora Commons
organization led to the joining of the two organizations to pursue their common
mission in a not-for-profit called DuraSpace. Currently, the DSpace software and
user community receive leadership and guidance from DuraSpace.
4. Top Reasons To Use DSpace
• Largest community of users and developers worldwide
• Free open source software
• Completely customizable to fit your needs
• Used by educational, government, private and commercial
institutions
• Can be installed out of the box
• Can manage and preserve all types of digital content
6. Technologies Used in DSpace
• Java web application
• RDBMS: PostgreSQL/Oracle
• Web interfaces: JSPUI, which uses JSP and the Java Servlet API, and XMLUI
(aka Manakin), based on Apache Cocoon, using XML and XSLT
• OAI-PMH v2.0, and capable of exporting METS packages
• Common interoperability standards for IR: SWORD (protocol)/RSS/OpenSearch
• Faceted search
• Solr (Lucene)
• Unique URL, e.g. handle/DOI
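As an illustration of the OAI-PMH v2.0 support listed above, the following minimal sketch builds a harvesting request URL. The verb `ListRecords` and the `metadataPrefix=oai_dc` argument come from the OAI-PMH specification; the base URL is a hypothetical placeholder, and the real endpoint path depends on how a given repository is deployed.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: the base URL plus a 'verb' and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai/request"

# Ask for all records in unqualified Dublin Core (oai_dc).
url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The function only builds the URL; sending the request and parsing the XML response are left out so that the sketch runs without network access.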
7. Software for DSpace
• Debian Linux operating system (version 6, Squeeze)
• sun-java6-jdk
• tomcat6
• maven2
• postgresql-8.4
• Apache2 as the front-end web server
• OpenOffice or LibreOffice Writer for converting Word files
to PDF. (This is not mandatory, because Word files can be
uploaded directly to DSpace without converting them to PDF.)
• IrfanView (freeware/shareware) for resizing and
converting images in a variety of formats.
• Tesseract OCR open source software
• Screen Capture Elite (Firefox add-on) for capturing live
web images (especially for news clippings)
8. Hardware Requirements for DSpace
Minimal DSpace Production system requirements
• 2 GB of Random Access Memory (RAM)
– 1GB for Tomcat
– 1GB for Database (PostgreSQL or Oracle).
• 20 GB of Storage (or roughly enough storage for all the files you wish to store in
DSpace)
This minimal system should be able to support DSpace sites of roughly 20,000 items or
fewer, though the exact number of items will depend on the amount of activity
(searches, accesses, downloads, etc.) within the DSpace site.
Mid-range DSpace Production system
• 4 GB of Random Access Memory (RAM)
• 200 GB of Storage (or roughly enough storage for all the files you wish to store in
DSpace)
This mid-range system may be necessary for DSpace sites that have either a larger
number of items (roughly 50,000 or more) or a larger amount of activity (searches,
accesses, downloads, etc.) within the system.
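The sizing guidance above can be sketched as a small lookup. This is only an illustration: the RAM, storage, and item-count figures are taken from the slide, while the function name `suggest_tiers` is hypothetical and the handling of sites between 20,000 and 50,000 items (returning both tiers) is an assumption, since the slide gives no figure for that range. The sketch also ignores site activity, which the slide says matters as well.

```python
# Figures copied from the slide; the dictionary layout itself is an assumption.
TIERS = {
    "minimal": {"ram_gb": 2, "storage_gb": 20},
    "mid-range": {"ram_gb": 4, "storage_gb": 200},
}


def suggest_tiers(item_count):
    """Return the tier names that the slide's item-count guidance points to."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    if item_count <= 20_000:
        return ["minimal"]
    if item_count >= 50_000:
        return ["mid-range"]
    # Assumption: the slide is silent on this range, so both tiers are candidates.
    return ["minimal", "mid-range"]


for count in (5_000, 30_000, 80_000):
    names = suggest_tiers(count)
    print(count, [(name, TIERS[name]) for name in names])
```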
9. Hardware Requirements for DSpace
High End DSpace Production system requirements:
• Quad Core processor
• 8GB of Random Access Memory (RAM)
• 73 GB 15,000 rpm network disks in RAID accessible over a gigabit connection for
storing the database and indexes
• 7,400 rpm network disks in RAID accessible over a gigabit connection for storing the
data whose size can be easily expanded.
The high-end system should only be necessary for extremely large or extremely active
DSpace sites. The majority of DSpace sites should not require this high end system until
they experience a larger amount of growth or activity.
11. Comparison Between DSpace and Greenstone
Features | DSpace | Greenstone
RDBMS for metadata storage | PostgreSQL/Oracle | No RDBMS
Persistent identifier | CNRI Handle and DOI | Does not use
Mechanism for audit for integrity | DSpace Checksum Checker | No such type of tool
Events and format logs | All types of logs in main log file | Some logs; no logs for collections and digital objects
Migration of metadata formats or digital object formats | Cross-walking capabilities | Does not have a means to do this
Access control / Internet address filter | Item view restriction, and it can filter Internet addresses | Does not support
Protocol support | OAI-PMH, OAI-ORE, SWORD, WebDAV, OpenSearch, OpenURL, RSS, ATOM | OAI-PMH, METS & Z39.50
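The comparison above mentions a checksum checker as the mechanism for auditing integrity. The general idea can be sketched as follows: record a checksum when a file is deposited, recompute it later, and flag any mismatch. This is a generic illustration, not DSpace's own code; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import io


def checksum_of_stream(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_changed(items, recorded):
    """Return names whose current checksum differs from the recorded one.

    items:    name -> binary stream holding the current content
    recorded: name -> hex digest stored when the item was deposited
    """
    return [
        name
        for name, stream in items.items()
        if checksum_of_stream(stream) != recorded.get(name)
    ]


# In-memory streams stand in for stored files so the sketch is self-contained.
original = b"thesis content"
recorded = {"thesis.pdf": checksum_of_stream(io.BytesIO(original))}

intact = find_changed({"thesis.pdf": io.BytesIO(original)}, recorded)
damaged = find_changed({"thesis.pdf": io.BytesIO(b"thesis c0ntent")}, recorded)
print(intact, damaged)
```

With real files, the streams would come from `open(path, "rb")`, and the recorded digests would be kept alongside the item's metadata.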
14. DSpace Uses Dublin Core Metadata
Fifteen Core Elements of DC
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier
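A minimal sketch of how the fifteen elements can be written as XML: each element name becomes a start tag and a matching end tag around its value. The namespace URI is the standard Dublin Core element set namespace; the wrapper element `record`, the function name, and the sample values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen core elements listed on the slide.
DC_ELEMENTS = {
    "creator", "title", "subject", "contributor", "date",
    "description", "publisher", "type", "format", "coverage",
    "rights", "relation", "source", "language", "identifier",
}


def build_dc_record(fields):
    """Build an XML element holding (element name, value) pairs as dc:* children."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values for illustration only.
record = build_dc_record([
    ("title", "Two Day-Long Training on DSpace Institutional Repository"),
    ("creator", "Nur Ahammad"),
    ("date", "2014"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Passing the fields as a list of pairs rather than a dictionary allows an element to be repeated, for example several `creator` entries for one item.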
15. DSpace Uses Dublin Core Metadata
[Figure: starting and ending syntax of elements/fields, with elements and their values labeled]