Innovative methods for data integration: Linked Data and NLPariadnenetwork
Linked Data (LD) + Natural Language Processing (NLP)
Two technologies that open up new possibilities for semantic integration of archaeological datasets and fieldwork reports.
Overview
•Illustrative early examples
- a flavour of progress and challenges to date
•NLP of grey literature (English – Dutch)
•Mapping between multilingual vocabularies
Linked Data and cultural heritage data: an overview of the approaches from Eu...The European Library
Europeana provides access to digital resources from a wide range of cultural heritage institutions all across Europe. In order to support Europeana, a wide network of organizations collaborates in data integration activities. The European Library plays the role of library-domain aggregator for Europeana, and its activities include also being a gateway to the collections and data of Europe’s national and research libraries, operating on the principle of open data for re-use.
The Europeana Network addresses its data integration challenges by leveraging on Linked Data and the Semantic Web. Its approach to data integration is based in a single data model, the Europeana Data Model, which embraces the Semantic Web principles to integrate the various data models and ontologies used in cultural heritage data.
The paradigm of Linked Data, brings many new challenges to libraries. The generic nature of data representation used in Linked Data, while allowing any community to manipulate the data, also opens many paths for implementation, with no clear optimal choice for libraries. The European Library leverages on its operational infrastructure to make library data available. It maintains The European Library Open Dataset, which is derived from the data aggregated from member libraries, and made available under the Creative Commons CC0 1.0 Universal license, in order to promote and facilitate its reuse by any community.
Extensive linking is performed in the preparation of The European Library Open Dataset. It relies on Information Extraction and Data Mining to establish links to external open datasets, covering the most prominent entities types present in library data: persons, corporate bodies, places, concepts, intellectual works and manifestations.
The European Library also applies a linked data approach for intellectual property rights clearance processes, for supporting mass digitization projects. This approach is applied in the within the European ARROW rights infrastructure .
Audiovisual collections, the spoken word and user needs of scholars in the Hu...roelandordelman.nl
Audiovisual collections, the spoken word and user needs of scholars in the Humanities: Observations based on related work in The Netherlands 2005-2012. Presentation at the BL, Opening up Speech Archives, 08-02-2013
Innovative methods for data integration: Linked Data and NLPariadnenetwork
Linked Data (LD) + Natural Language Processing (NLP)
Two technologies that open up new possibilities for semantic integration of archaeological datasets and fieldwork reports.
Overview
•Illustrative early examples
- a flavour of progress and challenges to date
•NLP of grey literature (English – Dutch)
•Mapping between multilingual vocabularies
Linked Data and cultural heritage data: an overview of the approaches from Eu...The European Library
Europeana provides access to digital resources from a wide range of cultural heritage institutions all across Europe. In order to support Europeana, a wide network of organizations collaborates in data integration activities. The European Library plays the role of library-domain aggregator for Europeana, and its activities include also being a gateway to the collections and data of Europe’s national and research libraries, operating on the principle of open data for re-use.
The Europeana Network addresses its data integration challenges by leveraging on Linked Data and the Semantic Web. Its approach to data integration is based in a single data model, the Europeana Data Model, which embraces the Semantic Web principles to integrate the various data models and ontologies used in cultural heritage data.
The paradigm of Linked Data, brings many new challenges to libraries. The generic nature of data representation used in Linked Data, while allowing any community to manipulate the data, also opens many paths for implementation, with no clear optimal choice for libraries. The European Library leverages on its operational infrastructure to make library data available. It maintains The European Library Open Dataset, which is derived from the data aggregated from member libraries, and made available under the Creative Commons CC0 1.0 Universal license, in order to promote and facilitate its reuse by any community.
Extensive linking is performed in the preparation of The European Library Open Dataset. It relies on Information Extraction and Data Mining to establish links to external open datasets, covering the most prominent entities types present in library data: persons, corporate bodies, places, concepts, intellectual works and manifestations.
The European Library also applies a linked data approach for intellectual property rights clearance processes, for supporting mass digitization projects. This approach is applied in the within the European ARROW rights infrastructure .
Audiovisual collections, the spoken word and user needs of scholars in the Hu...roelandordelman.nl
Audiovisual collections, the spoken word and user needs of scholars in the Humanities: Observations based on related work in The Netherlands 2005-2012. Presentation at the BL, Opening up Speech Archives, 08-02-2013
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...locloud
This presentation provides an introduction to thesaurus management in the LoCloud Vocabulary Services given during the LoCloud training workshops. It provides an introduction to controlled vocabularies, thesaurus for information retrieval and interoperability, to SKOS, multilingual vocabulary issues and to the federated model adopted for thesaurus management within the LoCloud service, which is based in TemaTres. The presentation includes a list of the vocabularies that have been integrated within the LoCloud service. There is also a walk-through of MediaThread and how this was used in the vocabulary management training offered in the workshop.
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...innovatics
Explora el ámbito de los servicios de descubrimiento basados en índices, orientado al ámbito de las bibliotecas académicas, incluyendo Primo de Ex Libris, Summon de ProQuest, Discovery Service de Ebsco y Discovery Service de OCLC WorldCat.
Se aborda la Iniciativa Open Discovery y la reciente tendencia hacia una mayor participación por parte de los proveedores de contenidos. Se discute acerca de las tecnologías más adecuadas para las bibliotecas que tienen mayor preocupación por la participación del usuario, sobre el acceso a los libros impresos y electrónicos, con menos restricciones para los artículos académicos que se encuentran en Descubrimiento. Se presenta el papel de las interfaces de descubrimiento de código abierto tales como VuFind y Blacklight. Se aborda el estado de la nueva generación de plataformas de servicios de la biblioteca. La presentación ofrecerá los aspectos más destacados de la industria de automatización de la biblioteca global, con especial atención a los protagonistas y tendencias en América Latina. Basado en el "Informe 2014 de los Sistemas de Bibliotecas" http://www.americanlibrariesmagazine.org/article/library-systems-report-2014
Abstract
Discovery, delivery, and management: the current wave of new library technologies and industry trends
Explore the realm of index-based discovery services oriented more to academic libraries, including Ex Libris Primo, ProQuest Summon, EBSCO Discovery Service, and OCLC WorldCat Discovery Service. An update on the Open Discovery Initiative and the recent movement toward more participation by content providers. Discuss technologies better suited for public libraries that have more concerns for customer engagement, access to print and electronic books, with less stringent requirements for article-level discovery of scholarly resources. The role of open source discovery interfaces such as VuFind and Blacklight. The status of the new generation of library services platforms. The presentation will provide highlights of global library automation industry, with a focus on the players and trends in Latin America Based on “Library Systems Report 2014” http://www.americanlibrariesmagazine.org/article/library-systems-report-2014
presentation for a workshop on cataloging medieval manuscripts with Debra Cashion, Sheila Bair and Sue Steuer which was held at the Rare Book and Manuscript Section (RBMS) of the Association of College and Research Libraries (ACRL) in Minneapolis, MN on June 27, 2013.
Ancient Wisdom online: towards a Digital Library of Ancient Greek and Roman inscriptions
Dr. Sorin Hermon, the Cyprus Institute
pdf file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il
Dissertation as a document provides data on new knowledge, but also – encodes important scientometrical information. A study of social structure of science through data found in dissertations and theses provides bibliometrical data for study of national style of science. The pilot study, described below encourages the library community to improve their documentation in this area, in particular, the notation of supervisors and institutions within a bibliographical record. It is proposed that the CBD argue for stricter standards of library/archive record of dissertation.
Along with the dissertation data in the new IsisCB platform the social structure of the history of science community might be analysed. Scientometrical study of dissertation abstracts at a local level (ie., Lithuania) will provide a model for future studies of scholarly communication at global level.
Towards OpenURL Quality Metrics: Initial Findingsalc28
Presentation on creating a method for benchmarking metadata consistency in OpenURL links. See also: <http: />. Delivered at the July 2009 American Library Association conference in Chicago.
Ancient Wisdom online: towards a Digital Library of Ancient Greek and Roman inscriptions
Dr. Sorin Hermon, the Cyprus Institute
ppt file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il
Russian Learner Translator Corpus: design, research potential and applications
17th International Conference on Text, Speech and Dialogue Brno, Czech Republic, September 8–12 2014
Bridging The ALM Divide: An Integrated Archive-Library-Museum Approach for Hy...The Magnes
For many years, institutions housing archive, library, and museum (ALM) collections have catalogued their holdings either into specialized software for each area types, or have forced their collection records into systems which do not take into account the cataloging standards of the field those materials represent.
New technology now allows us to rethink the way ALM assets can be catalogued, not only according to professional standards, but in a collaborative environment. Bridging the ALM Divide demonstrates how cataloguing into a collaborative environment enhances research, protects collections, increases access and interpretation and secures professional standards.
This presentation was prepared by Francesco Spagnolo, PhD, Director of Research and Collections, and Perian Sully, Collection Information Manager at The Magnes.
Bridging The ALM Divide - An Integrated Archive-Library-Museum Approach for H...Perian Sully
This session presented a case study of how the Magnes is using new, innovative technology to make their archive, library, and museum information compatible in one centralized collection management system. This project is designed to facilitate research, sharing, and collection management by making all material types searchable and linkable to one another, while using shared metadata sets (eg. some of the Magnes’ museum fields are able to be exported to EAD and MARC; some MARC-enabled fields are also exportable into EAD. Many fields are also ISAD-compliant and the Magnes also used CDWA for some data-entry standards and Getty’s ULAN for the creators and artists).
Slides of the presentations gives as part of the Europeana Research panel "Cultural Heritage Data for Research: A Europeana Research Panel" at DH Benelux 2017 in Utrecht.
Enabling Accessible Resource Access via Service ProvidersAlexander Haffner
Libraries have become digitized and are using information technology for storing and managing their
resource inventory. Additional metadata are used for describing properties of non-digital assets as well
as of purely digital resources. In particular, as the amount of digital resources increases, there is
demand for centralized services for searching and distribution of content.
Stakeholders of libraries and publishing industry already have made progress in areas of archival
strategies and standardization of preservation strategies. A variety of metadata standards and
exchange protocols enable service providers to offer a single access point to resources. However,
there is still an increased demand for improvement of resource organization and enhanced quality
particularly in terms of accessibility. This paper presents strategies to increase accessibility of
resources as a valuable step towards access-for-all. For consideration of accessibility within the
publishing chain we analyse the whole processing chain and identify stakeholders such as national
libraries and their corresponding responsibilities for ingest, archival storage and dissemination of
digital resources.
A presentation on resource sharing and networking by Dr. Keshava, Professor, Department of Studies and Research in Library and Information Science, Tumkur University, Karnataka, India.
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...locloud
This presentation provides an introduction to thesaurus management in the LoCloud Vocabulary Services given during the LoCloud training workshops. It provides an introduction to controlled vocabularies, thesaurus for information retrieval and interoperability, to SKOS, multilingual vocabulary issues and to the federated model adopted for thesaurus management within the LoCloud service, which is based in TemaTres. The presentation includes a list of the vocabularies that have been integrated within the LoCloud service. There is also a walk-through of MediaThread and how this was used in the vocabulary management training offered in the workshop.
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...innovatics
Explora el ámbito de los servicios de descubrimiento basados en índices, orientado al ámbito de las bibliotecas académicas, incluyendo Primo de Ex Libris, Summon de ProQuest, Discovery Service de Ebsco y Discovery Service de OCLC WorldCat.
Se aborda la Iniciativa Open Discovery y la reciente tendencia hacia una mayor participación por parte de los proveedores de contenidos. Se discute acerca de las tecnologías más adecuadas para las bibliotecas que tienen mayor preocupación por la participación del usuario, sobre el acceso a los libros impresos y electrónicos, con menos restricciones para los artículos académicos que se encuentran en Descubrimiento. Se presenta el papel de las interfaces de descubrimiento de código abierto tales como VuFind y Blacklight. Se aborda el estado de la nueva generación de plataformas de servicios de la biblioteca. La presentación ofrecerá los aspectos más destacados de la industria de automatización de la biblioteca global, con especial atención a los protagonistas y tendencias en América Latina. Basado en el "Informe 2014 de los Sistemas de Bibliotecas" http://www.americanlibrariesmagazine.org/article/library-systems-report-2014
Abstract
Discovery, delivery, and management: the current wave of new library technologies and industry trends
Explore the realm of index-based discovery services oriented more to academic libraries, including Ex Libris Primo, ProQuest Summon, EBSCO Discovery Service, and OCLC WorldCat Discovery Service. An update on the Open Discovery Initiative and the recent movement toward more participation by content providers. Discuss technologies better suited for public libraries that have more concerns for customer engagement, access to print and electronic books, with less stringent requirements for article-level discovery of scholarly resources. The role of open source discovery interfaces such as VuFind and Blacklight. The status of the new generation of library services platforms. The presentation will provide highlights of global library automation industry, with a focus on the players and trends in Latin America Based on “Library Systems Report 2014” http://www.americanlibrariesmagazine.org/article/library-systems-report-2014
presentation for a workshop on cataloging medieval manuscripts with Debra Cashion, Sheila Bair and Sue Steuer which was held at the Rare Book and Manuscript Section (RBMS) of the Association of College and Research Libraries (ACRL) in Minneapolis, MN on June 27, 2013.
Ancient Wisdom online: towards a Digital Library of Ancient Greek and Roman inscriptions
Dr. Sorin Hermon, the Cyprus Institute
pdf file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il
Dissertation as a document provides data on new knowledge, but also – encodes important scientometrical information. A study of social structure of science through data found in dissertations and theses provides bibliometrical data for study of national style of science. The pilot study, described below encourages the library community to improve their documentation in this area, in particular, the notation of supervisors and institutions within a bibliographical record. It is proposed that the CBD argue for stricter standards of library/archive record of dissertation.
Along with the dissertation data in the new IsisCB platform the social structure of the history of science community might be analysed. Scientometrical study of dissertation abstracts at a local level (ie., Lithuania) will provide a model for future studies of scholarly communication at global level.
Towards OpenURL Quality Metrics: Initial Findingsalc28
Presentation on creating a method for benchmarking metadata consistency in OpenURL links. See also: <http: />. Delivered at the July 2009 American Library Association conference in Chicago.
Ancient Wisdom online: towards a Digital Library of Ancient Greek and Roman inscriptions
Dr. Sorin Hermon, the Cyprus Institute
ppt file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il
Russian Learner Translator Corpus: design, research potential and applications
17th International Conference on Text, Speech and Dialogue Brno, Czech Republic, September 8–12 2014
Bridging The ALM Divide: An Integrated Archive-Library-Museum Approach for Hy...The Magnes
For many years, institutions housing archive, library, and museum (ALM) collections have catalogued their holdings either into specialized software for each area types, or have forced their collection records into systems which do not take into account the cataloging standards of the field those materials represent.
New technology now allows us to rethink the way ALM assets can be catalogued, not only according to professional standards, but in a collaborative environment. Bridging the ALM Divide demonstrates how cataloguing into a collaborative environment enhances research, protects collections, increases access and interpretation and secures professional standards.
This presentation was prepared by Francesco Spagnolo, PhD, Director of Research and Collections, and Perian Sully, Collection Information Manager at The Magnes.
Bridging The ALM Divide - An Integrated Archive-Library-Museum Approach for H...Perian Sully
This session presented a case study of how the Magnes is using new, innovative technology to make their archive, library, and museum information compatible in one centralized collection management system. This project is designed to facilitate research, sharing, and collection management by making all material types searchable and linkable to one another, while using shared metadata sets (eg. some of the Magnes’ museum fields are able to be exported to EAD and MARC; some MARC-enabled fields are also exportable into EAD. Many fields are also ISAD-compliant and the Magnes also used CDWA for some data-entry standards and Getty’s ULAN for the creators and artists).
Slides of the presentations gives as part of the Europeana Research panel "Cultural Heritage Data for Research: A Europeana Research Panel" at DH Benelux 2017 in Utrecht.
Enabling Accessible Resource Access via Service ProvidersAlexander Haffner
Libraries have become digitized and are using information technology for storing and managing their
resource inventory. Additional metadata are used for describing properties of non-digital assets as well
as of purely digital resources. In particular, as the amount of digital resources increases, there is
demand for centralized services for searching and distribution of content.
Stakeholders of libraries and publishing industry already have made progress in areas of archival
strategies and standardization of preservation strategies. A variety of metadata standards and
exchange protocols enable service providers to offer a single access point to resources. However,
there is still an increased demand for improvement of resource organization and enhanced quality
particularly in terms of accessibility. This paper presents strategies to increase accessibility of
resources as a valuable step towards access-for-all. For consideration of accessibility within the
publishing chain we analyse the whole processing chain and identify stakeholders such as national
libraries and their corresponding responsibilities for ingest, archival storage and dissemination of
digital resources.
A presentation on resource sharing and networking by Dr. Keshava, Professor, Department of Studies and Research in Library and Information Science, Tumkur University, Karnataka, India.
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Similar to The Significance of Transliteration Enhancing (20)
برمجيات استكشاف المعلومات والبحث الموحد Discovery Toolsبالتطبيق على برنامج V...mohamed Elzalabany
ورشة عمل مقدمة ضمن المؤتمر العلمي الثالث لقسم علوم المعلومات جامعة بني سويف
بعنوان اقتصاد المعرفة والتنمية الشاملة للمجتمعات : الفرص والتحديات
10-11 أكتوبر 2017
أخر تحديثات نظام D-Space ضمن ورشة عمل مدخل إلى برنامج إدارة المستودعات الرقمية المقامه على هامش مؤتمر التحول إلى مجتمع المعرفة: رؤى معلوماتية : جامعة بني سويف 12-4-2016 (الجزء الأول)
يقدم الندوة أ.محمد الزلباني : مدير مؤسسة الرؤية المصرية الأولى لنظم المكتبات وتقنيات المعلومات
ورشة عمل مدخل إلى برنامج إدارة المستودعات الرقمية المقامه على هامش مؤتمر التحول إلى مجتمع المعرفة: رؤى معلوماتية : جامعة بني سويف 12-4-2016 (الجزء الأول)
يقدم الندوة أ.محمد الزلباني : مدير مؤسسة الرؤية المصرية الأولى لنظم المكتبات وتقنيات المعلومات
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
1. The Significance of Transliteration in
Enhancing Accessibility and Exchange of
Arabic Research Data
Mohamed El-Zalabany M.A.
Cataloging Librarian - Library of Congress – Cairo Office
Ph.D Candidate, Information Studies- Cairo University
mh_zalabany@hotmail.com
www.linkedin.com/in/mohamedelzalabany
2. Research Data
on Arabic Content
Research data is: “the recorded factual
material commonly accepted in the scientific
community as necessary to validate research
findings."
Research data on Arabic content, heritage,
and culture is essential for global research,
offering unique insights that enrich our
understanding of societal dynamics, historical
evolution, and cultural diversity, thereby
promoting international collaboration and
understanding.
3. Unique examples of
Arabic Research data
Historical Texts: Analysis of ancient manuscripts
from significant libraries.
Linguistic Data: Studies on dialects and oral
histories.
Archaeological Findings: Discoveries from
historical sites.
Cultural Practices: Data on traditional ceremonies
and rituals.
Art and Iconography: Insights from regional art
and architecture.
Folklore and Music: Collections documenting
cultural expressions.
Socio-economic Data: Research on social and
economic patterns.
Cinema Analysis: Trends and themes in Arabic
media.
4. Towards FAIR Arabic
Research Data
The FAIR principles ensure that data is
Findable, Accessible, Interoperable, and
Reusable, promoting open and efficient
sharing of digital research outputs.
Arabic Open Research Data has a very
important role in Digital Humanities.
5. Metadata Role
in RD Stewardship
Effective research data stewardship
requires robust depositing practices and
detailed metadata management to ensure
data is both discoverable and usable for
future research.
Repository
Metadata
Research
Data
6. Challenges of Arabic
RD Metadata
• Original Description Preservation: It’s essential to
keep original Arabic descriptions to maintain cultural
and historical accuracy, but this can be complex due to
dialect variations and the intricacies of the script.
• Metadata Translation: Translating Arabic metadata into
other languages poses risks of misinterpretation or
oversimplification, potentially leading to critical
information loss.
• Global Accessibility: Balancing local specifics with
global understandability in metadata is challenging,
requiring adjustments to ensure clarity for an
international audience.
• Technical Integration: Incorporating Arabic metadata
into predominantly Latin-script systems involves
overcoming right-to-left text orientation and script
cursive nature.
• Metadata Standards Harmonization: Developing
standards that accommodate Arabic specifics while
aligning with international norms is complex, especially
with differing transliteration schemes and cataloging
practices.
7. Multi-lingual and
transliterated Metadata
Essential for discovery
• valuable research is often overlooked when
published in non-English languages. It can also
be helpful for researchers trying to determine
whether it is worth their time to request a
translation or use a machine translator to read an
article, especially for fields and topics where
research is scarce.
• It can be frustrating for readers to find translated
metadata only to find that the actual content is
not available in that language.
• It can also cause your content to be indexed
incorrectly or not at all, especially by indexing
and discovery services that have strict
expectations for metadata and language
consistency.
14. Research data Repository systems
• Current situation of software to support transliterated data
• Supporting the right-to-left
• Unicode
• Indexing and Searching
18. Automatic Transliteration challenge
• current practices of (ALA-DIN 31635-ISO 233)
• Transliteration from Roman to Arabic (tools)
• Transliteration from Arabic to Roman (challenges, tools)
22. Recommended Practices to enhance the
transliterated metadata
• Develop new metadata templates in the research data
repository to have fields with more than one language
• Use appropriate language fields
• Avoid mixing languages in a single field
• Aim for consistency between metadata fields and galleys
• Avoid unnecessary transliteration/romanization