Presented by Jennifer Hecker and Elizabeth Grumbach and hosted by the Texas Digital Humanities Consortium (TXDHC), these are the slides for the TXDHC training webcast on OpenRefine, February 12th, 2015.
This document provides an overview of OpenRefine, an open source tool for working with messy data. It discusses key features of OpenRefine including importing various data formats, exploring and transforming data through functions like text filtering and regular expressions, linking data to external sources, and exporting cleaned data. The document also outlines the steps to install OpenRefine and provides a tutorial on basic and advanced data cleaning operations.
5. Cleaning up data that:
is in a simple tabular format
is inconsistently formatted
has inconsistent terminology
6. get an overview of a data set
resolve inconsistencies
split data up into more granular parts (see the example below)
match local data up to other data sets
enhance a data set with data from other sources
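For instance, splitting a multi-valued cell into more granular parts can be done with Edit cells > Split multi-valued cells, or with a GREL transform (Edit cells > Transform…). A minimal sketch, assuming the parts happen to be separated by semicolons:

  forEach(value.split(";"), v, v.trim()).join(" | ")

This splits the cell on ";", trims stray whitespace from each part, and rejoins the parts with a single consistent separator.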
14. …ask some questions about your data set:
What type of data is it & what format is it in?
What’s the size of your data set?
What question do you want to ask your data?
What do you need to do to find the answer?
15. Excel: familiarity, better for data entry, cut and paste operation, no paging to navigate
Google Spreadsheets: similar to Excel, can get external data relatively easily, easy to collaborate and share
Google Fusion Tables: if you just want to filter, easy to share
Text editor: a powerful text editor can do many things
Unix tools: more challenging to use, but quick, and some things (finding things, sorting) are easy
Writing code: most sophisticated and most to learn!
19. Retrieve data from online sources
example: use names to retrieve birth/death dates from the Virtual International Authority File (VIAF) (see the sketch below)
Match data to external data sources using extensions for RDF, DBpedia, Named-Entity Recognition (NER), etc.
…and ‘reconciliation’ services
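One common way to do the VIAF example is Edit column > Add column by fetching URLs…, with a GREL expression that builds a query URL from the name in each cell. A rough sketch, assuming VIAF's AutoSuggest endpoint (check the current VIAF API documentation for the exact URL and response format):

  "https://viaf.org/viaf/AutoSuggest?query=" + escape(value, "url")

The JSON that comes back can then be unpacked in a follow-up transform, for example value.parseJson().result[0].term; the result and term field names here are assumptions about the response shape, not something covered in the webinar.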
20. Use the ‘cross’ function to compare the contents of two Refine projects, or to share data between the two projects.
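In GREL this is the cross() function. A minimal sketch, assuming a second project named "Authority file" with a matching "name" column and a "birth_date" column to copy over (both the project and column names are hypothetical):

  cell.cross("Authority file", "name")[0].cells["birth_date"].value

cross() returns the rows of the other project whose "name" cells match the current cell; taking the first match and reading its "birth_date" cell pulls that value into the current project.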
21. TxDHC blog post on this webinar: http://www.txdhc.org/txdhc-training-webcast-materials/
The OpenRefine Wiki: https://github.com/OpenRefine/OpenRefine/wiki
OpenRefine User Documentation: https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users
The ‘Free your metadata’ site: http://freeyourmetadata.org
…and the accompanying book: http://book.freeyourmetadata.org
The OpenRefine mailing list and forum: http://groups.google.com/d/forum/openrefine
23. credits * acknowledgements * citations
These slides were developed by Jennifer Hecker (j.hecker@Austin.utexas.edu) and Liz Grumbach (egrumbac@tamu.edu) on behalf of University of Texas Libraries, Texas A&M's Initiative for Digital Humanities, Media and Culture, and the Texas Digital Humanities Consortium, using many resources, including the wonderful course material developed by Owen Stephens on behalf of the British Library (http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/).
Unless otherwise stated, all images, audio or video content are separate works with their own licenses, and should not be assumed to be CC-BY in their own right. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). It is suggested that when crediting this work, you include the phrase "Developed by Liz Grumbach and Jennifer Hecker on behalf of the University of Texas, Texas A&M, and the TXDHC."
Thanks to University of Texas Libraries, Texas A&M's Initiative for Digital Humanities, and the Texas Digital Humanities Consortium for facilitating this presentation.
Editor's Notes
Howdy there everybody! Thanks for joining this inaugural webinar from the Texas Digital Humanities Consortium. We are testing out this format for ongoing consortial training use.
This session is being recorded, and you may follow along with these slides, or access the recording, slides, and supplementary materials on the Texas Digital Humanities Consortium website. Also on the website is a link to a three-question survey, and we would very much appreciate any feedback you are willing to provide. During the webinar, Liz and I are going to trade off presenting and chat-window-monitoring duties. Please be patient and cross your fingers for us!
I’m going to introduce today’s presenters very quickly. I’m Jennifer Hecker and I work at the University of Texas Libraries. I specialize in bringing my years of experience as an archivist to bear on our digital access challenges. I also work in the digital humanities space, coordinating collaborations and projects with students, faculty, and staff all over UT. I also direct the Austin Fanzine Project and do a lot of outreach and mentoring work.
Liz Grumbach works for the Initiative for Digital Humanities, Media, and Culture at Texas A&M as a Research Associate, where she supports faculty, staff, and student Digital Humanities projects and endeavors. She's also the Project Manager for the Advanced Research Consortium and 18thConnect.org, where she organizes peer review, supports the creation of digital editions, and maintains the digital records for all ARC research nodes. She's involved in the management of the Early Modern OCR Project (eMOP), which aims to teach machines how to read early modern fonts and make open source software packages available to other institutions seeking to auto generate transcriptions of large page image data sets.
An open-source tool for working with messy data.
Runs in a browser, but locally – your data don’t leave your machine.
Active development community – people creating extensions – and discussion list.
This is some of the basic stuff you might use Refine for. In a little bit, Liz is going to walk you through these functions. Refine does a lot more, too, but today we’re just going to get your feet wet. I’ll come back after the demo and talk a little bit about some of the more advanced possibilities that you can explore…
Refine lets you
Here’s a slide from a webinar I attended a couple of weeks ago. It’s an example of OpenRefine in action – here being used to normalize data as one step in the workflow of a larger metadata aggregation project. So what does it look like?
Refine lets you split out data that is in one cell into multiple cells – and vice versa.
Here are some simple examples of what we mean when we talk about “normalizing metadata”. Refine lets you easily batch edit data so that it uniformly adheres to your standards.
Here’s what text faceting looks like. It’s useful for getting an overview of your data. Here you can quickly see some inconsistencies you might want to address.
Refine also lets you do something called clustering.
– change slide –
This is my personal favorite part!
Here’s a little bigger view…
Liz will go into more detail during the demo, but basically, Refine groups together data that it thinks is similar, according to a number of factors that you can adjust, so that you can review, modify, and batch edit it. Faceting and clustering are by far the two functions I tend to use most in Refine.
A little background: In conversation, you’ll probably hear all of these names for this tool. Nobody calls it Freebase Gridworks any more, but the other three are all common. Google originally developed Refine, but then abandoned the project & it became open source, hence the name OpenRefine. Lots of folks – myself included – take the lazy approach and just call it Refine.
There are a number of tools out there that can help you manipulate data sets in a variety of ways. How do you know which is right for you? First, ask yourself some questions about your data.
Here’s a matrix that can help guide your tool selection. It’s not comprehensive; there are more tools out there for sure (and all these tools do more than the brief descriptions above would imply – for example, Google Fusion Tables can be used to geocode location information and automatically generate maps, stuff like that), but these are the most common tools, and this gives you an idea of what to expect from each of them…
Ok, now I’m going to attempt to hand over the presentation to Liz, a couple hundred miles to my East.
Ok, so now that you’re all excited about what you can do with Refine, I’m going to quickly run through some of the more advanced functions. By using regular expressions, which I’ve seen described as “wildcards on steroids”, you can more finely filter and manipulate your data.
Using those same regular expressions, Refine helps you use GREL, the General Refine Expression Language, to perform transformations on your data.
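For example, here are two typical transforms you might apply via Edit cells > Transform… (these particular expressions are illustrative and were not part of the webinar):

  value.trim().replace(/\s+/, " ")
  value.toTitlecase()

The first trims the cell and collapses internal runs of whitespace to a single space; the second normalizes capitalization, which is handy for inconsistently entered names and subject terms.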
Using various community-developed extensions, which you can easily select and install, you can retrieve data from online sources such as VIAF, and you can match data to external sources such as DBpedia.
Thanks for tuning in y’all! We hope this was helpful and we welcome any questions or feedback y’all might have!