David A. Ellis is a researcher who builds interactive data visualizations to support publications in psychology and other fields. Some of his recent work includes developing dynamic data visualizations using Shiny and predicting smartphone operating systems from personality traits. He maintains a website at www.davidaellis.co.uk where he shares his visualizations and references publications in which he discusses using visualizations in digital contexts and other research methods.
Reflections from the Pelagios Commons by Leif Isaksen, Lancaster University. http://commons.pelagios.org. Presentation at the 1st Lancaster Data Conversations on 30 January 2017
These are the slides for the Faculty Fellow short talk on October 27, 2016, at the Alan Turing Institute. In this talk, I summarise my approach to missing data analysis, and explain how my work fits into an interdisciplinary context. I will add a link to the YouTube recording once it has been uploaded.
This document introduces social media tools that can be useful for academics, including blogs, Twitter, YouTube, Academia.edu, ResearchGate, LinkedIn, SlideShare, Mendeley, Zotero, Figshare, Eventbrite, and Lanyrd. It discusses the potential benefits of using these tools, such as connecting with others, keeping up to date, and increasing traffic and engagement. However, it also notes potential pitfalls like privacy issues, lack of credibility, and commercialization of content. The document encourages exploring different tools and tracking impact through altmetrics.
This document discusses open educational resources (OERs), which are freely available teaching and learning materials that can be reused and modified. It provides examples of OERs from repositories covering various subjects, as well as interactive simulations. Creative Commons licenses allow creators to retain copyright while permitting others to copy, distribute, and make some uses of their work. The document encourages sharing one's own work as OERs and properly attributing materials using Creative Commons guidance.
Presented to members of the Psychology department as part of the New Tricks Seminar series (February 2016)
• journal metrics using WoS and Scopus
• article-level metrics in WoS, Scopus and Google Scholar, and from publishers, and the differences between them. Touch on altmetrics.
• author metrics in the above. Touch on Publish or Perish
Tanya Williamson, Academic Liaison Librarian
The document provides an overview of research data management (RDM) and the RDM services that Lancaster University plans to offer. It explains that RDM involves maintaining and preserving digital research data throughout its lifecycle, and notes that funder requirements and policies are pushing universities to improve RDM practices so that research data remain accessible and reusable in the long term. Lancaster University plans to provide storage, advocate for RDM, offer training and support, help with data management plans, and collaborate on RDM issues with other universities and with groups such as N8.
Updated 30/01/2015
This session included discussions around the value of bibliometrics for individual performance management/promotion and the REF.
What are bibliometrics?
Journal metrics
Personal metrics
Article-level metrics and altmetrics
The document announces the Donald H. Wulff Diversity Travel Fellowships Program which provides up to $1,200 grants to support travel to the annual POD conference for individuals from underrepresented groups. Eligible applicants include those from racial/ethnic minority groups, underrepresented institutions, or who can contribute to POD's mission of social justice and equity. The deadline to apply for the 2013 conference is May 24th and applications should address the applicant's eligibility and how they and POD would benefit from their attendance. Recipients will be expected to share what they learn at a conference session and participate in assessments of the program.
Loyola University Chicago migrated from Voyager, its library system for 15 years, to Alma and Primo on an aggressive six-month timeline. Key aspects of the project included data migration, system configuration, staff training, and testing of the new systems. Functional workshops and weekly calls helped address workflows. Data and locations were consolidated before transfer. Implementing resource management, acquisitions, and the Primo discovery layer required ongoing setup and customization.
The document discusses Lancaster University's transition from its legacy library system to a new unified library services platform called Alma. Key points include:
- Lancaster signed a contract with Ex Libris in 2011 to implement Alma to improve efficiency, enhance services, and position the library for the digital environment.
- The implementation involved migrating data from previous systems, configuring Alma's functionality, integrating with other campus systems, and optimizing workflows.
- Initial challenges included slow performance and incomplete integrations, but the library has now established basic workflows and sees potential for future improvements through analytics and community collaborations.
- Moving to a cloud-based system with Ex Libris provides benefits like reduced infrastructure costs and
Newcastle University Library - Pop-up Library (CILIP PPRG)
Newcastle University opened a pop-up library to address space issues in their main libraries during exam periods. They analyzed goals, developed concepts, and designed and marketed the pop-up library. It was successful in providing additional space and received positive feedback from students who appreciated having more room to study. The pop-up library achieved the university's objectives and evidenced the need for more permanent library space.
The first step to successfully handling negativity on the Internet is to identify where it's coming from. National Research Center (NRC) describes the four most common sources of Web negativity faced by local governments and shares a few tips on dealing with it.
Sign up for an upcoming Webinar on this topic at www.n-r-c.com/webinars.
Science & Community Public Engagement Workshop (wellcome.trust)
Presented by Clare Matterson (Director of Medicine, Society and History (MSH) at the Wellcome Trust) at the Public Engagement Workshop, 2-5 Dec. 2008, KwaZulu-Natal South Africa, http://scienceincommunity.wordpress.com/
The document discusses various approaches to measuring the value and impact of public engagement activities. It presents examples of evidence that could demonstrate engagement's influence, such as changes in policy, practice or communities. Methods are described, like outcome mapping, case studies and social network analysis, that can evaluate engagement's role in the policy process. The importance of learning during and after projects is emphasized.
This document discusses various methods for keeping up-to-date in humanities research, including current awareness services, discussion lists, blogs, and collaborative tools. It identifies email and RSS alerts, journal tables of contents, database search alerts, and Google Alerts as ways to receive notifications about new information. Discussion lists like H-Net and JISCMail are recommended for participating in conversations, while blogs can be used to disseminate research and build networks. Mendeley allows collaboration through features like reference management, PDF annotation, groups, and networking.
The good, the efficient and the open: changing research workflows and the nee... (hierohiero)
This document discusses changing research workflows and the need to transition from open access to open science. It presents several models of the research workflow as multi-cyclic and multi-ordered, with loops for activities like grant writing, experimentation, and publishing. The document outlines three goals for science: being good, efficient, and open. It then analyzes survey data on tools researchers use at different stages of the workflow and how they align with open science principles. The findings suggest a shift towards more open tools and formats.
Updated for January 2015.
Versions of this presentation have been given at:
- Ex Libris Alma and Primo 'Solutions Day' at the Kungl. Myntkabinettet (Royal Coin Cabinet) museum, Stockholm, Tuesday 25th November 2014.
Different Media for communicating Science to different groups (wellcome.trust)
Presented by Derek Fish (Unizul Science Centre, South Africa) at the Public Engagement Workshop, 2-5 Dec. 2008, KwaZulu-Natal South Africa, http://scienceincommunity.wordpress.com/
This document summarizes a study of varved clays in pits around Little Ferry, New Jersey. The clays represent seasonal deposits laid down in a lake as the last ice sheet retreated northward after the Wisconsin glaciation. Each varve consists of a lighter summer layer and darker winter layer. Over 2,550 consecutive varves were identified, providing a record of seasonal deposition over thousands of years. The varves were correlated between pits based on distinctive marker bands. This extended the known post-glacial time record for retreating ice sheets in eastern North America to over 13,000 years.
This document provides a summary of the health of the Casperkill watershed in Dutchess County, New York. It finds that the health of the Casperkill has declined over time due to human impacts on the landscape including deforestation, development, dumping, stormwater runoff from parking lots, and degraded water quality. While the stream once supported a diversity of plant and animal life, many species have been lost or replaced. The document concludes that full recovery of the Casperkill is unlikely but efforts should be made to protect remaining natural areas to prevent further degradation.
Glacial Lake Missoula formed during the last ice age when an ice dam blocked the Clark Fork River in what is now northern Idaho. This immense lake held as much water as Lakes Erie and Ontario combined. There is debate around how many times the ice dam catastrophically failed, releasing massive floods across the region. Studying the bottom sediments of the lake can provide clues about its history. A large exposure of well-preserved Lake Missoula bottom sediments near Missoula, Montana shows characteristics like varves that can help unravel the number and timing of floods from the lake.
Scotland has a unique culture and traditions that set it apart from England despite being part of Britain. Some key aspects of Scottish culture and tradition discussed in the document include the Scottish national dress of kilts, the national drink of Scotch whisky, tartan patterns associated with Scottish clans, bagpipe music, Highland games, and the legendary Loch Ness Monster. Scotland's history, scenery, customs, and people contribute to its distinct identity and atmosphere within the United Kingdom.
This document provides an overview of annotating sources and sample annotations for different source types such as books, journal articles, magazines, and web sources. It discusses requirements for drafting annotations, including writing complete citations in bibliography form and annotations that are 3-7 sentences long. The document also covers dictionaries and their history, the Oxford English Dictionary, making dictionaries, dictionary types, and due dates for upcoming assignments.
The document discusses concepts related to geology and dating methods. It defines relative and absolute dating, and describes radiometric dating which uses the decay of radioactive isotopes to determine the age of rocks. Radiometric dating equations are provided, and used to calculate that the oldest rocks on Earth are approximately 3.96 billion years old, suggesting the age of the Earth is around 4.6 billion years.
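The age calculation mentioned above follows the standard decay relation t = (1/λ) ln(1 + D/P), where D and P are the measured amounts of daughter and parent isotope and λ = ln 2 / t½ is the decay constant. The summary does not give the isotope systems or measurements used, so the following Python sketch is illustrative only: it assumes a closed system with no daughter isotope present at formation, and the uranium-238 half-life constant is a textbook value added for the example.

```python
import math

def decay_constant(half_life_years: float) -> float:
    """Decay constant lambda = ln(2) / half-life (per year)."""
    return math.log(2) / half_life_years

def radiometric_age(daughter: float, parent: float, half_life_years: float) -> float:
    """Age t = (1 / lambda) * ln(1 + D / P).

    Assumes a closed system with no daughter isotope present when the rock formed.
    """
    if parent <= 0:
        raise ValueError("parent amount must be positive")
    if daughter < 0:
        raise ValueError("daughter amount cannot be negative")
    return math.log(1 + daughter / parent) / decay_constant(half_life_years)

# Illustrative values: uranium-238 decaying towards lead-206.
U238_HALF_LIFE = 4.468e9  # years
age = radiometric_age(daughter=1.0, parent=1.0, half_life_years=U238_HALF_LIFE)
print(f"{age / 1e9:.3f} billion years")  # 4.468 billion years
```

With equal amounts of parent and daughter the computed age is exactly one half-life, which is a quick sanity check on the formula.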
Exercise at NoWAL Open Research workshop 13 June 2019, led by Lancaster University Library. Blog post about the event available at https://wp.me/p81NIC-f9
Presentation given by Louise Tripp, Joshua Sendall and Hardy Schwamm at NoWAL Exchange of Experience 13 June 2019. Blog post on event available at https://wp.me/p81NIC-f9
This document discusses the evolution of the author's research approach over 25+ years from more observational studies done "in the field" to long-term collaborations where "the field" sets the research topics and sometimes conducts the research. It describes moving from collecting primary data to helping advocacy groups and conducting service evaluations. The author emphasizes establishing relationships over time, breaking down hierarchies between research and practice, and ensuring research is actually useful to people in the field.
Jessica Phoenix carried out PhD research in collaboration with the local police force, measuring and investigating domestic violent crime. She recently completed a project with the force on missing-person cases from 2015: thousands of records containing names, addresses, and personal details. To anonymize the data for analysis outside police systems while still tracking people reported missing more than once, she applied a coding method. Removing duplicated, erroneous, and identifying data took much longer than expected and ultimately left 4,746 usable cases. Access to the data and input from both researchers and practitioners were beneficial, but coordinating schedules and underestimating the time needed for data cleaning slowed the research.
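The summary does not describe the coding method itself. One common way to meet the same two goals (no identities leave the police system, yet repeat missing persons remain linkable) is a keyed hash, and the Python sketch below swaps in HMAC-SHA256 from the standard library for that purpose. The field names, the secret key, and the normalisation rules are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key: in practice a long random secret kept inside the police
# system, so codes cannot be recomputed by anyone holding only the extract.
SECRET_KEY = b"replace-with-a-long-random-secret"

def normalise(name: str, dob: str) -> str:
    """Canonicalise identifying fields so trivial spelling variants match."""
    return f"{' '.join(name.lower().split())}|{dob.strip()}"

def pseudonym(name: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Stable code for a person: the same inputs always give the same code."""
    mac = hmac.new(key, normalise(name, dob).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]

def anonymise(records: list[dict]) -> list[dict]:
    """Keep only a person code and report date; drop exact duplicate reports."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {"person_code": pseudonym(rec["name"], rec["dob"]), "reported": rec["reported"]}
        key = (row["person_code"], row["reported"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

records = [
    {"name": "Jane  Doe", "dob": "1990-01-01", "address": "1 High St", "reported": "2015-03-02"},
    {"name": "jane doe", "dob": "1990-01-01", "address": "1 High St", "reported": "2015-03-02"},
    {"name": "Jane Doe", "dob": "1990-01-01", "address": "1 High St", "reported": "2015-06-10"},
    {"name": "John Roe", "dob": "1985-05-05", "address": "2 Low Rd", "reported": "2015-04-01"},
]
rows = anonymise(records)
repeats = Counter(r["person_code"] for r in rows)
print(len(rows), max(repeats.values()))  # 3 2
```

Addresses and names never reach the output, but the two 2015 reports for the same person share one code, so repeat episodes can still be counted.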
The document describes the process of analyzing NHS administrative data on patient appointments to categorize patients based on their appointment attendance and missed appointments. It involved processing over 800,000 appointments for 73,000 patients to compute attendance rates and classify patients into categories of zero, low, medium, or high missed appointments. Demographic data on patients was then merged to analyze patterns between attendance and factors like age, distance from practice, and socioeconomic status.
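The categorisation step can be sketched in Python as follows. The summary names the four categories but not how they are defined, so the rate-based 10% and 30% cut-offs, the (patient_id, attended) input shape, and the demographic fields are all hypothetical.

```python
from collections import defaultdict

def missed_category(missed: int, total: int) -> str:
    """Map a patient's missed-appointment rate to a category.

    The 10% and 30% cut-offs are hypothetical; the source does not state them.
    """
    if missed == 0:
        return "zero"
    rate = missed / total
    if rate < 0.10:
        return "low"
    if rate < 0.30:
        return "medium"
    return "high"

def classify_patients(appointments):
    """appointments: iterable of (patient_id, attended) pairs, one per appointment."""
    totals = defaultdict(int)
    missed = defaultdict(int)
    for patient_id, attended in appointments:
        totals[patient_id] += 1
        if not attended:
            missed[patient_id] += 1
    return {
        pid: {
            "appointments": n,
            "attendance_rate": 1 - missed[pid] / n,
            "category": missed_category(missed[pid], n),
        }
        for pid, n in totals.items()
    }

def merge_demographics(patients, demographics):
    """Attach demographic fields (e.g. age, distance) to patients found in both tables."""
    return {pid: {**info, **demographics[pid]} for pid, info in patients.items() if pid in demographics}

appointments = (
    [("p1", True)] * 10
    + [("p2", True)] * 19 + [("p2", False)]
    + [("p3", True)] * 8 + [("p3", False)] * 2
    + [("p4", True)] * 3 + [("p4", False)] * 2
)
patients = classify_patients(appointments)
print({pid: p["category"] for pid, p in patients.items()})
# {'p1': 'zero', 'p2': 'low', 'p3': 'medium', 'p4': 'high'}
```

A single pass over the appointment rows keeps the aggregation linear in the number of appointments, which matters at the scale of 800,000 rows.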
This document discusses making the most of research data by organizing and sharing it. It provides an example of a dataset on sensory modality norms that was collected, published in a journal article in 2009, and has since been widely used and extended by other researchers. The document recommends storing data on a permanent repository with a citation, using an open license, including documentation, and sharing analysis scripts to allow the data to be used by others. Sharing data in this way can result in the data being applied in new studies, simulations, and materials or used as evidence to support examples.
The document discusses a dataset used to study schizophrenia genetics. It notes that past studies of candidate genes did not provide clear insights into the genetic basis of schizophrenia. It then describes the Psychiatric Genomics Consortium (PGC), which began in 2007 with over 800 investigators from 38 countries studying over 900,000 individuals. The PGC aims to conduct inclusive, open analyses of genomic data to better understand psychiatric disorders like schizophrenia through large-scale collaboration.
This document discusses creative approaches used to help children articulate their experiences during and after flooding events, including walking tours, photography, model making, and theatre. It provides examples of conversations researchers had with children about flood warnings, difficulties after being displaced, depicting people's emotions before and after flooding through models, and feeling more comfortable sharing their stories with others who had similar experiences. The document directs readers to a film that further explores how children recover and build resilience after flooding.
This document discusses using smartphone data to gain psychologically important insights. It summarizes past research that achieved 85% accuracy in predicting bipolar symptoms and 92% accuracy in detecting deception using location and usage data. The document then describes two of the author's own apps, ParkinsonEaston and Getting Log, which analyze location, movement patterns, and usage logs. It notes that recent changes in Android software now limit background access to location data, but discusses ways researchers can still gain insights while respecting users' privacy, such as focusing on places visited rather than movements.
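To illustrate what "places visited rather than movements" can mean in practice, here is a Python sketch of a common stay-point heuristic: consecutive location fixes that stay within a small radius for long enough are collapsed into one place, and the fixes in between are discarded. The summary does not say how the author's apps do this; the 200 m radius, the 20-minute minimum stay, and the sample coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Fix:
    t: float    # seconds since start of recording
    lat: float  # degrees
    lon: float  # degrees

def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def stay_points(fixes, max_dist_m=200.0, min_stay_s=20 * 60):
    """Reduce a time-ordered trace to places where the user lingered.

    Returns (mean_lat, mean_lon, arrival_t, departure_t) tuples; the fixes
    recorded while moving between places are discarded.
    """
    places = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        # Extend the window while fixes stay within max_dist_m of the anchor fix.
        while j < n and haversine_m(fixes[i], fixes[j]) <= max_dist_m:
            j += 1
        if fixes[j - 1].t - fixes[i].t >= min_stay_s:
            group = fixes[i:j]
            places.append((
                sum(f.lat for f in group) / len(group),
                sum(f.lon for f in group) / len(group),
                fixes[i].t,
                fixes[j - 1].t,
            ))
            i = j
        else:
            i += 1
    return places

home = [Fix(t, 51.50, -0.10) for t in range(0, 1801, 300)]
commute = [Fix(2100, 51.51, -0.10), Fix(2400, 51.52, -0.10)]
work = [Fix(t, 51.53, -0.10) for t in range(2700, 4501, 300)]
places = stay_points(home + commute + work)
print(len(places))  # 2: home and work; the commute fixes are dropped
```

Storing only the resulting places, and not the raw trace, is one way to keep analyses useful while exposing less about a participant's movements.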
This document discusses software as a research object and the importance of research software. Some key points:
- Many researchers rely on software for their work but few have formal software training. Software is integral to modern research.
- Studies have found low reproducibility in scientific publications due to issues with unavailable software and code. Proper documentation and sharing of research software is needed.
- The Software Sustainability Institute aims to cultivate better, more sustainable research software to enable world-class research. They provide training, community support, and advocate for improved software practices and policies.
- Culture change is needed to incentivize sharing of research software and code. Mechanisms are emerging to properly credit software contributions and cite
The document discusses the author's past, present, and future approach to releasing software code accompanying research publications. In the past, the author published a paper on sentiment analysis without adequately documenting or testing the accompanying code. Currently, the author aims to improve by adding unit tests, documentation, and code repositories to validate results and ease accessibility. Going forward, the author hopes more researchers will release code to allow others to build upon their work more quickly and advocates for support like research software engineers to help with this process. Releasing code, even imperfect code, allows others to help improve it and leads to greater research impacts.
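As an illustration of the kind of unit test described above, here is a minimal Python sketch using the standard-library unittest module. The lexicon-based scorer and its word list are hypothetical stand-ins, since the summary does not show the author's sentiment analysis code. The tests pin down behaviour (case, punctuation, empty input) so that later changes cannot silently alter published results.

```python
import unittest

# Hypothetical toy lexicon; a real study would load a published, versioned lexicon.
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2}

def sentiment_score(text: str) -> int:
    """Sum lexicon scores over whitespace-separated tokens, ignoring case and end punctuation."""
    return sum(LEXICON.get(tok.strip(".,!?").lower(), 0) for tok in text.split())

class TestSentimentScore(unittest.TestCase):
    def test_positive(self):
        self.assertEqual(sentiment_score("A great day"), 2)

    def test_mixed_cancels(self):
        self.assertEqual(sentiment_score("good but bad"), 0)

    def test_case_and_punctuation(self):
        self.assertEqual(sentiment_score("AWFUL!"), -2)

    def test_empty(self):
        self.assertEqual(sentiment_score(""), 0)
```

Saved as a module, the tests run with python -m unittest followed by the module name, and the same command can run in a repository's continuous integration so every change is checked.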
This document discusses using GitLab for revision control and managing code. It notes some common barriers to using Git, such as not wanting to share code or feeling that code is not ready to share. It then compares GitLab with GitHub and describes how to set up a GitLab instance on campus for private code repositories, giving total privacy and control. Installation was described as easy, and GitLab provides benefits such as backups of work and the ability to add collaborators.
This presentation on energy transitions is by Dénes Csala, a lecturer in energy storage systems dynamics. The slides consist mostly of disconnected words and contain no clear argument that can be summarized briefly.
In this May 2017 overview, Lancaster University's IT Security Manager outlines key aspects of the university's information security. It covers external requirements such as Cyber Essentials Plus certification, meeting the standards of the Information Governance Toolkit, and working towards ISO 27001. It also explains how the university classifies information, including personal and sensitive personal data, and gives guidelines for securely transferring, storing, and disposing of information.
This document discusses cloud computing and the challenge of complete data deletion. It notes that cloud storage offers effectively unlimited capacity and rapid resource provisioning, but that deleted data may remain accessible: data are often stored as multiple copies in multiple locations, which makes full deletion technically difficult. The document argues that complete and verifiable deletion is important for sensitive data and privacy rights, but remains an unsolved problem because of the complex, distributed nature of cloud infrastructure.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in a world where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; and (4) they are portable: although the SQL logic is implemented in only one SQL dialect, it is accessible from all engines.
#SQL #Views #Privacy #Compliance #DataLake
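To make the annotation-to-view idea concrete, here is a minimal Python sketch under stated assumptions. It is not ViewShift's implementation or API: the annotation labels (`ID`, `PII`, `PUBLIC`), the use-case names, the `member_consent` table and the `<table>_<use_case>` naming convention are all hypothetical. It illustrates three of the listed properties in miniature: view SQL generated from declarative column annotations, a consent join that filters rows, and masking rules that vary by use case. `resolve_table` mimics a catalog routing a table name to its policy view.

```python
from typing import Dict

# Hypothetical column annotations (column -> sensitivity label).
# Labels, table and column names are illustrative, not LinkedIn's schema.
MEMBER_ANNOTATIONS: Dict[str, str] = {
    "member_id": "ID",
    "email": "PII",
    "country": "PUBLIC",
}

# Hypothetical context-aware policy: labels each use case may read unmasked.
USE_CASE_ALLOWED: Dict[str, set] = {
    "analytics": {"ID", "PUBLIC"},
    "trust_and_safety": {"ID", "PUBLIC", "PII"},
}


def generate_view_sql(table: str, annotations: Dict[str, str], use_case: str) -> str:
    """Generate a compliance view for one use case from column annotations."""
    allowed = USE_CASE_ALLOWED[use_case]
    # Keep allowed columns; replace disallowed ones with NULL so the schema is unchanged.
    select_items = [
        f"t.{col}" if label in allowed else f"NULL AS {col}"
        for col, label in annotations.items()
    ]
    # Join a hypothetical per-use-case consent table so only consenting rows survive.
    return (
        f"CREATE VIEW {table}_{use_case} AS\n"
        f"SELECT {', '.join(select_items)}\n"
        f"FROM {table} t\n"
        "JOIN member_consent c\n"
        f"  ON c.member_id = t.member_id AND c.use_case = '{use_case}'\n"
        "WHERE c.consented"
    )


def resolve_table(name: str, use_case: str, annotated: Dict[str, Dict[str, str]]) -> str:
    """Catalog-style routing: annotated tables resolve to their policy view."""
    return f"{name}_{use_case}" if name in annotated else name


if __name__ == "__main__":
    catalog = {"members": MEMBER_ANNOTATIONS}
    print(resolve_table("members", "analytics", catalog))  # members_analytics
    print(resolve_table("jobs", "analytics", catalog))  # jobs
    print(generate_view_sql("members", MEMBER_ANNOTATIONS, "analytics"))
```

Here the `analytics` view selects `NULL AS email`, while the `trust_and_safety` view keeps `t.email`; a query engine reading through the catalog never sees the raw table for annotated names.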
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Natural Language Processing (NLP), RAG and its applications (fkyes25)
In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require integrating large amounts of up-to-date information. Traditional neural models, though powerful, struggle to encode all the necessary knowledge in their parameters, which limits their generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a framework that combines a retrieval mechanism with a generative model, improving performance by incorporating external knowledge dynamically during inference.
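The retrieve-then-generate flow can be sketched as follows. This is a deliberately simplified stand-in, not the paper's model: the paper pairs a learned neural retriever with a sequence-to-sequence generator, whereas this sketch ranks passages by bag-of-words cosine similarity and stops at building the augmented input that a generator would condition on. The passages, the `k` parameter and the prompt layout are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a stand-in for a learned query/passage encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    return sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)[:k]


def build_prompt(query: str, retrieved: List[str]) -> str:
    """Prepend retrieved passages so a generator can condition on them at inference time."""
    context = "\n".join(f"- {p}" for p in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    passages = [
        "Skiddaw is a mountain in the Lake District.",
        "Coniston Water is a lake in Cumbria.",
        "Retrieval-augmented generation combines a retriever with a generator.",
    ]
    query = "Which lake is in Cumbria?"
    top = retrieve(query, passages, k=1)
    print(top)  # ['Coniston Water is a lake in Cumbria.']
    print(build_prompt(query, top))
```

Because the knowledge lives in `passages` rather than in model weights, updating the passage store changes what the generator sees without retraining, which is the motivation the summary describes.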
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies that address well-known limitations of standard A/B testing. Designed for data and product leaders, the session aims to encourage the adoption of innovative approaches and provide insights into the frontiers of experimentation!
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
State of Artificial Intelligence Report 2023 (kuntobimo2016)
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
1. Mining and mapping places with multiple names
James Butler & Christopher Donaldson
Lancaster University
2. Corpus of Lake District Literature
• 80 texts, comprising more than 1,500,000 words
• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)
• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters
[Figure: timeline marked at 1688, 1789, 1837 and 1901, with three periods labelled 34 texts (650K words), 22 texts (250K words) and 22 texts (613K words)]
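The period totals in the timeline can be checked against the bullet points. Assuming the three periods together cover the 78 c18 and c19 works (our inference; the slide does not say so explicitly), a quick sum is consistent with both "78 out of 80 works" and "more than 1,500,000 words". The word counts are the slide's rounded figures, and which period each pair belongs to is not recoverable from the extracted text.

```python
# Period totals as transcribed from the slide's timeline figure, as (texts, words).
# The assignment of each pair to a specific date range is not recoverable here.
PERIOD_TOTALS = [(34, 650_000), (22, 250_000), (22, 613_000)]

texts = sum(n_texts for n_texts, _ in PERIOD_TOTALS)
words = sum(n_words for _, n_words in PERIOD_TOTALS)

print(texts)  # 78
print(words)  # 1513000
```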
3. Sample sentence collocation: beautiful
‘Again entering the boat, we passed up the channel between Lord’s Island [and] the shore, from whence beautiful prospects are obtained of the majestic form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground.’ (Edward Baines, A Companion to the Lakes [1829] 121.)
±5 tokens: No place-names identified
±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw
Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead & Cockshot Park.
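The effect of window size on the collocation counts can be reproduced with a short sketch. The assumptions are ours, not necessarily the authors' pipeline: tokens are words only (punctuation is not counted), a place-name counts if any of its tokens falls inside the window, and the gazetteer is a hypothetical four-entry list for this one sentence. The sentence is reproduced with "and" between "Lord’s Island" and "the shore", a word the slide transcription appears to be missing. With these choices the sketch gives 0, 2 and 4 place-names for ±5 tokens, ±10 tokens and the whole sentence.

```python
import re
from typing import List, Optional, Tuple

# Tokens are runs of word characters and apostrophes; punctuation is not counted.
TOKEN_RE = re.compile(r"[\w’']+")

# Hypothetical mini-gazetteer for this one sentence.
GAZETTEER = ["Lord’s Island", "Skiddaw", "Castlehead", "Cockshot Park"]

SENTENCE = (
    "Again entering the boat, we passed up the channel between Lord’s Island "
    "and the shore, from whence beautiful prospects are obtained of the majestic "
    "form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground."
)


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def place_spans(tokens: List[str], gazetteer: List[str]) -> List[Tuple[int, int, str]]:
    """Find each gazetteer name as a token sequence; return (start, end, name) in text order."""
    spans = []
    for name in gazetteer:
        name_tokens = tokenize(name)
        n = len(name_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                spans.append((i, i + n - 1, name))
    return sorted(spans)


def places_near(text: str, node: str, window: Optional[int], gazetteer: List[str]) -> List[str]:
    """Place-names overlapping +/-window tokens of the node word (None = whole sentence)."""
    tokens = tokenize(text)
    centre = tokens.index(node)
    lo, hi = (0, len(tokens) - 1) if window is None else (centre - window, centre + window)
    return [name for start, end, name in place_spans(tokens, gazetteer) if start <= hi and end >= lo]


if __name__ == "__main__":
    for window in (5, 10, None):
        print(window, places_near(SENTENCE, "beautiful", window, GAZETTEER))
    # 5 []
    # 10 ['Lord’s Island', 'Skiddaw']
    # None ['Lord’s Island', 'Skiddaw', 'Castlehead', 'Cockshot Park']
```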
Average sentence length
Lake District corpus = 29.8 words
British National Corpus (BNC) = 16 words
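Average sentence length is total word tokens divided by the number of sentences; at roughly 30 words per sentence, a sentence-level context in this corpus is nearly twice as wide as in the BNC. The sketch below is a naive illustration, not the method behind the figures above: it assumes a sentence ends at ".", "!" or "?" followed by whitespace, so abbreviations such as "Mr." would be split incorrectly.

```python
import re

WORD_RE = re.compile(r"[\w’']+")
# Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def average_sentence_length(text: str) -> float:
    """Mean number of word tokens per sentence."""
    sentences = [s for s in SENTENCE_END_RE.split(text.strip()) if s]
    if not sentences:
        return 0.0
    total_words = sum(len(WORD_RE.findall(s)) for s in sentences)
    return total_words / len(sentences)


if __name__ == "__main__":
    sample = "The lake was calm. We walked to Skiddaw and back."
    print(average_sentence_length(sample))  # 5.0
```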
4. Diagram of the Edinburgh Geoparser System
from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.
8. Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE naess
‘headland’
*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness
cf. D. Whaley, A Dictionary of Lake District Place Names
(Nottingham: English Place-Name Society, 2006), 42.
9. Some common issues when georeferencing with a generic gazetteer:
• Spatial misattribution
• Onomastic misassumption
• Incorrect weighting
And that is just for the items that are found!
10. An extract of our custom, manually collected gazetteer for the corpus
Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement
CONISTON (lake): Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
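A gazetteer with primary and secondary names, like the Coniston entry above or the Bowness variants on slide 8, maps naturally onto an alias index: every variant points back to its primary name. This is a minimal sketch, not the authors' tool; the normalisation rules (case-folding, unifying curly and straight apostrophes, collapsing whitespace) and the primary labels are our assumptions, and only a subset of the listed variants is included.

```python
from typing import Dict, Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Case-fold, unify curly/straight apostrophes and collapse whitespace."""
    return " ".join(name.replace("’", "'").casefold().split())


def build_alias_index(entries: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, str]:
    """Map every normalised primary or secondary name to its primary name."""
    index: Dict[str, str] = {}
    for primary, secondary_names in entries:
        for variant in (primary, *secondary_names):
            index[normalize(variant)] = primary
    return index


def resolve(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the primary name for a variant, or None if it is not in the gazetteer."""
    return index.get(normalize(name))


# A subset of the variants on the slides, keyed by hypothetical primary labels.
ENTRIES = [
    ("Coniston (lake)", ["Thurstan", "Coniston Lake", "Coniston Water", "Thurston",
                         "Conistone Lake", "Cunnistone Lake", "Coniston Mere",
                         "Lake of Coniston", "Thorstane's watter", "Turstinus"]),
    ("Bowness", ["Bownus", "Bawnas", "Bonas", "Bonus", "Boulness"]),
]

if __name__ == "__main__":
    index = build_alias_index(ENTRIES)
    print(resolve("coniston water", index))  # Coniston (lake)
    print(resolve("Boulness", index))  # Bowness
    print(resolve("Keswick", index))  # None
```

An exact lookup like this only says that a string could be a place. It cannot decide whether "Bonus", which is also an ordinary English word, refers to Bowness in a particular sentence, so context still has to be checked.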
12. An extract from the latest iteration of the corpus, allowing referential relationships to be analysed on a whole new level.
[Figure legend: Lake, Vale, Specific - Farm, Waterfall]
Editor's Notes
Overview of corpus…
Our interest in finding what attributes are given to places mentioned…
The Edinburgh Geoparser: the NLP tool on which we’ve relied
What the Geoparser does…
The Geoparser output is a bit ropey…
Much correction required…
One of the chief reasons for the poor performance of the geoparser is place-name variation…
Geospatial relationships between environmental types as well as connective strengths between any paired locations.
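One simple way to quantify "connective strength" between paired locations is to count how many sentences mention both. The notes do not say which measure is used, so this sketch assumes raw sentence-level co-occurrence counts over place-names already resolved to primary names; the three sentence place-lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def pair_strengths(sentences_places: Iterable[List[str]]) -> Counter:
    """Count, for each unordered pair of places, the sentences that mention both."""
    counts: Counter = Counter()
    for places in sentences_places:
        # set() counts a place once per sentence; sorted() gives each pair a fixed order.
        for a, b in combinations(sorted(set(places)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical place lists for three sentences.
    sentences = [
        ["Lord’s Island", "Skiddaw", "Castlehead", "Cockshot Park"],
        ["Skiddaw", "Castlehead", "Skiddaw"],
        ["Coniston (lake)"],
    ]
    strengths = pair_strengths(sentences)
    print(strengths.most_common(1))  # [(('Castlehead', 'Skiddaw'), 2)]
```

Raw counts favour frequently mentioned places; a normalised association measure would be a natural refinement, but that choice is beyond what the slides specify.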