Rachael Lammey, Product Manager at CrossRef. 30 minutes to provide an update and overview of CrossCheck, our originality screening service. Questions at the end.
I want to talk about why there is demand for a service like CrossCheck in scholarly publishing. It’s certainly not a new problem, but there are suggestions that it may be getting worse. I’ve put Google up on screen to point to the fact that it has never been easier to search across vast amounts of content in online publications and databases. And with more content being produced than ever before it’s much harder for reviewers to have thoroughly read everything in their field.
So let’s start with the basics and look at what CrossCheck is. Well I think that really comes down to the database. We tend to think of CrossCheck as two things and one of those is the CrossCheck database. As you may know, when a publisher signs up they enable indexing of their content so that it can be added to this resource that both you and other publishers can check against. So more publishers = more content in that database. And of course the other part of the tool is the iThenticate software which users use to upload documents and generate the similarity reports. It’s no secret that in exchange for being able to index the content you get lower per-doc fees than you would if you used the off-the-shelf iThenticate product.
So to look at the process in a little more detail: you submit your manuscript to the iThenticate system, and it is by default checked against three databases of content. It is checked against web content - iThenticate indexes web pages in much the same way as a search engine, but with the added advantage that they keep an archive of web pages going back eight years. The manuscript is checked against the CrossCheck database, which contains the content from all of the participating CrossCheck publishers. And it’s also checked against a growing repository of online and offline content that iThenticate is gathering and indexing, including databases from Gale and EBSCO, and sites such as PubMed and arXiv.org. And as before, matches retrieved by comparison with these databases are pulled into a report for an editor to examine in more detail. So let’s talk about the reports and what CrossCheck is NOT.
Glance or side by side. NB: fuzzy matching will pick up on word substitutions.
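To give a feel for what fuzzy matching means in practice, here is a minimal sketch in Python. It is only an illustration and is not how iThenticate works internally; the function name `word_similarity` and the use of the standard library's `difflib` are assumptions made for this example.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, computed over word tokens.

    Hypothetical helper for illustration only; it is not part of iThenticate.
    """
    words_a = a.lower().split()
    words_b = b.lower().split()
    # Comparing lists of words (not characters) means that a single
    # substituted word only costs one token of similarity.
    return SequenceMatcher(None, words_a, words_b).ratio()


original = "the results clearly demonstrate a significant increase in yield"
reworded = "the results clearly show a significant increase in yield"
unrelated = "peer review takes several months at most journals"

print(word_similarity(original, reworded))   # high: one word was swapped
print(word_similarity(original, unrelated))  # low: no words in common
```

An exact string comparison would treat the reworded sentence as different text; a token-level ratio like this one still scores it as a close match, which is the behaviour the note above is describing.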
Some additional features that you should be aware of: it’s possible to exclude certain things to help reduce background noise. You can opt to exclude anything that’s included in quotation marks. You can exclude the reference section, and you can choose not to be shown any matches below a certain number of words - so perhaps strings of fewer than 25 words would not be shown. You should be aware that the first two of these features work on fairly hard and fast rules, so there need to be opening and closing quotation marks for a quote to be spotted and excluded, and the exclude bibliography feature relies on there being a recognisable section heading for the references to identify it at the end of a document. So some documents will slip through these filters.
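The three exclusions can be sketched roughly as below. This is a simplified illustration of the hard and fast rules described above, not the iThenticate implementation: the function names, the list of recognised headings, and the 25-word default are all assumptions for the example.

```python
import re

# Hypothetical list of headings; which headings the real tool recognises
# is not stated in this talk.
REFERENCE_HEADING = re.compile(
    r"^[ \t]*(references|bibliography|works cited)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_quoted(text: str) -> str:
    """Remove text that sits between an opening and a closing quotation mark."""
    # Both the opening and the closing mark must be present to match.
    return re.sub(r'"[^"]*"|“[^”]*”', " ", text)


def strip_references(text: str) -> str:
    """Drop everything from the last recognisable references heading onward."""
    headings = list(REFERENCE_HEADING.finditer(text))
    if not headings:
        return text  # no recognisable heading, so nothing is excluded
    return text[: headings[-1].start()]


def filter_small_matches(matches: list[str], min_words: int = 25) -> list[str]:
    """Keep only matched passages that are at least min_words words long."""
    return [m for m in matches if len(m.split()) >= min_words]


# A quote with no closing mark is left untouched by the filter.
print(strip_quoted('He said "hello there" loudly'))
print(strip_quoted('He said "hello there loudly'))
```

The second call shows why some documents slip through: without a closing quotation mark the rule never fires, and in the same way a reference list under an unrecognised heading would stay in the checked text.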
Limitations: photos or images, graphs and tables, formulae. Text only. Also limited by content in the database: not all CrossRef member publishers participate in CrossCheck, though numbers are increasing quickly.
There are lots of ways to get involved in CrossCheck – we’ve started running CrossCheck specific User Groups: like this one! And they’re proving popular so far. US ones too around the CSE meeting in May. Regular webinars and more around new functionality, mailing list you can sign up to for updates and we’re currently running a survey on usage. If you didn’t get it, come and find me and let me know. We’re interested in your feedback.
So I’m going to leave it at that for now and let Laurie talk you through the iThenticate developments you can expect to see in the next 6 months or so. We’ve also got links to our Twitter page, an email address for more info and the Info pages.
1. CrossRef Workshop Barcelona
2. ctrl c+ ctrl v+
3. • 2006: CrossRef board raises plagiarism as area of concern
• Late 2007/early 2008: pilot with seven publishers and technology partner iParadigms
• June 2008: CrossCheck
4. • Database of content to check text
• iThenticate software that analyses and
5. Additional Features
• Exclude small sources/matches
6. • 555 publishers
• Over 38.9 million content items indexed
• 101,000+ titles
• 100,000+ manuscripts being uploaded
• Title and member list on the CrossRef
7. • CrossCheck User Groups and events
• Presentations at conferences
• Mailing list
Participation and Feedback