Benefits of correcting text
The searchable text in Papers Past is automatically generated using Optical Character Recognition (OCR) software. OCR reads the page image and converts it into a text file by recognising the shapes of the letters. Because of this, the text isn’t always correct. Poor-quality paper, small print, mixed fonts, multiple-column layouts and damaged pages can all reduce OCR accuracy.
Correcting text means editing the OCR text so that it matches the text in the original page image, for example by fixing misread characters. Accurate transcripts make historical documents easier to find, read and search.
Correcting OCR text to match images means more successful searching for everyone.
A further benefit of correcting text is that your edits improve the downloadable text transcript.
Check the latest text corrections
How to correct text
If you notice that the text transcript doesn't match the item's image, you can correct the transcript. Text correction is available in the Newspapers section of Papers Past.
You need a Papers Past account to edit text
Anyone can participate in text corrections as long as they have a free Papers Past account and are logged in.
Text correction interface
You need to be logged in to access the text correction interface.
In Papers Past you will find the text correction interface on the ‘View correctable text’ tab. Select the ‘Correct this text’ button.
The text correction interface is split into two parts: the left side shows the page images that make up the document, and the right side is used for editing the lines of text.
Make text corrections
- Select the ‘Correct this text’ button above the article.
- The text correction interface will appear in a pop-up window.
- Correct the OCR text line by line to match the text in the article image. You can use the Tab key to go to the next line, or Shift-Tab to go back up a line.
- If you have corrected all the errors in an article, tick the box ‘The text in this article is completely correct’ at the bottom of the pop-up window.
- Once you are finished making your corrections, click the ‘Save changes’ button.
Save, save, save
Save your changes regularly as you work:
- to save your changes, click the ‘Save changes’ button at the top left
- to exit without saving your corrections, click the ‘Exit’ button at the top right or press the Esc key.
Text correction editing guidelines
You’ve got the power and we ask that you use it wisely. Follow our guidelines so that your corrections contribute to better search results and a richer experience for all users.
A rule of thumb: type what you see
Transcribe what you see, following the order and layout of the original document as best you can.
Include only the text in the image in your edits. We are investigating options to allow you to add more information or thoughts about articles, but this is not currently available.
Don’t delete text
Don’t delete text from the transcript if it appears in the article image.
Incorrect edits
Text corrections can be seen by anyone on the Papers Past website. If you find corrections unrelated to the original article, feel free to log in and correct them to match the original article. If incorrect edits appear to be ongoing and intentional, contact us.
We don’t modify the original source data produced with the OCR process, so it’s always possible to roll back to the original text if necessary.
Misspellings in the original printed page
Your transcription should preserve the spelling, grammar and word order of the original document. We know that spelling, place names and personal names change over time and are frequently spelt differently in older newspapers than they are today. For example, ‘connexion’ is not misspelt but is an older spelling of ‘connection’.
If you see words, place names or personal names you think are misspelt, or you know the spelling has changed, type the word as printed and follow it with the updated spelling in square brackets, for example ‘connexion [connection]’.
Hyphenated words
If a word is hyphenated because it is split across two lines, type it as it appears in the image, for example ‘hyphen-’ at the end of the first line and ‘ated’ at the start of the second line. Hyphens that appear elsewhere in the text should also match the image.
Blank spaces and miscellaneous punctuation and symbols
Don’t worry about correcting blank spaces, miscellaneous punctuation and symbols, as they do not affect searching. If you want to tidy them up for appearance’s sake, that is fine.
‘The text in this article is completely correct’ box
Ticking ‘The text in this article is completely correct’ box shows that one user believes the block is as correct as it is possible to make it. In future, we may use the checkbox to make it easier to locate blocks that have not been checked.
Illegible text
Occasionally the OCR software may not be able to read a line or several words, for example where the original document is faded or damaged. If you are unable to make out the original word, type [illegible], in square brackets, in its place. Another user might come along and be able to read it.
A block should still be marked as "completely correct" even if it contains some text marked as [illegible]. Adding [illegible] means we can search for blocks containing illegible text, if it's ever necessary or useful to locate them.
Images and illustrations
The OCR software commonly picks up images and illustrations as blocks with no text. As a general guide, type [image] to identify such a block as an image. In most cases, a caption block below the image will describe its content. If there isn't one, add a short description inside the brackets, as in these examples:
- [image: Photo - Senator James P. Davensofa]
- [image: Map - Northern Eriesoil]
- [image: Drawing - residential (1st flight) plan(e)]
The block can then be marked as "completely correct".
Missing and cut-off lines
Occasionally, a line that needs correcting will be missing from the transcript. If you come across this problem, you can still make corrections. Simply add the missing line of text to the end of the line above. If there is no preceding line, add the text to the start of the following line. Where possible, make sure that the start of each line matches the start of the original line of text.
What if a ‘line’ of text crosses over two or more columns?
As with missing and cut-off lines, do your best to transcribe the text in the correct reading order.
Punctuation
Punctuation and capitalisation should reflect what is printed in the original.