JSON copyright information records

Copyright information for serials in The Online Books Page and Deep Backfile Project is recorded in JSON files that are linked to The Online Books Page and to Wikidata. The files are also available for download in the Online Books Page's GitHub repository. This page documents the fields (name-value pairs) and conventions used for those files.

Sections: File identifiers and top-level organization -- Serial object fields -- Date values -- Issue object fields -- Contribution object fields -- Agent object fields -- Annotated link object fields -- Problems, questions, contacts

File identifiers and top-level organization

Each JSON file has a unique identifier consisting of a string of lowercase ASCII letters (a-z) and digits (0-9). We try not to have this identifier be more than 15 characters long, but some older records have longer identifiers. The identifier is usually a mnemonic suggesting the title of the serial it describes, in a way that aids in its identification and alphabetization.
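As a rough illustration, the identifier rule above can be checked with a regular expression. The function name, and the choice to report the 15-character guideline separately rather than reject longer identifiers, are assumptions made for this sketch and not part of the project's own tooling.

```python
import re

# Lowercase ASCII letters and digits only, per the rule described above.
IDENTIFIER_PATTERN = re.compile(r"[a-z0-9]+")
PREFERRED_MAX_LENGTH = 15  # a guideline only; some older identifiers are longer


def check_identifier(identifier):
    """Return (is_valid, is_within_preferred_length) for a candidate identifier."""
    is_valid = IDENTIFIER_PATTERN.fullmatch(identifier) is not None
    return is_valid, len(identifier) <= PREFERRED_MAX_LENGTH


print(check_identifier("nytimes"))
```

Keeping the length check separate from validity reflects that longer identifiers are discouraged but still occur.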

The name of the underlying file found in our GitHub directory is the identifier plus the string .json. The identifier is also used as the value for the Online Books Page publication ID (P5396) property of the corresponding serials in Wikidata. The URL for human-readable copyright information is https://onlinebooks.library.upenn.edu/webbin/cinfo/ followed by the identifier. The URL for the machine-readable JSON file is that URL followed by ?format=json. The Online Books Page record linking to free online content, if one exists, is https://onlinebooks.library.upenn.edu/webbin/serial?id= followed by the identifier.

For example, the identifier we use for our copyright record for The New York Times is nytimes. The copyright record can be read on the web at https://onlinebooks.library.upenn.edu/webbin/cinfo/nytimes, and the JSON file can be retrieved at https://onlinebooks.library.upenn.edu/webbin/cinfo/nytimes?format=json. There is also a copy of the JSON file in our GitHub repository at https://github.com/JohnMarkOckerbloom/onlinebooks/blob/master/data/cinfo/nytimes.json. The Online Books Page record linking to free content can be found at https://onlinebooks.library.upenn.edu/webbin/serial?id=nytimes. The Wikidata record for The New York Times includes an Online Books Page publication ID with the value nytimes, which in the Wikidata display links to our copyright information page.
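The URL patterns described above can be sketched as a small helper. The function name and the dictionary keys are hypothetical; only the URL prefixes, the ?format=json suffix, and the .json file extension come from this documentation.

```python
WEBBIN_BASE = "https://onlinebooks.library.upenn.edu/webbin/"


def record_locations(identifier):
    """Build the file name and URLs associated with a serial identifier."""
    cinfo_url = WEBBIN_BASE + "cinfo/" + identifier
    return {
        "filename": identifier + ".json",            # name of the file in the repository
        "human_readable": cinfo_url,                  # copyright information page
        "json": cinfo_url + "?format=json",           # machine-readable record
        "online_books": WEBBIN_BASE + "serial?id=" + identifier,  # exists only for some serials
    }


for label, location in record_locations("nytimes").items():
    print(label, location)
```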

The contents of the JSON file are Unicode characters in the UTF-8 encoding. No additional encodings (such as the character entities used in HTML and XML) are applied to the contents, though the usual JSON escape conventions can be used for string values. (In practice, we try to include literal characters where possible, and avoid backslashes and double quotes in field values, to minimize the possibility of ambiguity.)

The JSON file consists of a serial object, representing copyright and related information about the serial. Below are the fields that may appear in it, in the order in which they usually appear. (Of course, the order of name-value pairs in a JSON object does not carry any semantic meaning, but it's useful for humans reading and editing the files to have the fields in an expected order, formatted with suitable whitespace.)
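A minimal sketch of reading such a file, assuming the raw bytes have already been obtained from a download or a local copy. The function name and the sample record (including its title and date) are made up for illustration.

```python
import json


def parse_serial_record(raw_bytes):
    """Decode UTF-8 bytes and return the top-level serial object as a dict."""
    text = raw_bytes.decode("utf-8")      # the files are UTF-8 encoded
    record = json.loads(text)             # standard JSON escapes are handled here
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    return record


# Hypothetical sample content, not taken from a real record.
sample = '{"title": "Example Serial", "last-updated": "2024-01-15"}'.encode("utf-8")
record = parse_serial_record(sample)
print(record["title"])
```

Since field order carries no meaning, code reading these records should look fields up by name rather than by position.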

Serial object fields

title

The value is a string giving the title of the serial. This field generally appears in all of our copyright records.

By convention, titles in Penn's copyright knowledge base are given in Roman characters, using title case.

Titles of serials published between 1930 and 1963 are also represented on our first renewals page. Currently this is done by manual inclusion in a template; eventually this will be done automatically.

title-note

The value is a string clarifying the nature of the serial. This field is usually only used when needed to distinguish the serial from other works with the same or similar titles.

aka

The value is an array of strings with other titles the serial is known by (including other titles used in copyright renewal records, even if not otherwise used much). If there are no such titles, this field is omitted.

The titles here follow the same conventions as mentioned in the title field documentation above. They are also listed in the first renewals page as cross-references to the main title. (Again, this is currently done by manual template inclusion, but will eventually be done automatically.)

online

The value is a string. It is typically "1" if present, indicating that free issues of this serial are online, and are linked on The Online Books Page at the URL https://onlinebooks.library.upenn.edu/webbin/serial?id= followed by the identifier for this serial.

If the value is "0" or an empty string, no indication of free online issues is made. We typically just omit this field in that case, however.

A deprecated, but still sometimes encountered, use of this field is to place a URL in it. Any value that does not evaluate to false in Perl's Boolean context will indicate that free issues of this serial are online, and a URL in this field will indicate a link to free issues at the supplied URL. However, we are endeavoring to make Online Books Page records for all of these serials eventually, making the URL use of this field superfluous.
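The interpretation of this field might be sketched as follows. This is not the project's Perl code: the function name is hypothetical, the value is assumed to be a string or absent, and recognizing a URL by its http:// or https:// prefix is an assumption of this sketch.

```python
def interpret_online(value):
    """Return (has_free_issues, explicit_url_or_None) for an 'online' field value."""
    # For strings, Perl's Boolean context treats only "" and "0" as false;
    # a missing field (None here) is treated as false as well.
    if value is None or value in ("", "0"):
        return False, None
    # Deprecated usage: the field holds a URL pointing at the free issues.
    if value.startswith(("http://", "https://")):
        return True, value
    # Usual case, typically "1": issues are linked from The Online Books Page.
    return True, None


print(interpret_online("1"))
print(interpret_online("0"))
```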

contents

The value is an array of annotated link objects (using the "url" field) indicating places online where tables of contents for the serial can be found. This field is omitted if no such places are known.

By convention, the notes on the links will indicate if full text is available at those links, and whether and when access to the full text requires subscription or payment. Sites that provide free access to everything listed in their contents are typically not listed here, but instead are linked from their Online Books Page entry.

website

If used, the value is an annotated link object (using the "url" field) indicating the official web site for the serial.

Use of this field is deprecated, as such values should be in the serial's Wikidata record and/or Online Books Page entry, and we aim to have at least one of those in existence. Inclusion here would thus be redundant, but a few of our JSON files still have it.

preceded-by

The value is an array of annotated link objects (using the "id" field) indicating the serials that precede this one. Only direct predecessors should be noted, so this array will typically only contain one object unless the serial was created by a merger of multiple prior serials.

If this field is used, we recommend also using the first-issue field, to make it clear when this serial picks up from its predecessor.

first-issue

The value is an issue object indicating the earliest issue of the serial that was published.

rights-statement

When used, the value of this field indicates the copyright status of all of the serial's content. Two values are currently recognized:

If this field is used, the fields described below related to renewals are not used, and vice versa.
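A simple consistency check for this rule could look like the sketch below. Which fields count as "related to renewals" is an interpretation made here (every field documented on this page whose name mentions renewal), and the function name and the placeholder rights-statement value are hypothetical.

```python
# Assumed set of renewal-related fields, taken from the field names on this page.
RENEWAL_FIELDS = {
    "first-renewed-issue",
    "first-renewed-issue-source",
    "first-autorenewed-issue",
    "first-renewed-contribution",
    "first-renewed-contribution-source",
    "renewed-issue-completeness",
    "renewed-issues",
    "renewed-contribution-completeness",
    "renewed-contributions",
}


def conflicting_renewal_fields(record):
    """List renewal fields that appear alongside a rights-statement field."""
    if "rights-statement" not in record:
        return []
    return sorted(RENEWAL_FIELDS.intersection(record))


# "placeholder" stands in for a real value; it is not a recognized rights statement.
print(conflicting_renewal_fields({"rights-statement": "placeholder", "renewed-issues": []}))
```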

first-renewed-issue

The value is either the string none, or an issue object indicating the earliest issue of the serial that has an active, filed renewal.

If the value is none, this indicates that no renewals were filed for this serial that are still active (that is, they are not expired). There may still be issues after 1963 that have automatically renewed copyrights, however.

This field should generally not have a renewal that has expired due to age. Therefore, in 2026, when all remaining 1930 copyrights expire, serials with a 1930 first-renewed-issue value will need to be updated to the first renewed issue with a copyright after 1930, if any.
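The update rule above could be checked mechanically. The sketch below generalizes the single example in the text (1930 copyrights expiring as of 2026) to an offset of 96 years, which is an assumption. It also assumes the copyright year can be read from the cdate field when present and from issue-date otherwise. The function names are hypothetical.

```python
import re


def copyright_year(issue):
    """Year from 'cdate' if present, otherwise from 'issue-date'; None if neither parses."""
    for field in ("cdate", "issue-date"):
        match = re.match(r"(\d{4})", issue.get(field, ""))
        if match:
            return int(match.group(1))
    return None


def first_renewed_issue_is_stale(value, current_year):
    """True if the recorded first renewed issue has a copyright that has expired by age."""
    if value == "none":
        return False
    year = copyright_year(value)
    # Assumed generalization: a copyright from year Y has expired once the year is Y + 96.
    return year is not None and year + 96 <= current_year


print(first_renewed_issue_is_stale({"issue-date": "1930-01-05"}, 2026))
```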

first-renewed-issue-source

The value is a string indicating where the first renewal indicated in first-renewed-issue can be found (or is not found) in official copyright registration sources. The following values are recognized:

first-autorenewed-issue

When used, the value is an issue object indicating the earliest issue of the serial that had an automatic renewal. This is generally the earliest issue with a copyright date of 1964 or later.

first-renewed-contribution

The value is either the string none, or a contribution object indicating the earliest contribution to the serial that has an active, filed renewal.

If the value is none, this indicates that no contribution renewals were filed for this serial that are still active (that is, they are not expired). There may still be contributions published after 1963 that have automatically renewed copyrights, however.

This field should generally not have a renewal that has expired due to age. Therefore, in 2026, when all remaining 1930 copyrights expire, serials with a 1930 first-renewed-contribution value will need to be updated to the first renewed contribution with a copyright after 1930, if any.

Most contribution objects used as values for first-renewed-contribution only have the issue field set, and don't fill in the other fields.

Some JSON files may instead have an issue object here, indicating the issue in which the first contribution appeared. This is deprecated, however.

first-renewed-contribution-source

The value is a string indicating where the first renewal indicated in first-renewed-contribution can be found (or is not found) in official copyright registration sources. The recognized values are the same ones that are recognized for first-renewed-issue-source.

last-issue

When used, the value is an issue object indicating the last issue of the serial that was published. This field should only be used when a serial has ceased publication, or has been replaced by a serial mentioned in the succeeded-by field.

succeeded-by

The value is an array of annotated link objects (using the "id" field) indicating the serials that succeed this one. Only direct successors should be noted, so this array will typically only contain one object unless the serial was split into multiple successors.

If this field is used, we recommend also using the last-issue field, to make it clear when this serial leaves off before its successors pick up.

see-also

The value is an array of annotated link objects (using the "id" field) indicating serials that are related to this one that a copyright researcher may need to know about, but that are not recorded in the preceded-by or succeeded-by fields.

Serials recorded in this field might include supplements (or the parent publications of supplementary serials), special sections with their own copyright registrations, translations and other derivatives, indexes, or serials with similar titles that might be confused with this one. Typically serials linked in see-also fields also link back to this serial in some way.

renewed-issue-completeness

The value is a string indicating how many of the renewals for issues of this serial are represented in this record. The following values are currently recognized:

If this field is not defined, or is empty, there is no claim that all renewals in a particular range are listed, and the renewals that are listed should generally be assumed to only be a partial list.

renewed-issues

The value is an array of issue objects, each of which specifies an issue whose copyright was renewed.

By convention, and for convenience, the renewed issues are listed in chronological order.

If the renewed-issue-completeness field is used, the array contains at minimum all of the issues renewed in the date range indicated by the value of that field.

By convention, the JSON files at Penn usually do not list renewals for copyrights that have expired. (So renewals filed for copyrights prior to 1930 are generally not listed.)

renewed-contribution-completeness

The value is a string indicating how many of the renewals for contributions to this serial are represented in this record. The same values recognized for the renewed-issue-completeness field are recognized here, and the usage conventions and semantics are the same as for that field.

renewed-contributions

The value is an array of contribution objects, each of which specifies a contribution to this serial whose copyright was renewed.

By convention, and for convenience, the renewed contributions are listed in chronological order of the issues in which they appear. If the ordering of contributions in a given issue is known, contributions to the same issue should be listed in the order they appear in the issue, but since that order is often not known, one should not assume that this level of ordering is present in the array.

If the renewed-contribution-completeness field is used, the array contains at minimum all of the contributions renewed in the date range indicated by the value of that field.

By convention, the JSON files at Penn usually do not list renewals for copyrights that have expired. (So renewals filed for copyrights prior to 1930 are generally not listed.)

additional-note

The value is a string, with additional information about this serial or its copyrights that might be important for copyright researchers to know.

If more than one type of additional information is warranted, the additional-notes field should be used instead of this one.

An additional-note is called for if the serial is published outside the United States, and might be exempt from renewal requirements. (Information indicating that it might be subject to such requirements is also noted if applicable.) This type of information may eventually be its own structured field.

An additional-note is also called for if the serial is known to have been published in the US without copyright notices prior to 1989.

additional-notes

The value is an array of strings, each with additional information about this serial or its copyrights that might be important for copyright researchers.

Each string is shown as a separate paragraph.

If there is only one additional note, the additional-note field should be used instead of this one.
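Code that displays these notes can treat the two fields uniformly. A minimal sketch, with a hypothetical function name and made-up note text:

```python
def additional_note_paragraphs(record):
    """Return the additional notes of a serial object as a list of paragraphs."""
    if "additional-notes" in record:
        return list(record["additional-notes"])   # one paragraph per string
    if "additional-note" in record:
        return [record["additional-note"]]         # a single note becomes one paragraph
    return []


print(additional_note_paragraphs({"additional-note": "Example note text."}))
```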

responsibility

The value is an agent object indicating who is responsible for the information in this record. The object should contain enough information to allow the agent to be contacted.

acknowledgement

The value is a string thanking people who supplied information for this object and are not mentioned in the responsibility field.

The script generating JSON files at Penn from Deep Backfile form submissions automatically fills in a value here thanking the person whose name is on the submission. This value may need to be manually updated, however, if other people should also be thanked.

last-updated

The value is a date value string indicating the date of the last change to this object.

comment

This field can occur anywhere, and its value is ignored by machines and not generally shown in web pages generated from the JSON file. Since JSON has no comment mechanism of its own, though, this can be a useful place to put comments meant to be seen by others editing or working with the JSON file.

Date values

A date value is a string indicating a date. If possible, this date should be parseable by machines, so the ISO 8601 format should be used for a date that is a specific year, month, or day of the year in the Gregorian Common Era calendar. Specifically:

Further details, such as time and time zones, or EDTF date ranges, are not currently expected, except where noted in this documentation.
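A rough sketch of a parser for machine-readable date values, assuming the three standard ISO 8601 calendar forms YYYY, YYYY-MM, and YYYY-MM-DD. The function name is hypothetical, and a result of None only means the string should be treated as free text, since machine-parseable dates are preferred rather than required.

```python
import re
from datetime import date

# Year, optionally followed by -month, optionally followed by -day.
DATE_VALUE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_date_value(value):
    """Return (year, month, day), with None for unspecified parts, or None if unparseable."""
    match = DATE_VALUE.fullmatch(value)
    if match is None:
        return None
    year, month, day = (int(part) if part is not None else None for part in match.groups())
    try:
        # Range-check the parts, substituting 1 for an unspecified month or day.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return None
    return year, month, day


print(parse_date_value("1930-07"))
```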

Issue object fields

issue-date

The value is a date value string indicating the stated date for the issue.

cdate

The value is a date value string indicating the date the copyright was secured.

Typically this is a full date value when known, or if not, the year. This field is generally not used unless the copyright year is different from the year in the issue date, or is otherwise important to specify.

volume

The value is a string (generally a number) indicating the volume, if any, that the issue belongs in.

number

The value is a string indicating the issue number (relative to the volume, if a volume is specified).

Values that are not numbers are allowed if the issue is printed with a designation that is not a number. When the value is an integer, it is generally shown with "no." in front of it. If it is not an integer, that is omitted, so any such designation should be explicitly noted; e.g. the value for an issue numbered "1-2" could be no. 1-2.

series

The value is a string indicating the series (for serials that have multiple series, where a series designation is needed to distinguish issues).

This currently is only used when needed. If the value is an integer, it is shown after the word "series", so the value 2 is shown as "series 2". Other values are shown before the word "series", so the value new is shown as "new series".
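The display rules for the number and series fields can be sketched as two small functions. The function names are hypothetical, and "integer" is interpreted here as a string of ASCII digits, which is an assumption.

```python
import re


def is_integer_string(value):
    """True if the value consists only of ASCII digits."""
    return re.fullmatch(r"[0-9]+", value) is not None


def display_number(number):
    """Integers get a "no." prefix; other designations are shown as given."""
    return "no. " + number if is_integer_string(number) else number


def display_series(series):
    """Integers follow the word "series"; other values precede it."""
    return "series " + series if is_integer_string(series) else series + " series"


print(display_number("3"), "|", display_number("no. 1-2"))
print(display_series("2"), "|", display_series("new"))
```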

note

The value is a string, used for anything that should be displayed after the issue designation.

Contribution object fields

issue

The value is an issue object specifying the issue in which this contribution appears.

title

The value is a string giving the title of the contribution.

title-note

The value is a string clarifying or qualifying the title of the contribution. This might be used, for instance, to distinguish it from another contribution with a similar title, or to note how the contribution relates to the title (e.g. "part 2 of 5" in the case of an installment of a multi-part serialization).

author

The value is an agent object specifying the author of the contribution. If there is more than one author, the authors field is used instead.

authors

The value is an array of agent objects specifying the authors of the contribution, in the order they are credited if known. If there is only one author, the author field is used instead.

editor

The value is an agent object specifying the editor of the contribution. (This is not often used, but is appropriate for some contributions. The editor should be someone who is specifically credited for editing this contribution, not just the issue in which it appears.) If there is more than one such editor, the editors field is used instead.

editors

The value is an array of agent objects specifying the editors of the contribution. (See the note above about when editors should be noted.) If there is only one editor, the editor field is used instead.

illustrator

The value is an agent object specifying the illustrator of the contribution. This includes any sort of artist with a copyright claim to visual art in the contribution, including photographers and sculptors as well as artists who draw or paint.

We have not yet needed to credit more than one illustrator at Penn, but will create an illustrators field when we do, or upon request.

translator

The value is an agent object specifying the translator of the contribution. If there is more than one translator, the translators field is used instead.

translators

The value is an array of agent objects specifying the translators of the contribution, in the order they are credited if known. If there is only one translator, the translator field is used instead.

note

The value is a string, with additional information about this contribution or its copyright that might be important for copyright researchers to know.
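Because each credited role in a contribution object may appear in singular or plural form, a reader of these records may want to normalize them. The sketch below uses a hypothetical function name and made-up agent names, and collects every role into a list of agent objects.

```python
# (singular field, plural field); illustrator currently has no plural form.
ROLE_FIELDS = [
    ("author", "authors"),
    ("editor", "editors"),
    ("illustrator", None),
    ("translator", "translators"),
]


def contribution_credits(contribution):
    """Map each credited role to a list of agent objects, in credited order."""
    credits = {}
    for singular, plural in ROLE_FIELDS:
        if plural is not None and plural in contribution:
            credits[singular] = list(contribution[plural])
        elif singular in contribution:
            credits[singular] = [contribution[singular]]
    return credits


# Hypothetical contribution object for illustration.
example = {
    "title": "An Example Story",
    "authors": [{"name": "First Author"}, {"name": "Second Author"}],
    "translator": {"name": "Some Translator"},
}
print(contribution_credits(example))
```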

Agent object fields

authorized

The value is a string with the authorized heading for the agent, if one exists.

Currently the only recognized authority is Library of Congress Name authorities, so if this field is used, the lcna field is used as well.

If this field does not apply, the name field should be used.

The field is usually displayed in de-inverted form. That is, the part after the first comma (but before the next comma or parenthesis) is displayed, followed by the part before the first comma or parenthesis. That would, for instance, render "Mark Ockerbloom, John, 1966-" as "John Mark Ockerbloom". If that is not a correct way to display the name, the name field should also be used, and give a more appropriate display form.

name

The value is a string with the name of the agent, as it would commonly be written. Inverted forms (like "Holmes, Sherlock") are not used; instead something like "Sherlock Holmes" would be used in that case.

If both the authorized and the name field are set, the name field value is used to display the name.
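The display logic for agent names, including de-inversion of authorized headings, might be sketched as below. This is an interpretation of the rule as described on this page, not the project's actual code, and the function names are hypothetical.

```python
import re


def deinvert(heading):
    """Turn an inverted heading such as "Holmes, Sherlock" into "Sherlock Holmes"."""
    before, comma, after = heading.partition(",")
    # Part before the first comma or parenthesis.
    family = before.split("(", 1)[0].strip()
    if not comma:
        return family
    # Part after the first comma, up to the next comma or parenthesis.
    given = re.split(r"[,(]", after, maxsplit=1)[0].strip()
    return (given + " " + family).strip()


def display_name(agent):
    """Prefer the name field; otherwise de-invert the authorized heading."""
    if agent.get("name"):
        return agent["name"]
    if agent.get("authorized"):
        return deinvert(agent["authorized"])
    return ""


print(display_name({"authorized": "Mark Ockerbloom, John, 1966-", "lcna": "no99018158"}))
```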

using

The value is a string with the name the agent uses in the context this object appears in. (For example, if an author used a pseudonym different from their usual name in a serial contribution, that pseudonym can be noted in the using field of the agent object in that contribution's author field.) Both the standard name and the "using" name will be displayed in the record.

lcna

The value is a string with the Library of Congress name authority identifier for this agent, if one exists. This value is used as a hook for linking to other data that also uses that identifier, including Wikidata and rightsholder contact information.

The identifier consists of lowercase letters and digits, without spaces. For example, the lcna value for John Mark Ockerbloom is no99018158.

contact

The value is a string with information on contacting the agent.

At present, recognized values are email addresses. We may support other options in the future.

Annotated link object fields

id

The value is the identifier of the serial this object should link to.

Either this field or the url field should be used in this object, but not both.

url

The value is a string giving the URL that is the destination of the link.

Either this field or the id field should be used in this object, but not both.

note

The value is a string giving the text that should be shown in the link.

This field should always be filled with a value. If this object is linking to another serial via the id field, the value of this field should be, or start with, the title of that serial. (If this is done correctly, the title will be presented in citation style.)
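The constraints on annotated link objects can be checked with a short function. This is a sketch with a hypothetical function name and problem messages; it only tests what this section states: exactly one of id and url, and a non-empty note.

```python
def link_problems(link):
    """Return a list of problems found in an annotated link object (empty if none)."""
    problems = []
    has_id = "id" in link
    has_url = "url" in link
    if has_id == has_url:
        # Either both are present or both are missing.
        problems.append("exactly one of 'id' and 'url' should be present")
    if not link.get("note"):
        problems.append("'note' should always have a value")
    return problems


print(link_problems({"id": "nytimes", "note": "The New York Times"}))
```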

Problems, questions, contacts

If you have further questions about the JSON files used in this project, please contact John Mark Ockerbloom at ockerblo (at) upenn.edu.