Exporting your data

Information about the data export formats supported by Publish or Perish

You can export the publication data from Publish or Perish to the following formats:

  • Search report - a rich-text report with all search parameters, retrieval information, metrics, and all returned results.
  • BibTeX - a widely used format for bibliographic references, based on the TeX typesetting program and LaTeX macros.
  • CSV - a comma-separated values format accepted by most databases and spreadsheets.
  • EndNote - a data exchange format for use with the EndNote program from Thomson.
  • ISI Export - a data exchange format produced by Thomson's ISI/Web of Science export function.
  • JSON - a generic data format that is supported by many programs.
  • JSON Lines - a variation on JSON that is better suited to appending output.
  • RefMan/RIS - a data exchange format used by a variety of reference managers, including Reference Manager.

The general procedure is as follows:

  1. In the results pane, check the publications that you want to export. By default, all lines are checked and exported.
  2. If desired, click on the column headers to sort the data in the desired order. The sort is stable, which means that you can sort on multiple columns by clicking them in reverse order of priority. (For example, to sort primarily by author, then by year, and finally by publication, click on Publication, then on Year, then on Authors.)
  3. Choose File > Save As xxx from the main menu, where 'xxx' is the desired export format.
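
The effect of a stable sort can be illustrated outside the program. The following Python sketch is only an illustration, not Publish or Perish code; the rows and the key names (authors, year, publication) are made up for the example. It applies one stable sort per "click", lowest priority first, and ends up with the same order as a single sort on all three keys.

```python
from operator import itemgetter

# Hypothetical result rows; the key names are illustrative only.
rows = [
    {"authors": "Smith", "year": 2010, "publication": "Journal B"},
    {"authors": "Jones", "year": 2012, "publication": "Journal A"},
    {"authors": "Smith", "year": 2010, "publication": "Journal A"},
    {"authors": "Smith", "year": 2008, "publication": "Journal C"},
]


def sort_like_column_clicks(rows, keys_in_click_order):
    """Apply one stable sort per click; the last key clicked ends up as the primary key."""
    result = list(rows)
    for key in keys_in_click_order:
        # list.sort() is stable: rows with equal keys keep their previous relative order.
        result.sort(key=itemgetter(key))
    return result


# Click Publication, then Year, then Authors.
ordered = sort_like_column_clicks(rows, ["publication", "year", "authors"])
```

Because each pass keeps the relative order of rows with equal keys, the earlier clicks survive as tie-breakers for the later ones.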

Below are the details of each format as Publish or Perish implements them. This is for reference only; in general you do not need to know these details if you are exchanging the data with other software that recognises at least one of these formats.

Note: To export your data to a Publish or Perish data archive, use the Export to Archive command.

BibTeX format details

The BibTeX format is defined in a variety of locations with varying levels of detail; we have used BibTeX.org and Wikipedia.

Publish or Perish writes all BibTeX format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. If a field contains embedded '\' (U+005C) or '&' (U+0026) characters, they are escaped by prefixing them with another '\' to avoid misinterpretation as commands or control codes.
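
As a minimal sketch of the two rules just described (escaping and the byte order mark), the following Python functions mimic them. The function names are hypothetical and this is not the program's own code.

```python
import codecs


def escape_bibtex(value: str) -> str:
    """Prefix each backslash and ampersand with a backslash."""
    # Escape the backslash first, so the backslash added for '&' is not doubled afterwards.
    return value.replace("\\", "\\\\").replace("&", "\\&")


def to_bom_utf8(text: str) -> bytes:
    """Encode text as UTF-8 with the BOM (0xEF 0xBB 0xBF) at the start."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


escaped = escape_bibtex("Research & Development")
payload = to_bom_utf8(escaped)
```

When reading such a file back in Python, opening it with encoding="utf-8-sig" drops the BOM transparently.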

For each publication, Publish or Perish creates one of the following BibTeX entries:

BibTeX entry Used for Notes
@article Journal articles Used if the Publication field is present.
@book Book Used if the Publication field is empty, but the Publisher field is present.
@misc Various Used if none of the above applies.
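
The selection rule in this table can be written as a short decision function. This is a sketch under the assumption that an entry is available as a dictionary with "publication" and "publisher" keys; those key names are hypothetical.

```python
def bibtex_entry_type(entry: dict) -> str:
    """Pick the BibTeX entry type following the table above."""
    if entry.get("publication"):
        return "@article"  # Publication field is present
    if entry.get("publisher"):
        return "@book"     # no Publication, but a Publisher
    return "@misc"         # none of the above applies
```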

For each BibTeX entry, Publish or Perish generates a key in the format popxxxx and then uses the following subset of the available BibTeX tags:

Tag Used for Notes
type Publication type Set to the type of publication, or left out if the type is unknown
author Author names Set to the Authors field of the entry. If the field contains more than one author, Publish or Perish inserts " and " between names as per BibTeX requirements.
title Title Set to the Title field of the entry.
journal Journal name Set to the Publication field of the entry if present, else omitted.
publisher Publisher Set to the Publisher field of the entry if present, else omitted.
doi Digital Object Identifier Set to the Digital Object Identifier of the paper, if known
issn Journal's ISSN Only available from some data sources
url URL Set to the (hidden) ArticleURL field of the entry if present, else omitted.
citation Citation URL Set to the (hidden) CitationURL field of the entry if present, else omitted.
year Date Set to the Year field of the entry if present, else omitted.
volume Journal volume Only available from some data sources
issue Journal issue Only available from some data sources
pages Page numbers Formatted as start-end if both are known, or as page if only one is present; only available from some data sources
abstract Abstract Set to the Abstract field of the entry if available, else omitted
note Notes Set to x cites: CitesURL if the entry has one or more citations, else omitted.
note Notes Set to Query date: date in the format YYYY-MM-DD hh:mm:ss (ISO 8601 date and time).
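
Two of the formatting rules in this table, the page range and the query date, can be sketched as follows. The helper names are hypothetical; the behaviour follows the descriptions above and nothing more.

```python
from datetime import datetime
from typing import Optional


def format_pages(start: Optional[str], end: Optional[str]) -> Optional[str]:
    """Return 'start-end' if both are known, the single known page otherwise, or None."""
    if start and end:
        return f"{start}-{end}"
    return start or end or None


def format_query_date(when: datetime) -> str:
    """Format a timestamp as YYYY-MM-DD hh:mm:ss."""
    return when.strftime("%Y-%m-%d %H:%M:%S")
```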

CSV format details

The CSV (Comma-Separated Values) format is a de facto format defined by various sources; see for example this Wikipedia entry. Publish or Perish applies all required transformations: quotes around fields that contain embedded spaces or commas, quote doubling for fields with embedded quotes, etc.
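
The quoting rules can be sketched in a few lines of Python. This is an illustration of the rules as described here, not the program's implementation; the function names are hypothetical, and quoting fields that contain line breaks is an added assumption. The last two lines check that Python's standard csv module reads the result back unchanged.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Quote a field as described above; an empty field becomes ""."""
    if value == "":
        return '""'
    # Spaces, commas and quotes trigger quoting; line breaks are an assumption.
    if any(ch in value for ch in ' ,"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


def to_csv_line(fields) -> str:
    return ",".join(quote_field(field) for field in fields)


line = to_csv_line(["12", "A Smith, B Jones", 'The "best" paper', ""])
parsed = next(csv.reader(io.StringIO(line)))
```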

To import a CSV file into Microsoft Excel, Numbers, Access, OpenOffice Calc, or similar programs, choose the following settings in the receiving program:

  • File type: Text CSV, or Comma Separated Value, depending on the receiving program
  • Character set: Unicode (UTF-8)
  • Separated by: , (i.e., comma, U+002C; uncheck any other separators)
  • Text delimiter: " (i.e., the straight double quote, U+0022)

Publish or Perish writes all CSV format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. The first line in the file lists the fields; the remaining lines contain the citation entries, one per line. Missing fields result in an empty CSV field "" (U+0022, U+0022). The following field names are used:

Tag Used for Notes
Cites Number of citations Set to the Cites field of the entry.
Authors Author names Set to the Authors field of the entry.
Title Title Set to the Title field of the entry.
Year Year of publication Set to the Year field of the entry.
Source Journal name Set to the Publication field of the entry.
Publisher Publisher Set to the Publisher field of the entry.
ArticleURL URL Set to the (hidden) ArticleURL field of the entry if present (the abstract or full text of the article)
CitesURL Notes Set to the (hidden) CitesURL field of the entry if present (the URL to the list of citing articles)
GSRank Result ranking Set to the Rank field of the entry. This is simply the order in which the data source returned the results (1=first, 2=second, etc.). Typically, earlier ranked entries indicate more relevant search results.
QueryDate Query date Formatted as the date of the query in the format YYYY-MM-DD hh:mm:ss (ISO 8601 date and time).
Type Publication type Set to the type of publication, if known
DOI Digital Object Identifier Set to the Digital Object Identifier of the paper, if known
ISSN Journal's ISSN Only available from some data sources
CitationURL URL Set to the (hidden) CitationURL field of the entry if present (the full citation of the article)
Volume Journal volume Only available from some data sources
Issue Journal issue Only available from some data sources
StartPage Start page Only available from some data sources
EndPage End page Only available from some data sources
ECC Estimated citation count Estimated citation counts are provided separately by some data sources, for example Microsoft Academic. If not available, this field is set to the same value as the Cites field.
CitesPerYear Citations/year Set to citation count divided by the age of the article; result is rounded to two decimal places
CitesPerAuthor Citations/author Set to citation count divided by the number of the authors, rounded to the nearest whole number
AuthorCount Number of authors For convenience; derived from Authors field
Age Age of the paper For convenience; calculated as year_of_reference_date - year_of_publication
Abstract Publication's abstract Since version 7.17; only available from some data sources (CrossRef, Google Scholar, Microsoft Academic, PubMed, Web of Science). Not available for Google Scholar Profiles and Scopus.
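
To read such a file in your own scripts, the following Python sketch may serve as a starting point. It decodes the BOM-prefixed UTF-8 data and recomputes the convenience fields from the descriptions above. The sample data, the function names, and the reference year are made up; treating an age below 1 as 1 when dividing is an assumption, and Python's round() may treat exact halves differently from the program.

```python
import csv
import io


def read_export(binary_stream):
    """Decode a BOM-prefixed UTF-8 CSV export into a list of dicts keyed by field name."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8-sig", newline="")
    return list(csv.DictReader(text))


def derived_fields(cites: int, year: int, author_count: int, reference_year: int) -> dict:
    """Recompute Age, CitesPerYear and CitesPerAuthor as described in the table."""
    age = reference_year - year
    return {
        "Age": age,
        # Assumption: divide by at least 1 to avoid a division by zero.
        "CitesPerYear": round(cites / max(age, 1), 2),
        "CitesPerAuthor": round(cites / max(author_count, 1)),
    }


# A made-up two-line export: header plus one entry.
sample = (b"\xef\xbb\xbfCites,Authors,Title,Year\r\n"
          b'10,"A Smith, B Jones","An example title",2015\r\n')
rows = read_export(io.BytesIO(sample))
stats = derived_fields(int(rows[0]["Cites"]), int(rows[0]["Year"]), 3, 2020)
```

For a file on disk, open(path, encoding="utf-8-sig", newline="") gives the same BOM handling.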

EndNote format details

The EndNote Import data exchange format is defined by Thomson ResearchSoft for use by its EndNote program, among others; the format specification can be found in the help file of EndNote itself.

Publish or Perish writes all EndNote Import format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. There is no maximum line length; each field is written on a single line.

Publish or Perish uses the following subset of the available EndNote Import tags:

Tag Used for Notes
%0 Type of entry Set to the type of publication if known, else to Journal Article if the Publication field is present, else to Book.
%A Author names Publish or Perish generates one %A line per author in the Authors field.
%T Title Set to the Title field of the entry.
%B Book title Set to the Publication field of the entry if present, else omitted.
%I Publisher Set to the Publisher field of the entry if present, else omitted.
%R Digital Object Identifier Set to the Digital Object Identifier of the paper, if known
%@ Journal's ISSN Only available from some data sources
%U URL Set to the (hidden) ArticleURL field of the entry if present, else omitted.
%D Date Set to the Year field of the entry if present, else omitted.
%V Journal volume Only available from some data sources
%N Journal issue Only available from some data sources
%P Page numbers Formatted as start-end if both are known, or as page if only one is present; only available from some data sources
%X Abstract Set to the Abstract field of the entry if available, else omitted
%1 Miscellaneous Set to x cites: CitationsURL if the entry has one or more citations, else omitted.
%1 Miscellaneous Set to Query date: date in the format YYYY-MM-DD hh:mm:ss (ISO 8601 date and time).
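
A record built from some of these tags might be generated as in the sketch below. The dictionary keys are hypothetical, only a few tags are shown, and the single space between tag and value is the usual EndNote Import convention rather than something confirmed for Publish or Perish.

```python
def endnote_record(entry: dict) -> str:
    """Build one EndNote Import record from a dict with illustrative key names."""
    ref_type = entry.get("type") or (
        "Journal Article" if entry.get("publication") else "Book")
    lines = [f"%0 {ref_type}"]
    # One %A line per author.
    lines += [f"%A {author}" for author in entry.get("authors", [])]
    for tag, key in (("%T", "title"), ("%B", "publication"),
                     ("%I", "publisher"), ("%D", "year")):
        if entry.get(key):  # omitted if the field is missing or empty
            lines.append(f"{tag} {entry[key]}")
    return "\n".join(lines)


record = endnote_record({
    "authors": ["A Smith", "B Jones"],
    "title": "An example title",
    "publication": "Journal A",
    "year": 2015,
})
```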

ISI/Web of Science format details

The ISI/Web of Science data exchange format is used by Thomson's Web of Science export function. We have not been able to find an official definition, but have used information from the Ruby Forge web site instead.

Publish or Perish writes all ISI/Web of Science format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. The maximum line length for most fields is 77 characters; if the field plus the preceding tag is longer than that, indented continuation lines are used except for the AU tag (which always uses one tagged line for the first author and indented continuation lines for any further authors, without length restrictions) and the UR tag (which is allowed to exceed the maximum line length if necessary).
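
The line-length rule can be approximated with Python's textwrap module. This is a sketch, not the program's algorithm: the three-space continuation indent is an assumption, and the function names are hypothetical.

```python
import textwrap

INDENT = "   "  # assumed indent for continuation lines


def isi_field(tag: str, value: str, width: int = 77) -> list:
    """Wrap a tagged field to at most `width` characters per line."""
    if tag == "UR":
        # URLs may exceed the maximum line length.
        return [f"{tag} {value}"]
    return textwrap.wrap(value, width=width,
                         initial_indent=f"{tag} ",
                         subsequent_indent=INDENT)


def isi_authors(authors) -> list:
    """First author on the tagged line, further authors on indented lines."""
    return [("AU " if i == 0 else INDENT) + name
            for i, name in enumerate(authors)]


title_lines = isi_field("TI", "word " * 40)
url_lines = isi_field("UR", "http://example.com/" + "x" * 100)
author_lines = isi_authors(["Smith, A", "Jones, B"])
```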

Publish or Perish uses the following subset of the available ISI/Web of Science tags:

Tag Used for Notes
FN Format name Set to ISI Export Format (only once, at the start of the file)
VR Version Set to 1.0 (only once, at the start of the file)
PT Type of entry Set to the first letter of the publication type if known, else to J if the Publication field is present, else to B.
AU Author names Publish or Perish generates an AU line for the first author and indented continuation lines for any further authors in the Authors field. It also reformats each author's name as LastName, Initials as required by the ISI/WoS usage. Because author lists are not always 100% accurate, this may occasionally lead to misinterpreted names.
TI Title Set to the Title field of the entry.
SO Source name Set to the Publication field of the entry if present, else omitted.
PU Publisher Set to the Publisher field of the entry if present, else omitted.
DI Digital Object Identifier Set to the Digital Object Identifier of the paper, if known
SN Journal's ISSN Only available from some data sources
UR URL Set to the (hidden) ArticleURL field of the entry if present, else omitted.
PY Publication year Set to Year if the Year field of the entry is present, else omitted.
VL Journal volume Only available from some data sources
IS Journal issue Only available from some data sources
BP Start page Only available from some data sources
EP End page Only available from some data sources
AB Abstract Set to the Abstract field of the entry if available, else omitted
TC Times cited Set to the value of the Cites field if the entry has one or more citations, else omitted.
ER End of record Marks the end of the entry
EF End of file Marks the end of the file (only once, at the end of the file)
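
The name reformatting mentioned for the AU tag is inherently heuristic. The Python sketch below shows one naive approach (last word is the family name, everything before it becomes initials) and why it can misinterpret names. It is an assumption for illustration only and does not reproduce the program's actual rules.

```python
def initials_of(token: str) -> str:
    """Turn a given-name token into initials; keep tokens that already are initials."""
    token = token.replace(".", "")
    if not token:
        return ""
    return token if token.isupper() else token[0].upper()


def to_lastname_initials(name: str) -> str:
    """Naively reformat 'Given Names Last' as 'Last, GN'."""
    parts = name.split()
    if len(parts) < 2:
        return name.strip()
    *given, last = parts
    return f"{last}, {''.join(initials_of(p) for p in given)}"


good = to_lastname_initials("John Ronald Tolkien")
kept = to_lastname_initials("AW Harzing")
# A multi-word family name is misread by this naive rule.
misread = to_lastname_initials("Jan van der Berg")
```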

JSON format details

The JSON format is defined in a variety of locations; we have used json.org and Wikipedia.

Publish or Perish writes all JSON format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. The list of publications is then formatted as a JSON array with leading '[' (U+005B) and trailing ']' (U+005D); each individual result is a JSON object with leading '{' (U+007B) and trailing '}' (U+007D) containing the following fields, where applicable:

Field name Field type Used for Notes
type string Publication type Set to the type of publication, or left out if the type is unknown
title string Title Set to the Title field of the entry.
authors array Author names Sub-array consisting of the Authors field of the entry, split into individual authors as individual JSON string items.
source string Source name Set to the Publication field of the entry if present, else omitted.
year integer Date Set to the Year field of the entry if present, else omitted.
volume integer Journal volume Only available from some data sources
issue integer Journal issue Only available from some data sources
startpage integer Start page Only available from some data sources
endpage integer End page Only available from some data sources
publisher string Publisher Set to the Publisher field of the entry if present, else omitted.
doi string DOI Set to the Digital Object Identifier of the paper, if known
issn string ISSN Only available from some data sources
abstract string Abstract Set to the Abstract field of the entry if available, else omitted
article_url string URL Set to the (hidden) ArticleURL field of the entry if present (the abstract or full text of the article)
citation_url string URL Set to the (hidden) CitationURL field of the entry if present (the full citation of the article)
cites_url string URL Set to the (hidden) CitesURL field of the entry if present (the URL to the list of citing articles)
cites integer Citation count Set to the Cites field of the entry
ecc integer Estimated citation count Estimated citation counts are provided separately by some data sources, for example Microsoft Academic. If not available, this field is set to the same value as the Cites field.
rank integer Result rank Set to the Rank field of the entry. This is simply the order in which the data source returned the results (1=first, 2=second, etc.). Typically, earlier ranked entries indicate more relevant query results.
use boolean In-use marker Represents the state of the check box in the results list
merged array Merged children Sub-array containing any other entries that were merged into this entry; each one is again a JSON object as documented here.
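
A script that consumes this output might look like the following Python sketch. The sample data and function names are made up; only the field names (title, cites, use, merged) come from the table above.

```python
import codecs
import json


def load_export(raw: bytes) -> list:
    """Decode BOM-prefixed UTF-8 bytes and parse the JSON array."""
    return json.loads(raw.decode("utf-8-sig"))


def flatten(entries):
    """Yield every entry followed by any entries merged into it, recursively."""
    for entry in entries:
        yield entry
        yield from flatten(entry.get("merged", []))


# A made-up export with one merged child entry.
sample = codecs.BOM_UTF8 + json.dumps([
    {"title": "Paper A", "cites": 12, "use": True,
     "merged": [{"title": "Paper A (duplicate)", "cites": 3, "use": True}]},
    {"title": "Paper B", "cites": 0, "use": False},
]).encode("utf-8")

entries = load_export(sample)
titles = [entry["title"] for entry in flatten(entries)]
checked = [entry["title"] for entry in entries if entry.get("use")]
```

Since fields are written only where applicable, entry.get(...) is safer than direct indexing for optional fields.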

JSON Lines format details

The JSON Lines format is similar to the plain JSON format (above), but writes each data record to a single line without comma separators between lines. This makes it more suitable for appending the results of multiple searches.

Publish or Perish writes the same data fields for JSON Lines as for the plain JSON output; the only differences are the absence of an enclosing JSON array ('[' ... ']') and the way records are separated. See the JSON Lines documentation web site for further details.
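
The appending behaviour is easy to see in a small Python sketch. The function names are hypothetical and an in-memory stream stands in for a file; stripping a leading BOM from each line is a precaution, since whether one is written for this format is not stated here.

```python
import io
import json


def append_records(stream, records) -> None:
    """Write one JSON object per line, with no enclosing array and no commas."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(stream) -> list:
    """Parse each non-empty line as an independent JSON object."""
    return [json.loads(line.lstrip("\ufeff")) for line in stream if line.strip()]


buffer = io.StringIO()
append_records(buffer, [{"title": "Paper A", "cites": 12}])   # first search
append_records(buffer, [{"title": "Paper B", "cites": 0},
                        {"title": "Paper C", "cites": 5}])    # second search, appended
buffer.seek(0)
records = read_records(buffer)
```

With a plain JSON array, the second write would have required rewriting the closing ']' of the first; here each line stands on its own.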

RefMan/RIS format details

The RIS data exchange format is defined by Thomson ResearchSoft for use by its Reference Manager and EndNote programs, among others; their web site contains the official definition.

Publish or Perish writes all RIS format files as plain text Unicode files, encoded as UTF-8 with the UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. There is no maximum line length; each field is written on a single line.

Publish or Perish uses the following subset of the available RIS tags:

Tag Used for Notes
TY Type of entry Set to publication type if known, else to JOUR if the Publication field is present, else to BOOK.
AU Author names Publish or Perish generates one AU line per author in the Authors field. It also reformats each author's name as LastName, Initials as required by the RIS specification. Because author lists are not always 100% accurate, this may occasionally lead to misinterpreted names.
TI Title Set to the Title field of the entry.
JF Periodical name Set to the Publication field of the entry if present, else omitted.
PB Publisher Set to the Publisher field of the entry if present, else omitted.
DO Digital Object Identifier Set to the Digital Object Identifier of the paper, if known
SN Journal's ISSN Only available from some data sources
UR URL Set to the (hidden) ArticleURL field of the entry if present, else omitted.
PY Date Set to Year/// if the Year field of the entry is present, else omitted. The trailing "///" characters are required by the RIS date format specification.
VL Journal volume Only available from some data sources
IS Journal issue Only available from some data sources
SP Start page Only available from some data sources
EP End page Only available from some data sources
AB Abstract Set to the Abstract field of the entry if available, else omitted
M1 Miscellaneous Set to Query date: date in the format YYYY-MM-DD hh:mm:ss (ISO 8601 date and time).
M1 Miscellaneous Set to x cites: CitationsURL if the entry has one or more citations, else omitted.
N1 Notes Set to Cited By (since yyyy): x if the entry has one or more citations, else omitted. This is for compatibility with Scopus' export format.
ER End of record Marks the end of the entry
Versions of Publish or Perish prior to 4.0.1 used slightly different tags for some fields:
A1 Author names Used for each author in the Authors field. As of 4.0.1, the equivalent AU tag is used instead.
T1 Title Used for the Title field. As of 4.0.1, the equivalent TI tag is used instead.
T2 Secondary title Used for the Publication field. As of 4.0.1, the more appropriate JF tag is used instead.
Y1 Date Used for the Year field (as Year///). As of 4.0.1, the equivalent PY tag is used instead.
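
If you need to process RIS files written by both older and newer versions, the tag differences listed above can be normalised while reading. The Python sketch below is a hedged illustration: it assumes the usual RIS line layout (two-letter tag, two spaces, a hyphen, a space, then the value), and the function and constant names are hypothetical.

```python
LEGACY_TAGS = {"A1": "AU", "T1": "TI", "T2": "JF", "Y1": "PY"}


def parse_ris(text: str) -> list:
    """Parse RIS text into a list of {tag: [values]} dicts, mapping pre-4.0.1 tags."""
    records, current = [], {}
    for line in text.lstrip("\ufeff").splitlines():
        if len(line) < 5 or line[2:5] != "  -":
            continue  # not a tagged line
        tag, value = line[:2], line[5:].strip()
        tag = LEGACY_TAGS.get(tag, tag)
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
            continue
        if tag == "PY":
            value = value.split("/")[0]  # drop the trailing "///"
        current.setdefault(tag, []).append(value)
    return records


old_style = ("TY  - JOUR\nA1  - Smith, A\nA1  - Jones, B\nT1  - A title\n"
             "T2  - A journal\nY1  - 2009///\nER  - \n")
new_style = ("TY  - JOUR\nAU  - Smith, A\nAU  - Jones, B\nTI  - A title\n"
             "JF  - A journal\nPY  - 2009///\nER  - \n")
same = parse_ris(old_style) == parse_ris(new_style)
```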