Preferences - Queries

This dialog box appears when you choose the Tools > Preferences command from the main menu. It allows you to edit a number of settings that affect the way Publish or Perish deals with queries. This dialog box contains the following fields and options.

General

This box contains general options relating to the way Publish or Perish issues queries to the query server (for example, Google Scholar).

Option Description
Keep cached results for <n> days
Enter the number of days to keep query results in the cache. The longer this period, the fewer server accesses are needed to satisfy repeated queries. However, updates to the query results only become visible after the cache period has expired, so you should not make this period too long.
Clear the cache
Click this button to clear the entire results cache. This forces subsequent queries to access Google Scholar directly, which can be useful after a (suspected) update to Google Scholar, or if you have reason to believe that the cached results are invalid.
User-Agent string
Set the HTTP User-Agent string that Publish or Perish uses to identify itself to Google Scholar and other query servers. The User-Agent string that a client sends to an HTTP server identifies the client program and its version. In practice, the string often also contains more or less detailed information about the operating system the client is running on. For example, a typical User-Agent string might look like:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

For ease of configuration, the User-Agent string field may contain zero or more of the following:

  • Literal text, for example Mozilla/5.0
  • Replaceable field placeholders starting with %, for example %W

The replaceable field placeholders are expanded when Publish or Perish starts. They include the following:

Placeholder Expands to
%% Single '%' character
%@ User-Agent string as used by Internet Explorer
%B Windows build number, for example 7601
%D Current date as YYYYMMDD, for example 20141023
%E Trident engine version number as major.minor, for example 7.0
%I Internet Explorer version number as major.minor, for example 11.0
%M Mozilla version number as major.minor, matching the pseudo-Mozilla version that IE would report; for example 5.0
%P Process identifier as decimal number, for example 384
%R Pseudo-random decimal number 0-65535, for example 34270
%T Current time as HHMMSS, for example 130425
%V Pseudo-random version number as major.minor, for example 6363.37629
%W Windows version number as major.minor, for example 6.1
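
The placeholder expansion can be illustrated with a short Python sketch. It is not the Publish or Perish implementation: the function names `expand_user_agent` and `build_fields` are hypothetical, the static values (Windows build, Trident, IE and Mozilla versions) are the sample values from the table above rather than values read from the system, %@ is not modelled because it depends on Internet Explorer itself, and both the handling of unknown placeholders (left unchanged) and the exact format of %V are assumptions.

```python
from __future__ import annotations

import os
import random
import re
import time


def build_fields(static: dict[str, str]) -> dict[str, str]:
    """Combine static values with the dynamic placeholders, computed once
    (the placeholders are expanded when the program starts)."""
    fields = dict(static)
    fields["D"] = time.strftime("%Y%m%d")  # current date, e.g. 20141023
    fields["T"] = time.strftime("%H%M%S")  # current time, e.g. 130425
    fields["P"] = str(os.getpid())  # process identifier
    fields["R"] = str(random.randint(0, 65535))  # pseudo-random 0-65535
    # Assumption: %V is two pseudo-random 16-bit numbers joined by a dot.
    fields["V"] = f"{random.randint(0, 65535)}.{random.randint(0, 65535)}"
    return fields


def expand_user_agent(template: str, fields: dict[str, str]) -> str:
    """Replace each %X placeholder with fields[X]; '%%' yields a single '%'.

    Assumption: placeholders without an entry in `fields` are left unchanged.
    """

    def substitute(match: re.Match) -> str:
        code = match.group(1)
        if code == "%":
            return "%"
        return fields.get(code, match.group(0))

    # Scanning left to right means "%%M" becomes the literal text "%M".
    return re.sub(r"%(.)", substitute, template, flags=re.DOTALL)


# Hypothetical static values, copied from the examples in the table above.
SAMPLE = {"B": "7601", "E": "7.0", "I": "11.0", "M": "5.0", "W": "6.1"}

if __name__ == "__main__":
    ie11 = "Mozilla/%M (Windows NT %W; Trident/%E; rv:%I) like Gecko"
    print(expand_user_agent(ie11, build_fields(SAMPLE)))
    # Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
```

Computing the dynamic fields once, rather than per request, mirrors the statement that placeholders are expanded at startup, so values such as %R and %P stay constant for the whole session.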

Notes

  • The default User-Agent string as of Publish or Perish version 4.16 is %@ (which resolves to the same identification that Internet Explorer uses in compatibility mode).
  • Internet Explorer 11 uses a User-Agent string equivalent to Mozilla/%M (Windows NT %W; Trident/%E; rv:%I) like Gecko
  • Earlier versions of Publish or Perish used Mozilla/%M (compatible; MSIE %I; Windows NT %W)
  • If you leave the User-Agent string field empty, it defaults to Mozilla/%M (compatible; MSIE %I; Windows NT %W; Trident/%E)
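
The interaction between the Keep cached results for <n> days option and the Clear the cache button can be pictured as a results cache with a fixed maximum age. The Python sketch below illustrates that idea under assumptions and does not describe how Publish or Perish actually stores its cache: the class name `ResultsCache`, the use of the query text as the cache key, and the `fetch` callback standing in for a request to the query server are all hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable

SECONDS_PER_DAY = 86400


@dataclass
class ResultsCache:
    """Hypothetical query-results cache with a maximum age in days."""

    max_age_days: float
    clock: Callable[[], float] = time.time  # injectable clock, in seconds
    _entries: dict[str, tuple[float, str]] = field(default_factory=dict, init=False)

    def get(self, query: str, fetch: Callable[[str], str]) -> str:
        """Return a cached result younger than max_age_days; otherwise call
        fetch(query), which stands in for a request to the query server."""
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.max_age_days * SECONDS_PER_DAY:
            return entry[1]  # still fresh: no server access needed
        result = fetch(query)
        self._entries[query] = (now, result)
        return result

    def clear(self) -> None:
        """Counterpart of the Clear the cache button: drop every stored result."""
        self._entries.clear()
```

With max_age_days=7, a repeated query within a week is answered from the cache, while a result that is a week old or more triggers a new request; calling clear() forces the next lookup to contact the server regardless of age. This is also why a long cache period hides updates on the server side.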

Query aging

This box contains options that determine how Publish or Perish ages previously executed queries, as follows.

  1. When you execute a query, whether a new one or a repeat of an earlier one, the query is stored in the Recent queries folder of the Multi-query center.
  2. Queries in the Recent queries folder older than a preset number of days are automatically migrated to the Older queries folder.
  3. Queries in the Older queries folder older than a second preset number of days are automatically migrated to the Trash folder.
  4. Finally, queries in the Trash folder older than a third preset number of days are automatically deleted.

If at any point you re-execute an earlier query, it is moved back to the Recent queries folder and its age is reset to zero.

The aging of queries only applies to queries that reside in the Recent queries, Older queries, or Trash folders. Queries that reside in other folders of the Multi-query center are not affected by the aging policies.

Option Description
Maximum age for Recent queries <n> days
Enter the maximum age for "recent" queries. When a query is older than this number of days, it is moved automatically to the Older queries folder.
Maximum age for Older queries <n> days
Enter the maximum age for "older" queries. When a query is older than this number of days, it is moved automatically to the Trash folder.
Delete Trash queries if older than <n> days
Enter the number of days after which queries should be deleted from the Trash folder.

Tip: to avoid automatic deletion of queries, set this to a high number of days, for example 9999.
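
The aging rules described in this section can be summarized in a small Python sketch. It assumes, which this help page does not state explicitly, that all three maximum ages are measured from the day a query was last executed, and that aging runs as a periodic pass over all queries; the names `Query`, `run_query`, and `apply_aging` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

RECENT, OLDER, TRASH = "Recent queries", "Older queries", "Trash"
AGED_FOLDERS = {RECENT, OLDER, TRASH}


@dataclass
class Query:
    text: str
    folder: str
    last_run_day: int  # day number on which the query was last executed


def run_query(query: Query, today: int) -> None:
    """Executing (or re-executing) a query puts it in Recent queries with age zero."""
    query.folder = RECENT
    query.last_run_day = today


def apply_aging(queries: list[Query], today: int, max_recent: int,
                max_older: int, max_trash: int) -> list[Query]:
    """Run one aging pass and return the queries that were not deleted.

    Assumption: every threshold is compared with the age since last execution.
    """
    survivors = []
    for q in queries:
        if q.folder not in AGED_FOLDERS:
            survivors.append(q)  # other Multi-query center folders are never aged
            continue
        age = today - q.last_run_day
        if q.folder == RECENT and age > max_recent:
            q.folder = OLDER
        if q.folder == OLDER and age > max_older:
            q.folder = TRASH
        if q.folder == TRASH and age > max_trash:
            continue  # deleted permanently
        survivors.append(q)
    return survivors
```

Setting max_trash to a very large value such as 9999, as the tip suggests, means the deletion branch is in practice never taken, so queries accumulate in the Trash folder instead.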

Request rate limiter

This box contains options that determine how Publish or Perish limits the rate at which requests are sent to Google Scholar.

Requests are related to queries but are not the same thing: a single query may require several requests to Google Scholar.

The request rate limiter options help you to keep the number of requests sent to Google Scholar down to acceptable levels.

Option Description
Maximum request rate <n> requests/minute
Enter the maximum number of requests per minute that Publish or Perish should send to Google Scholar. This is a short-term limit and only takes into account the request rate over the past 60 seconds.

If sending another request would exceed this limit, Publish or Perish delays that request until the request rate has fallen below the maximum set here.

Use adaptive request rate
Check this box to slow down the request rate when Publish or Perish detects that the request rate approaches certain preset limits; clear this box to keep sending requests at the maximum rate. We recommend that you leave this option checked.
Respond to CAPTCHAs
Check this box to display a CAPTCHA dialog box when Google Scholar asks you to verify that you are human. If you solve the CAPTCHA correctly, Google Scholar allows further queries. We recommend that you leave this option checked.

Note: For the CAPTCHA handling to work, you must allow first-party cookies (or at least session cookies) in your Internet Explorer settings. You can set the Internet Explorer cookie preferences by choosing the Tools > Internet Options command from the main menu in Publish or Perish, then clicking the Privacy tab in the Internet Properties dialog box that appears.

Show request rate warnings
Check this box to display warnings when Publish or Perish detects that the request rate approaches certain preset limits; clear this box to suppress those warnings. We recommend that you leave this option checked.
Show Yellow warning if rate exceeds <n> requests/hour
Enter the threshold for "Yellow" request rate warnings. This is a medium-term limit and takes into account the request rate over the past hour. The actual limit is determined empirically; we recommend setting this option to 120 or less.

This option is used for two purposes:

  • If the Use adaptive request rate option is checked, it acts as a threshold for the adaptive request rate
  • If the Show request rate warnings option is checked, exceeding this rate triggers a warning

Show Red warning if rate exceeds <n> requests/hour
Enter the threshold for "Red" request rate warnings. This is a medium-term limit and takes into account the request rate over the past hour. The actual limit is determined empirically; we recommend setting this option to 150 or less.

This option is used for two purposes:

  • If the Use adaptive request rate option is checked, it acts as an upper limit for the adaptive medium-term request rate
  • If the Show request rate warnings option is checked, exceeding this rate triggers a warning
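
The two kinds of limit described above, a short-term limit over the past 60 seconds and medium-term Yellow and Red thresholds over the past hour, can be sketched as sliding windows over the timestamps of sent requests. This Python sketch is an illustration under assumptions, not the Publish or Perish algorithm: the class and method names are hypothetical, and the adaptive behaviour is simplified to treating the Red threshold as a hard hourly cap (the role of the Yellow threshold in the adaptive rate is not modelled).

```python
from __future__ import annotations

from collections import deque


class RequestRateLimiter:
    """Hypothetical sliding-window limiter; timestamps are in seconds."""

    def __init__(self, max_per_minute: int, yellow_per_hour: int = 120,
                 red_per_hour: int = 150, adaptive: bool = True) -> None:
        self.max_per_minute = max_per_minute
        self.yellow_per_hour = yellow_per_hour
        self.red_per_hour = red_per_hour
        self.adaptive = adaptive
        self._sent: deque[float] = deque()  # send times in the past hour, oldest first

    def _prune(self, now: float) -> None:
        """Forget requests sent an hour or more ago."""
        while self._sent and now - self._sent[0] >= 3600:
            self._sent.popleft()

    def _wait_for(self, now: float, window: float, limit: int) -> float:
        """Seconds until fewer than `limit` requests remain in the last `window` seconds."""
        in_window = [t for t in self._sent if now - t < window]
        if len(in_window) < limit:
            return 0.0
        # Once this request leaves the window, only limit - 1 requests remain.
        blocking = in_window[len(in_window) - limit]
        return window - (now - blocking)

    def delay_before_next(self, now: float) -> float:
        """How long to hold back the next request."""
        self._prune(now)
        delay = self._wait_for(now, 60, self.max_per_minute)
        if self.adaptive:
            # Simplification: treat the Red threshold as a hard hourly cap.
            delay = max(delay, self._wait_for(now, 3600, self.red_per_hour))
        return delay

    def record(self, now: float) -> None:
        """Call this whenever a request is actually sent."""
        self._sent.append(now)

    def warning_level(self, now: float) -> str | None:
        """Return "Red", "Yellow" or None based on requests in the past hour."""
        self._prune(now)
        per_hour = len(self._sent)
        if per_hour > self.red_per_hour:
            return "Red"
        if per_hour > self.yellow_per_hour:
            return "Yellow"
        return None
```

For example, with a maximum of 3 requests per minute and requests sent at 0, 10 and 20 seconds, delay_before_next(30.0) returns 30.0: the next request has to wait until the one sent at 0 seconds falls out of the 60-second window.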