Preferences: Google Scholar

This dialog box appears when you choose the Tools > Preferences command from the main menu and go to the Google Scholar tab. It allows you to edit a number of settings that affect the way Publish or Perish deals with queries to Google Scholar. This dialog box contains the following fields and options.

Note: The options in this dialog box are intended for users who are familiar with data sources such as Google Scholar and client-server communication using the HTTP[S] protocol. If you are not, then you are strongly advised to leave these options at their default values unless specifically instructed otherwise.

General

This box contains general options relating to the way Publish or Perish issues queries to Google Scholar.

Option Description
Query URL

Enter the URL that Publish or Perish should use to perform Google Scholar queries.

User-Agent string

Set the HTTP User-Agent string that Publish or Perish uses to identify itself to Google Scholar.

The User-Agent string that a client sends to an HTTP server identifies the client program and its version. In practice, the string often also contains more or less detailed information about the operating system the client runs on. For example, a typical User-Agent string might look like this:

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

For ease of configuration, the User-Agent string field may contain zero or more of the following:

  • Literal text, for example Mozilla/5.0
  • Replaceable field placeholders starting with %, for example %W

The replaceable field placeholders are expanded when Publish or Perish starts and include the following:

Placeholder Expands to
%% Single '%' character
%@ User-Agent string as used by Internet Explorer
%B Windows build number, for example 7601
%D Current date as YYYYMMDD, for example 20141023
%E Trident engine version number as major.minor, for example 7.0
%I Internet Explorer version number as major.minor, for example 11.0
%M Mozilla version number as major.minor, matching the pseudo-Mozilla version that IE would report; for example 5.0
%P Process identifier as decimal number, for example 384
%R Pseudo-random decimal number 0-65535, for example 34270
%T Current time as HHMMSS, for example 130425
%V Pseudo-random version number as major.minor, for example 6363.37629
%W Windows version number as major.minor, for example 6.1

Notes

  • The default User-Agent string as of Publish or Perish version 4.16 is %@ (which resolves to the same identification that Internet Explorer uses in compatibility mode).
  • Internet Explorer 11 uses a User-Agent string equivalent to Mozilla/%M (Windows NT %W; Trident/%E; rv:%I) like Gecko
  • Earlier versions of Publish or Perish used Mozilla/%M (compatible; MSIE %I; Windows NT %W)
  • If you leave the User-Agent string field empty, it defaults to Mozilla/%M (compatible; MSIE %I; Windows NT %W; Trident/%E)
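The expansion rules in the placeholder table and the empty-field fallback from the notes can be illustrated with a short Python sketch. It is not Publish or Perish's own code: the version numbers in SAMPLE_VALUES are hypothetical stand-ins for values the program would read from the system, the Internet Explorer string used for %@ is simply passed in as a parameter, the 0-65535 range used for each half of %V is a guess, and leaving unknown placeholders untouched is an assumption.

```python
import os
import random
from datetime import datetime

# Hypothetical sample values; a real client would query the OS and browser.
SAMPLE_VALUES = {
    "B": "7601",   # Windows build number
    "E": "7.0",    # Trident engine version
    "I": "11.0",   # Internet Explorer version
    "M": "5.0",    # pseudo-Mozilla version
    "W": "6.1",    # Windows version
}

# Template used when the User-Agent field is left empty (see the notes above).
EMPTY_FIELD_DEFAULT = "Mozilla/%M (compatible; MSIE %I; Windows NT %W; Trident/%E)"


def expand_user_agent(template, values=SAMPLE_VALUES, now=None, ie_user_agent=""):
    """Expand %-placeholders in a User-Agent template (illustrative sketch)."""
    if not template:
        template = EMPTY_FIELD_DEFAULT
    now = now or datetime.now()
    # Placeholders whose values are computed at expansion time.
    dynamic = {
        "%": "%",
        "@": ie_user_agent,
        "D": now.strftime("%Y%m%d"),
        "T": now.strftime("%H%M%S"),
        "P": str(os.getpid()),
        "R": str(random.randint(0, 65535)),
        "V": f"{random.randint(0, 65535)}.{random.randint(0, 65535)}",
    }
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            key = template[i + 1]
            if key in dynamic:
                out.append(dynamic[key])
            elif key in values:
                out.append(values[key])
            else:
                # Unknown placeholder: keep it verbatim (assumption).
                out.append(template[i:i + 2])
            i += 2
        else:
            # Literal text (including a lone trailing '%') is copied as-is.
            out.append(ch)
            i += 1
    return "".join(out)
```

With these sample values, expanding the Internet Explorer 11 template from the notes yields Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko, and an empty field yields Mozilla/5.0 (compatible; MSIE 11.0; Windows NT 6.1; Trident/7.0).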

Request rate limiter

This box contains options that determine how Publish or Perish limits the rate at which requests are sent to Google Scholar.

Requests are related to queries but are not the same:

  • Each query translates to one or more requests for results that are potentially sent to Google Scholar. Currently, each request returns up to 20 results, so a single query that returns, say, 150 results in all requires 8 individual requests (7 x 20 full + 1 x 10 partial).
  • A request may be satisfied from Publish or Perish's own cache, in which case the request is not sent to Google Scholar (unless you use Lookup Direct).
  • If you send too many requests to Google Scholar, or if the requests follow each other too quickly, Google Scholar may block further requests.
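The request arithmetic in the first bullet can be written out as a small Python function. This is a sketch that assumes exactly 20 results per request and at least one request per query (even one that yields no results); the function and constant names are hypothetical, cached requests are not modelled, and the optional cap corresponds to the Maximum number of results per query option described below.

```python
RESULTS_PER_REQUEST = 20       # each request returns up to 20 results
SCHOLAR_RESULT_CEILING = 1000  # Google Scholar never returns more than 1000 results


def requests_needed(total_results, max_results_per_query=SCHOLAR_RESULT_CEILING):
    """Number of requests needed to fetch one query's results (cache ignored)."""
    wanted = min(total_results, max_results_per_query, SCHOLAR_RESULT_CEILING)
    # Integer ceiling division: full pages plus one partial page if needed.
    pages = (wanted + RESULTS_PER_REQUEST - 1) // RESULTS_PER_REQUEST
    # Assumption: even an empty result set costs one request to discover.
    return max(1, pages)
```

Under these assumptions, requests_needed(150) returns 8, matching the example above, and capping the query at 100 results reduces that to 5.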

The request rate limiter options help you to keep the number of requests sent to Google Scholar down to acceptable levels.

Option Description
Maximum number of results per query

Enter the maximum number of results that Publish or Perish should retrieve for each query. Google Scholar never returns more than 1000 results (and usually returns fewer), but you can set a lower maximum if you are only interested in the most relevant results for each query. This also reduces the total number of requests that Publish or Perish sends to Google Scholar.

Maximum request rate <n> requests/minute

Enter the maximum number of requests per minute that Publish or Perish should send to Google Scholar. This is a short-term limit and only takes into account the request rate over the past 60 seconds.

If this limit has been reached, Publish or Perish delays sending the next request until the request rate has fallen below the maximum set here.

Use adaptive request rate

Check this box to slow down the request rate when Publish or Perish detects that the request rate approaches certain preset limits; clear this box to keep sending requests at the maximum rate. We recommend that you leave this option checked.

Respond to CAPTCHAs

Check this box to display a CAPTCHA dialog box when Google Scholar asks you to verify that you are human. If you solve the CAPTCHA correctly, Google Scholar allows further queries. We recommend that you leave this option checked.

Note: For CAPTCHA handling to work, you must allow first-party cookies (or at least session cookies) in your Internet Explorer settings. You can set the Internet Explorer cookie preferences by choosing the Tools > Internet Options command from the main menu in Publish or Perish, then clicking the Privacy tab in the Internet Properties dialog box that appears.

Show request rate warnings

Check this box to display warnings when Publish or Perish detects that the request rate approaches certain preset limits; clear this box to suppress those warnings. We recommend that you leave this option checked.

Show Yellow warning if rate exceeds <n> requests/hour

Enter the threshold for "Yellow" request rate warnings. This is a medium-term limit and takes into account the request rate over the past hour. The actual limit is an empirical value; we recommend setting this option to 120 or less.

This option is used for two purposes:

  • If the Use adaptive request rate option is checked, it acts as a threshold for the adaptive request rate
  • If the Show request rate warnings option is checked, exceeding this rate triggers a warning

Show Red warning if rate exceeds <n> requests/hour

Enter the threshold for "Red" request rate warnings. This is a medium-term limit and takes into account the request rate over the past hour. The actual limit is an empirical value; we recommend setting this option to 150 or less.

This option is used for two purposes:

  • If the Use adaptive request rate option is checked, it acts as an upper limit for the adaptive medium-term request rate
  • If the Show request rate warnings option is checked, exceeding this rate triggers a warning

Keep cached results for <n> days

Enter the number of days that Publish or Perish keeps query results in its cache. The longer this period, the fewer requests are needed to satisfy repeated queries. However, changes in the Google Scholar results only become visible after the cache period has expired, so do not make this period too long.
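To make the interplay of the rate limiter options concrete, here is a Python sketch of a sliding-window limiter: the per-minute maximum is enforced over the past 60 seconds, and the Yellow and Red thresholds are compared against the number of requests sent in the past hour. The class and parameter names are hypothetical, the default of 20 requests per minute is an arbitrary example rather than the program's default, 120 and 150 are the recommended upper values from the text, and the adaptive slowdown is not modelled.

```python
import time
from collections import deque


class RequestRateLimiter:
    """Sliding-window limiter with hourly warning levels (illustrative sketch)."""

    def __init__(self, max_per_minute=20, yellow_per_hour=120, red_per_hour=150,
                 clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.yellow_per_hour = yellow_per_hour
        self.red_per_hour = red_per_hour
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent, oldest first

    def _prune(self, now):
        # Requests older than one hour no longer affect any limit.
        while self.sent and now - self.sent[0] >= 3600:
            self.sent.popleft()

    def delay_needed(self):
        """Seconds to wait so the next request stays within the per-minute limit."""
        now = self.clock()
        self._prune(now)
        last_minute = [t for t in self.sent if now - t < 60]
        if len(last_minute) < self.max_per_minute:
            return 0.0
        # Wait until enough old requests leave the 60-second window.
        oldest = last_minute[-self.max_per_minute]
        return 60 - (now - oldest)

    def record_request(self):
        """Call this each time a request is actually sent."""
        self.sent.append(self.clock())

    def warning_level(self):
        """Return 'red', 'yellow' or None based on requests in the past hour."""
        now = self.clock()
        self._prune(now)
        count = len(self.sent)
        if count > self.red_per_hour:
            return "red"
        if count > self.yellow_per_hour:
            return "yellow"
        return None
```

A caller would sleep for delay_needed() seconds when it is positive, send the request, call record_request(), and then check warning_level() to decide whether to show a warning. Injecting the clock keeps the sketch testable without real waiting.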

Other options

The following additional options are available in the form of push buttons:

Option Description
Reset to defaults

Click this button to reset all fields to their "factory" defaults. This is useful if you have made changes but want to start again from a known base.

Clear the cache

Click this button to clear the entire results cache. This forces subsequent queries to access Google Scholar directly, which might be useful after a (suspected) update on Google Scholar, or if you have reason to believe that the cached results are somehow invalid.

Show statistics

Click this button to open the Data provider information dialog box with current statistics and settings for the underlying data provider (i.e., Google Scholar).
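The interaction between the Keep cached results for <n> days option and the Clear the cache button can be sketched as a simple time-to-live cache. The class, the key format, and the default of 7 days are hypothetical; this only illustrates the expiry idea and does not describe how Publish or Perish actually stores its cache.

```python
import time


class ResultsCache:
    """Query-results cache whose entries expire after a number of days (sketch)."""

    SECONDS_PER_DAY = 24 * 60 * 60

    def __init__(self, keep_days=7, clock=time.time):
        self.keep_days = keep_days
        self.clock = clock
        self._entries = {}  # request key -> (stored_at, results)

    def get(self, key):
        """Return cached results, or None if the entry is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at >= self.keep_days * self.SECONDS_PER_DAY:
            # Expired: drop it so the next request goes to the data source.
            del self._entries[key]
            return None
        return results

    def put(self, key, results):
        """Store results together with the time they were cached."""
        self._entries[key] = (self.clock(), results)

    def clear(self):
        """Counterpart of the 'Clear the cache' button: drop every entry."""
        self._entries.clear()
```

A longer keep_days means more get() calls are answered from the cache, at the cost of serving results that may be out of date, which is the trade-off described for the Keep cached results option.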

Generated by Cphyl 3.21.0.6260 (2017.02.19.1015A)