Overview
The Google WebSearch service enables Google Site Search
customers to display Google search results on their own web sites. The
WebSearch service uses a simple HTTP-based protocol to serve search
results. Search administrators have complete control over the way they
request search results and the way they present those results to the
end user. This document describes the technical details of the Google
search request and results formats.
To retrieve Google WebSearch results, your application sends
Google a simple HTTP request. Google then returns search results in XML
format. XML-formatted results give you the ability to customize
the way search results are displayed.
WebSearch Request Format
Request Overview
The Google search request is a standard HTTP GET command. It
includes a collection of parameters relevant to your
queries. These parameters are included in the request URL as name=value
pairs separated by ampersand (&) characters. Parameters include
data like the search query and a unique CSE ID (cx) that identifies the
CSE that is making the HTTP request. The WebSearch service returns
XML results in response to your HTTP requests.
Query Terms
Most search requests include one or more query terms. A query term
appears as the value of a parameter in the search request.
Query terms can specify several types of information to filter and
organize the search results that Google returns. Queries can specify:
- Words or phrases to include or exclude
  - All of the words in a search query (default)
  - An exact phrase in the search query
  - Any word or phrase in a search query
- Where in a document to look for the search terms
  - Anywhere in the document (default)
  - Only in the body of the document
  - Only in the document title
  - Only in the document URL
  - Only in links in the document
- Restrictions on the documents themselves
  - Including or excluding documents of particular file types (such as PDF files or Word documents)
- Special URL queries that return information about a given URL, rather than doing a search
  - Queries that return general information about a URL, such as its Open Directory category, snippet or language
  - Queries that return the set of web pages that link to a URL
  - Queries that return a set of web pages similar to a given URL
Default Search
Search query parameter values must be URL-escaped. In particular,
replace each whitespace sequence in the search query with a plus sign
("+"). This is discussed further in the URL Escaping section of this document.
The search query term is submitted to the WebSearch service using
the q parameter. A
sample search query term is:
q=horses+cows+pigs
By default, the Google WebSearch service only returns documents that
include all of the terms in the search query.
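As a minimal sketch of the escaping rule above, the Python snippet below collapses each whitespace run and then escapes the query with the standard library's urllib.parse.quote_plus. The helper name escape_query is hypothetical and is not part of the WebSearch service.

```python
from urllib.parse import quote_plus


def escape_query(text: str) -> str:
    """Collapse whitespace runs, then URL-escape (spaces become '+')."""
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    normalized = " ".join(text.split())
    return quote_plus(normalized)


print("q=" + escape_query("horses   cows\tpigs"))  # prints q=horses+cows+pigs
```

Normalizing before escaping avoids sending a run of several plus signs when the user types extra spaces or tabs.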
Request Parameters
This section lists the parameters that you can use when making a
search request. The parameters are split into two lists. The first list
contains parameters that are relevant to all search requests. The
second list contains parameters that are relevant only to advanced
search requests.
There are three request parameters that are required:
- The client parameter must be set to
google-csbe
- The output parameter specifies the
format of the returned XML results; results can be returned with (xml)
or without (xml_no_dtd) a reference to Google's DTD. We recommend
setting this value to xml_no_dtd. Note:
If you do not specify this parameter, then results will be returned in
HTML instead of XML.
- The cx parameter specifies the unique
ID of the CSE.
The most commonly used request parameters other than the ones
mentioned above are:
- num — the requested number of search results
- q — the search term(s)
- start — the starting index for the results
Sample WebSearch Queries
The examples below show a couple of WebSearch HTTP requests to
illustrate how different query parameters are used. Definitions for the
different query parameters are provided in the WebSearch Query
Parameter Definitions and the Advanced Search Query
Parameters sections of this document.
This request asks for the first 10 results (start=0&num=10)
for the query term "red sox" (q=red+sox). The query also
specifies that results should come from Canadian web sites (cr=countryCA)
and should be written in French (lr=lang_fr). Finally, the query
specifies values for the client, output and cx parameters,
all three of which are required.
http://www.google.com/search?
start=0
&num=10
&q=red+sox
&cr=countryCA
&lr=lang_fr
&client=google-csbe
&output=xml_no_dtd
&cx=00255077836266642015:u-scht7a-8i
This example uses some of the advanced search query
parameters to further customize the search query. This request uses
the as_q parameter (as_q=red+sox) instead
of the q parameter. It also uses the as_eq parameter to exclude any documents
containing the word "Yankees" from the search results (as_eq=Yankees).
http://www.google.com/search?
start=0
&num=10
&as_q=red+sox
&as_eq=Yankees
&client=google-csbe
&output=xml_no_dtd
&cx=00255077836266642015:u-scht7a-8i
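The requests above can be assembled programmatically. The following Python sketch builds the first sample request with the standard library's urllib.parse.urlencode. The function name build_search_url and its keyword-argument interface are illustrative assumptions; only the endpoint and the parameter names come from this document.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/search"  # endpoint shown in the samples above


def build_search_url(cx: str, **params: str) -> str:
    """Return a WebSearch request URL that includes the three required parameters."""
    query = {
        "client": "google-csbe",  # required, fixed value
        "output": "xml_no_dtd",   # required; the recommended value
        "cx": cx,                 # required; identifies the CSE
    }
    query.update(params)
    # urlencode escapes every value: spaces become '+', ':' becomes '%3A'.
    return BASE_URL + "?" + urlencode(query)


url = build_search_url(
    "00255077836266642015:u-scht7a-8i",
    q="red sox", start="0", num="10", cr="countryCA", lr="lang_fr",
)
print(url)
```

Note that urlencode escapes the colon in the cx value as %3A, whereas the samples above show it unescaped. Both spellings decode to the same value.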
WebSearch Query Parameter Definitions
| Description |
Optional. The c2coff
parameter enables or disables the Simplified
and Traditional Chinese Search feature.
The default value for this parameter is "0" (zero), meaning
that the feature is enabled. Values for the c2coff
parameter are:
| Value | Action |
| 1 | Disabled |
| 0 | Enabled |
|
| Examples |
q=google&c2coff=1 |
| Description |
Required. The client parameter must be set to
google-csbe.
|
| Examples |
q=google&client=google-csbe
|
| Description |
Optional. The cr
parameter restricts search results to documents originating in a
particular country. You may use Boolean
operators in the cr parameter's
value.
Google WebSearch determines the country of a document by
analyzing:
- the top-level domain (TLD) of the document's URL
- the geographic location of the Web server's IP address
See the Country (cr) Parameter
Values section for a list of valid values for this parameter.
|
| Examples |
q=Frodo&cr=countryNZ |
| Description |
Required. The cx parameter specifies a unique
code that identifies a custom
search engine. You must specify a Custom Search Engine using the cx
parameter to retrieve search results from that CSE.
To find the value of the cx parameter, go to Control Panel > Codes
tab of your CSE and you will find it in the text area that is right
under 'Paste this code in the page where you'd like your search box to
appear. The search results will be shown on a Google-hosted page.'
|
| Examples |
q=Frodo&cx=00255077836266642015:u-scht7a-8i |
| Description |
Optional. The filter parameter activates or
deactivates the automatic filtering of Google search results. See the Automatic Filtering section of this
document for more information about Google's search results filters.
The default value for the filter parameter is 1,
which indicates that the feature is enabled. Valid values for this
parameter are:
| Value | Action |
| 1 | Enabled |
| 0 | Disabled |
Note: By default, Google applies filtering to all
search results to improve the quality of those results.
|
| Examples |
q=google&filter=0 |
| Description |
Optional. The gl parameter value is a
two-letter country code. For WebSearch results, the gl
parameter boosts search results whose country of origin matches the
parameter value. See the Country Codes
section for a list of valid values.
Specifying a gl parameter value in WebSearch requests
should lead to increased relevance of results. This is particularly
true for international customers and, even more specifically, for
customers in English-speaking countries other than the United States.
|
| Examples |
This request boosts documents originating in the United Kingdom in
WebSearch results:
q=pizza&gl=uk
|
| Description |
Optional. The hl parameter specifies the
interface language (host language) of your user interface. To improve
the performance and the quality of your search results, you are
strongly encouraged to set this parameter explicitly.
See the Interface Languages
section of Internationalizing Queries
and Results Presentation for more information and Supported Interface Languages for a
list of supported languages.
|
| Examples |
This request targets ads for wine in French. (Vin is
the French term for wine.)
q=vin&ip=10.10.10.10&ad=w5&hl=fr |
| Description |
Optional. The ie parameter sets the character
encoding scheme that should be used to interpret the query string. The
default ie value is latin1.
See the Character Encoding
section for a discussion of when you might need to use this parameter.
See the Character Encoding
Schemes section for the list of possible ie values.
|
| Examples |
q=google&ie=utf8&oe=utf8
|
| Description |
Optional. The lr
(language restrict) parameter restricts search results to documents
written in a particular language.
Google WebSearch determines the language of a document by
analyzing:
- the top-level domain (TLD) of the document's URL
- language meta tags within the document
- the primary language used in the body text of the document
See the Language (lr)
Collection Values section for a list of valid values for this
parameter.
|
| Examples |
q=Frodo&lr=lang_en
|
| Description |
Optional. The num parameter identifies the
number of search results to return.
The default num value is 10, and the maximum
value is 20. If you request more than 20 results, only 20
results will be returned.
Note: If the total number of search results is less
than the requested number of results, all available search results will
be returned.
|
| Examples |
q=google&num=10 |
| Description |
Optional. The oe parameter sets the character
encoding scheme that should be used to decode the XML result. The
default oe value is latin1.
See the Character Encoding
section for a discussion of when you might need to use this parameter.
See the Character Encoding
Schemes section for the list of possible oe values.
|
| Examples |
q=google&ie=utf8&oe=utf8
|
| Description |
Required. The output
parameter specifies the format of the XML results. The only valid
values for this parameter are xml and xml_no_dtd. The
chart below explains how these parameter values differ.
| Value | Output Format |
| xml_no_dtd | The XML results will not include a !DOCTYPE statement. (Recommended) |
| xml | The XML results will contain a Google DTD reference. The second line of the result will identify the document type definition (DTD) that the results use: <!DOCTYPE GSP SYSTEM "google.dtd"> |
|
| Examples |
output=xml_no_dtd
output=xml |
| Description |
Optional. The q parameter specifies the search
query entered by the user. Even though this parameter is optional, you
must specify a value for at least one of the query parameters (as_epq, as_lq, as_oq, as_q, as_rq)
to get search results.
There are also a number of special query terms that can be
used as part of the q parameter's
value. Please see Special Query Terms
for a list and definitions of these terms.
The Google Search Admin Console includes a report of the top
queries submitted using the q parameter.
Note: The value specified for the q parameter
must be URL-escaped.
|
| Examples |
q=vacation&as_oq=london+paris
|
| Description |
Optional. The safe
parameter indicates how search results should be filtered for adult and
pornographic content. The default value for the safe
parameter is off. Valid parameter values
are:
| Value | Action |
| off | Disable SafeSearch |
| medium | Enable SafeSearch |
| high | Enable a stricter version of SafeSearch |
See the Filtering Adult Content
with SafeSearch section for more details about this feature.
|
| Examples |
q=adult&safe=high |
| Description |
Optional. The start
parameter indicates the first matching result that should be included
in the search results. The start
parameter uses a zero-based index, meaning the first result is 0, the
second result is 1 and so forth.
The start parameter works in
conjunction with the num parameter to determine
which search results to return.
|
| Examples |
start=10 |
| Description |
Optional. The ud
parameter indicates whether the XML response should include the
IDN-encoded URL for the search result. IDN (International Domain Name)
encoding allows domains to be displayed using local languages:
http://www.花井鮨.com
Valid values for this parameter are 1,
meaning the XML result should include IDN-encoded URLs, and 0, meaning the XML result should not include
IDN-encoded URLs. The default value for this parameter is 0. If the ud
parameter is set to 1, the IDN-encoded
URL will appear in the UD
tag in your XML results.
If the ud parameter is set to 0, the URL in the example above would be
displayed as: http://www.xn--elq438j.com.
Note: This is a beta feature.
|
| Examples |
q=google&ud=1 |
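To illustrate the two URL forms described for the ud parameter, this sketch uses Python's built-in idna codec (which implements the older IDNA 2003 rules) to convert the example host name between its local-language form and its ASCII "xn--" form. It only demonstrates the encoding itself; whether the WebSearch service applies exactly the same conversion rules is an assumption.

```python
unicode_host = "www.花井鮨.com"

# Encode each label of the host name to its ASCII-compatible ("xn--") form.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)

# Decoding the ASCII form recovers the original local-language host name.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip == unicode_host)
```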
Advanced Search
The additional query parameters listed below the image are relevant
to advanced search queries. When you submit an advanced search, the
values of several parameters (e.g. as_eq, as_epq,
as_oq, etc.) are all factored into
the query terms for that search. The image shows Google's Advanced
Search page. On the image, the name of each advanced search parameter
is written in red text inside of or next
to the field on the page to which that parameter corresponds.
Advanced Search Query Parameters
| Description |
Optional. The as_epq parameter identifies a
phrase that all documents in the search results must contain. You can
also use the phrase search query term to
search for a phrase.
|
| Examples |
as_epq=abraham+lincoln |
| Description |
Optional. The as_eq parameter identifies a word
or phrase that should not appear in any documents in the search
results. You can also use the exclude query
term to ensure that a particular word or phrase will not appear in the
documents in a set of search results.
|
| Examples |
This example shows a search for the word "bass" that excludes any
pages containing the word "music":
q=bass&as_eq=music
|
| Description |
Optional. The as_lq parameter specifies that
all search results should contain a link to a particular URL. You can
also use the link: query term for this type
of query.
|
| Examples |
This example shows a search for pages that link to www.google.com:
as_lq=www.google.com
|
| Description |
Optional. The as_oq parameter provides
additional search terms to check for in a document, where each document
in the search results must contain at least one of the additional
search terms. You can also use the Boolean OR
query term for this type of query.
|
| Examples |
This example shows a search for pages that contain the word
"vacation" and either the word "London" or "Paris":
q=vacation&as_oq=London+Paris
|
| Description |
Optional. The as_q parameter provides search
terms to check for in a document. This parameter is also commonly used
to allow users to specify additional terms to search for within a set
of search results.
|
| Examples |
This example shows a search for pages containing the word
"president" that also contain the words "John Adams":
q=president&as_q=John+Adams
|
| Description |
Optional. The as_rq parameter specifies that
all search results should be pages that are related to the specified
URL. The parameter value should be a URL. You can also use the related: query term for this type of query.
|
| Examples |
This example shows a search for pages that are related to www.google.com:
as_rq=www.google.com
|
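The descriptions above pair most advanced parameters with a special query term. As a rough sketch of that correspondence, the hypothetical helper below folds advanced-search values into a single query string suitable for the q parameter. Treating the two forms as interchangeable is an assumption drawn from the "You can also use" notes above, not a documented guarantee, and the helper treats as_eq as a list of single words.

```python
from urllib.parse import quote_plus


def advanced_to_q(as_q="", as_epq="", as_oq="", as_eq=""):
    """Fold advanced-search values into one unescaped query string."""
    parts = []
    if as_q:
        parts.append(as_q)                        # all of these words
    if as_epq:
        parts.append('"' + as_epq + '"')          # exact phrase
    if as_oq:
        parts.append(" OR ".join(as_oq.split()))  # at least one of these words
    if as_eq:
        # Assumption: each whitespace-separated word is excluded on its own.
        parts.extend("-" + word for word in as_eq.split())
    return " ".join(parts)


raw = advanced_to_q(as_q="vacation", as_oq="London Paris", as_eq="rain")
print(raw)
print("q=" + quote_plus(raw))
```

Building the unescaped string first and escaping it once at the end keeps the quoting logic in one place.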
Special Query Terms
Google WebSearch allows the use of several special query terms that
access additional capabilities of the Google search engine. These
special query terms should be included in the value of the q request parameter. Like other query terms, the
special query terms must be URL-escaped. A
number of the special query terms contain a colon (:). This character
must also be URL-escaped; its URL-escaped value is %3A.
| Description |
The link: query term retrieves the set of Web pages
that link to a particular URL. The search query should be formatted as link:URL
with no space between the link: query term and the URL.
The URL-escaped version of link: is link%3A.
You can also use the as_lq request
parameter to submit a link: request.
Note: You cannot specify any other query terms when
using link:.
|
| Examples |
This example supposes the user is looking for sites that link
to www.example.com, in which case
the user-entered search term would be link:www.example.com.
http://www.google.com/search?q=link%3Awww.example.com
|
| Description |
The OR query term retrieves documents that include one
of a series of (two or more) query terms. To use the OR query
term, you would insert the search term OR, in uppercase
letters, between each term in the series.
You can also use the as_oq request
parameter to submit a search for any term in a set of terms.
Note: If a search request
specifies the query "London+OR+Paris", the search results will include
documents containing at least one of those two words. In some cases,
documents in the search results may contain both words.
|
| Examples |
Search for London or Paris:
User input: london OR paris
Query term: q=london+OR+paris
Search for vacation and either London or Paris:
Query term: q=vacation+london+OR+paris
Search for vacation and one of London, Paris or chocolates:
Query term: q=vacation+london+OR+paris+OR+chocolates
Search for vacation and chocolates and either London or Paris,
with the least weight being given to chocolates:
Query term: q=vacation+london+OR+paris+chocolates
Search for vacation, chocolates and flowers in documents that
also contain either London or Paris:
Query term: q=vacation+london+OR+paris+chocolates+flowers
Search for vacation and one of London or Paris and also search
for one of chocolates or flowers:
Query term: q=vacation+london+OR+paris+chocolates+OR+flowers
|
| Description |
The exclude (-) query term restricts results for a
particular search request to documents that do not contain a
particular word or phrase. To use the exclude query term, you would
preface the word or phrase to be excluded from the matching documents
with "-" (a minus sign).
The URL-escaped version of - is %2D.
The exclude query term is useful when a search term has more
than one meaning. For example, the word "bass" could return results
about either fish or music. If you were looking for documents about
fish, you could exclude documents about music from your search results
by using the exclude query term.
You can also use the as_eq request
parameter to exclude documents matching a particular word or phrase
from search results.
|
| Examples |
User input: bass -music
Query term: q=bass+%2Dmusic |
| Description |
The -filetype: query term excludes documents with a
particular file extension, such as ".pdf" or ".doc" from search
results. The search query should be formatted as -filetype:EXTENSION
with no space between the -filetype: query term and the
specified extension.
The URL-escaped version of -filetype: is %2Dfiletype%3A.
Note: You can exclude multiple
file types from search results by adding more -filetype: query
terms to your query. You should have one -filetype: query term
in your search query for each file extension that should be excluded
from the search results.
Filetypes supported by Google include:
- Adobe Portable Document Format (pdf)
- Adobe PostScript (ps)
- Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
- Lotus WordPro (lwp)
- Macwrite (mw)
- Microsoft Excel (xls)
- Microsoft PowerPoint (ppt)
- Microsoft Word (doc)
- Microsoft Works (wks, wps, wdb)
- Microsoft Write (wri)
- Rich Text Format (rtf)
- Shockwave Flash (swf)
- Text (ans, txt)
Additional filetypes may be added in the future. An up-to-date
list can always be found in Google's file type FAQ.
|
| Examples |
This example returns documents that mention "Google" but that
are not PDF documents:
q=Google+%2Dfiletype%3Apdf
This example returns documents that mention "Google" but
excludes both PDF and Word documents:
q=Google+%2Dfiletype%3Apdf+%2Dfiletype%3Adoc
|
| Description |
The filetype: query term restricts search results to
documents with a particular file extension, such as ".pdf" or ".doc".
The search query should be formatted as filetype:EXTENSION with
no space between the filetype: query term and the specified
extension.
The URL-escaped version of filetype: is filetype%3A.
Note: You can restrict search
results to documents matching one of several file extensions by adding
more filetype: query terms to your query. You should have one filetype:
query term in your search query for each file extension that should be
included in the search results. Multiple filetype: query terms
must be separated using the OR query term.
By default, search results will
include documents with any file extension.
Filetypes supported by Google include:
- Adobe Portable Document Format (pdf)
- Adobe PostScript (ps)
- Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
- Lotus WordPro (lwp)
- Macwrite (mw)
- Microsoft Excel (xls)
- Microsoft PowerPoint (ppt)
- Microsoft Word (doc)
- Microsoft Works (wks, wps, wdb)
- Microsoft Write (wri)
- Rich Text Format (rtf)
- Shockwave Flash (swf)
- Text (ans, txt)
Additional filetypes may be added in the future. An up-to-date
list can always be found in Google's file type FAQ.
|
| Examples |
This example returns PDF documents that mention "Google":
q=Google+filetype%3Apdf
This example returns PDF and Word documents that mention
"Google":
q=Google+filetype%3Apdf+OR+filetype%3Adoc
|
| Description |
The include (+) query term specifies that a word or phrase
must occur in all documents included in the search results. To use the
include query term, you would preface the word or phrase that must be
included in all search results with "+" (a plus sign).
The URL-escaped version of + (a plus sign) is %2B.
You should use + before a common word that Google
normally discards before identifying search results.
|
| Examples |
User input: Star Wars Episode +I
Query term: q=Star+Wars+Episode+%2BI |
| Description |
The allinlinks: query term requires documents in
search results to contain all of the words in the search query in URL
links. The search query should be formatted as allinlinks:
followed by the words in your search query.
If your search query includes the allinlinks: query
term, Google will only check the URL links in documents for the words
in your search query, ignoring other text in the documents, the
document titles and the URLs of each document. Note that the document
URL is different from the URL links contained in the document.
The URL-escaped version of allinlinks: is allinlinks%3A.
|
| Examples |
User input: allinlinks: Google search
Query term: q=allinlinks%3A+Google+search |
| Description |
The phrase search (") query term allows you to search for
complete phrases by enclosing the phrases in quotation marks or by
connecting them with hyphens.
The URL-escaped version of " (a quotation mark) is %22.
Phrase searches are particularly useful if you are searching
for famous quotes or proper names.
You can also use the as_epq request
parameter to submit a phrase search.
|
| Examples |
User input: "Abraham Lincoln"
Query term: q=%22Abraham+Lincoln%22 |
| Description |
The related: query term retrieves a set of Web pages
that are similar to a particular URL. The search query should be
formatted as related:URL with no space between the related:
query term and the URL.
The URL-escaped version of related: is related%3A.
You can also use the as_rq request
parameter to submit a related: request.
Note: You cannot specify any other query terms when
using related:.
|
| Examples |
This example supposes the user is looking for sites that are
similar to www.example.com, in
which case the user-entered search term would be related:www.example.com.
http://www.google.com/search?q=related%3Awww.example.com
|
| Description |
The allintext: query term requires each document in
the search results to contain all of the words in the search query in
the body of the document. The query should be formatted as allintext:
followed by the words in your search query.
If your search query includes the allintext: query
term, Google will only check the body text of documents for the words
in your search query, ignoring links in those documents, document
titles and document URLs.
The URL-escaped version of allintext: is allintext%3A.
|
| Examples |
This example specifies that the words
"Google" and "search" must appear in the body of all documents included
in the search results:
User input: allintext:Google search
Query term: q=allintext%3AGoogle+search |
| Description |
The intitle: query term restricts search results to
documents that contain a particular word in the document title. The
search query should be formatted as intitle:WORD with no space
between the intitle: query term and the following word.
Note: You can specify more
than one word that must be included in the document title by putting
the intitle: query term in front of each such word. You can
also use the allintitle: query term to
specify that all query words must be included in the titles of
documents that are in the search results.
The URL-escaped version of intitle: is intitle%3A.
|
| Examples |
This example specifies that the word "Google" must appear in
the titles of any documents in the search results, and the word
"search" must appear anywhere in the titles, URLs, links or body text
of those documents:
User input: intitle:Google search
Query term: q=intitle%3AGoogle+search
|
| Description |
The allintitle: query term restricts search results to
documents that contain all of the query words in the document title. To
use the allintitle: query term, include "allintitle:" at the
start of your search query.
Note: Putting allintitle:
at the beginning of a search query is equivalent to putting intitle: in front of each word in the search
query.
The URL-escaped version of allintitle: is allintitle%3A.
|
| Examples |
This example specifies that the words "Google" and "search"
must appear in the titles of any documents in the search results:
User input: allintitle: Google search
Query term: q=allintitle%3A+Google+search
|
| Description |
The inurl: query term restricts search results to
documents that contain a particular word in the document URL. The
search query should be formatted as inurl:WORD with no space
between the inurl: query term and the following word.
Note: The inurl: query
term ignores punctuation and uses only the first word following the inurl:
operator. You can specify more than one word that must be included in
the document URL by putting the inurl: query term in front of
each such word. You can also use the allinurl:
query term to specify that all query words must be included in the URLs
of documents that are in the search results.
The URL-escaped version of inurl: is inurl%3A.
|
| Examples |
This example specifies that the word "Google" must appear in
the URLs of any documents in the search results, and the word "search"
must appear anywhere in the titles, URLs, links or body text of those
documents:
User input: inurl:Google search
Query term: q=inurl%3AGoogle+search
|
| Description |
The allinurl: query term restricts search results to
documents that contain all of the query words in the document URL. To
use the allinurl: query term, include "allinurl:" at the start
of your search query.
Note: The allinurl:
query term ignores punctuation, so it works only on words, not on URL
components. For example, "allinurl: uk/scotland" will restrict results
to documents that contain the words "uk" and "scotland" in their URLs,
but will not require that those two words appear in any particular
order or that they be separated by a slash.
The URL-escaped version of allinurl: is allinurl%3A.
|
| Examples |
This example specifies that the words "Google" and "search"
must appear in the URLs of any documents in the search results:
User input: allinurl: Google search
Query term: q=allinurl%3A+Google+search
|
| Description |
The info: query term retrieves general information
about a URL as long as that URL is included in Google's search index.
The search query should be formatted as info:URL with no space
between the info: query term and the URL.
The URL-escaped version of info: is info%3A.
Note: You cannot specify any other query terms when
using info:.
|
| Examples |
User input: info:www.google.com
Query term: q=info%3Awww.google.com |
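The escaped forms listed in this section can be reproduced with a standard URL-escaping routine. The sketch below runs Python's urllib.parse.quote_plus over several of the user inputs shown above. One caveat: quote_plus leaves the minus sign unescaped, because "-" is an unreserved URL character; a percent-decoder reads "-" and "%2D" as the same character.

```python
from urllib.parse import quote_plus, unquote_plus

user_inputs = [
    "link:www.example.com",
    '"Abraham Lincoln"',
    "Star Wars Episode +I",
    "allintitle: Google search",
    "bass -music",
]

for text in user_inputs:
    escaped = quote_plus(text)
    # Round-trip check: unescaping must give back the original input.
    assert unquote_plus(escaped) == text
    print(f"{text!r:32} -> q={escaped}")

# '%2D' and '-' decode to the same query text.
print(unquote_plus("bass+%2Dmusic") == unquote_plus("bass+-music"))
```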
Request Limits
The chart below lists limitations on the search requests that you
send to Google:
| Component | Limit | Comment |
| Search request length | 2048 bytes | |
| Number of query terms | 10 | Includes terms in the following parameters: q, as_epq, as_eq, as_lq, as_oq, as_q, as_rq |
| Number of results | 20 | If you set the num parameter to a number greater than 20, only 20 results are returned. To get more results, you would need to send multiple requests and increment the value of the start parameter with each request. |
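The paging approach described in the chart can be sketched as follows. The Python generator below yields the start and num values for a series of requests that together cover a desired number of results. The name page_requests is hypothetical; the 20-result cap and the zero-based start index come from this document.

```python
MAX_RESULTS_PER_REQUEST = 20  # limit stated in the chart above


def page_requests(total_wanted: int, page_size: int = MAX_RESULTS_PER_REQUEST):
    """Yield (start, num) pairs that together cover total_wanted results."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page_size = min(page_size, MAX_RESULTS_PER_REQUEST)
    start = 0  # start is a zero-based index
    while start < total_wanted:
        num = min(page_size, total_wanted - start)
        yield start, num
        start += num


for start, num in page_requests(45):
    print(f"start={start}&num={num}")
```

For 45 results this produces three requests: two full pages of 20 and a final page of 5. A real client would also stop early once a response returns fewer results than requested.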
Internationalizing Queries and Results Presentation
The Google WebSearch service enables you to search for documents in
multiple languages. You can specify the character encoding that should
be used to interpret your HTTP request and to encode your XML response
(using the ie and oe search
parameters). You can also filter results to only include documents
written in certain languages.
The following sections discuss issues related to searching in
multiple languages:
Character Encoding
Servers send data, such as web pages, to user agents, such as
browsers, as a sequence of encoded bytes. The user agent then decodes
the bytes into a sequence of characters. When sending requests to the
WebSearch service, you can specify the encoding schemes for both your
search query and for the XML response that you receive.
You can use the ie request parameter to specify
the encoding mechanism for the characters in your HTTP request. You can
also use the oe parameter to specify the encoding
scheme that Google should use to encode your XML response. If you are
using an encoding scheme other than ISO-8859-1
(or latin1), please ensure that you specify the correct values
for the ie and oe parameters.
Note: If you are providing search functionality for multiple
languages, we recommend you use the utf8 (UTF-8) encoding value
for both the ie and oe
parameters.
Please refer to the Character
Encoding Schemes appendix for a complete list of the values that
you can use for the ie and oe
parameters.
For more general information about character encoding, please see http://www.w3.org/TR/REC-html40/charset.html.
Interface Languages
You can use the hl request parameter to
identify the language of your graphical interface. The hl
parameter value may affect XML search results, especially on
international queries when language restriction (using the lr parameter) is not explicitly specified. In such
cases, the hl parameter may promote search results
in the same language as the user's input language.
We suggest you explicitly set the hl parameter
in search requests to ensure that Google selects the highest quality
search results for each query.
Please see the Supported Interface
Languages section for a complete list of valid values for the hl parameter.
Searching for Documents Written in Specific Languages
You can use the lr request parameter to
restrict search results to documents that are written in a particular
language or set of languages.
The lr parameter supports Boolean Operators to allow you to specify
multiple languages that should be included in (or excluded from) search
results.
The following examples show how you might use Boolean Operators to request documents in
different languages:
Documents written in French: lr=lang_fr
Documents written in French or English: lr=lang_fr|lang_en
Documents not written in Hungarian or Czech: lr=(-lang_hu).(-lang_cs)
Please see the Language Collection
Values section for a complete list of possible values for the lr parameter and the Boolean
Operators section for a complete discussion of the use of these
operators.
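As a small illustration of the lr examples above, the Python helpers below assemble language-restriction values. The function names are hypothetical, and the reading of the operators ("|" as OR, "-" as NOT, "." as AND) is inferred from the three examples shown here rather than from the Boolean Operators section itself.

```python
def lr_any(*codes: str) -> str:
    """Match documents written in any of the given languages (joined with '|')."""
    return "|".join(f"lang_{code}" for code in codes)


def lr_none(*codes: str) -> str:
    """Exclude documents written in any of the given languages."""
    # Each exclusion is wrapped as (-lang_xx); the terms are joined with '.'.
    return ".".join(f"(-lang_{code})" for code in codes)


print("lr=" + lr_any("fr"))          # prints lr=lang_fr
print("lr=" + lr_any("fr", "en"))    # prints lr=lang_fr|lang_en
print("lr=" + lr_none("hu", "cs"))   # prints lr=(-lang_hu).(-lang_cs)
```

The values are printed unescaped, matching the way they are written in the examples above.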
Simplified and Traditional Chinese Search
Simplified Chinese and Traditional Chinese are two writing variants
of the Chinese language. The same concept may be written differently in
each variant. Given a query in one of the variants, the Google
WebSearch service can return results that include pages in both
variants.
To use this feature:
- Set the c2coff request parameter to 0, and
- Do one of the following:
  - Do not set the lr request parameter, or
  - Set the lr request parameter to lr=lang_zh-TW|lang_zh-CN
The following example shows the query parameters you would include
in a request for results in both simplified and traditional Chinese.
(Note that additional required information, such as the client, is not included in the example.)
search?hl=zh-CN
&lr=lang_zh-TW|lang_zh-CN
&c2coff=0
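The same query string could be generated from a dictionary of parameters. This is a minimal Python sketch using only the standard library; the function name chinese_variant_query is hypothetical, and, like the example above, it omits other required parameters such as the client.

```python
from urllib.parse import urlencode


def chinese_variant_query(hl="zh-CN"):
    """Build the query parameters for a combined Simplified/Traditional search."""
    params = {
        "hl": hl,
        "lr": "lang_zh-TW|lang_zh-CN",
        "c2coff": "0",  # 0 enables Simplified and Traditional Chinese Search
    }
    # safe="|" keeps the OR operator readable; without it urlencode emits %7C,
    # which is an equivalent URL-escaped form.
    return "search?" + urlencode(params, safe="|")


print(chinese_variant_query())
# search?hl=zh-CN&lr=lang_zh-TW|lang_zh-CN&c2coff=0
```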
Filtering Results
Google WebSearch provides a number of ways to filter your search
results:
Automatic Filtering of Search Results
In an effort to provide the best search results possible, Google
uses two techniques to automatically filter search results that are
generally considered undesirable:
- Duplicate Content: If multiple documents contain the
same information, then only the most relevant document of that set is
included in your search results.
- Host Crowding: If there are many search results from the
same site, Google may not show all the results from that site or may
show the results lower in
the ranking than they otherwise would have been.
We recommend you leave these filters on for typical search requests
because the filters significantly enhance the quality of most search
results. However, you can bypass these automatic filters by setting the
filter query parameter to 0 in your
search request.
Language and Country Filtering
The Google WebSearch service returns results from a master index of
all Web documents. The master index contains subcollections of
documents that are grouped by particular attributes, including language
and country of origin.
You can use the lr and cr
request parameters to restrict search results to subcollections of
documents that are written in particular languages or originate from
particular countries, respectively.
Google WebSearch determines the language of a document by analyzing:
- the top-level domain (TLD) of the document's URL
- language meta tags within the document
- the primary language used in the body text of the document
Please also see the definition of the lr
parameter, the section on Searching for
Documents Written in Specific Languages and the Language Collection Values that can be
used as values for the lr parameter for more
information on restricting results based on language.
Google WebSearch determines the country of a document by analyzing:
- the top-level domain (TLD) of the document's URL
- the geographic location of the Web server's IP address
Please also see the definition of the cr
parameter and the Country Collection
Values that can be used as values for the cr
parameter for more information on restricting results by country of
origin.
Note: You can combine language
values and country values to customize your search results. For
example, you could request documents that are written in French and
come from France or Canada, or you could request documents that come
from Holland and are not written in English. The lr
and cr parameters both support Boolean Operators.
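The two combinations described in the note could be expressed as follows. This Python sketch is illustrative only: the function name restrict is hypothetical, and the country values countryFR, countryCA and countryNL are assumed to follow the same "country" plus two-letter code pattern as the values listed in the Country Collection Values section.

```python
from urllib.parse import urlencode


def restrict(lr=None, cr=None):
    """Return a query-string fragment with optional language/country restricts."""
    params = {}
    if lr:
        params["lr"] = lr
    if cr:
        params["cr"] = cr
    # Keep the Boolean operator characters | ( ) unescaped for readability.
    return urlencode(params, safe="|()")


# Written in French AND from France OR Canada (country codes are assumed).
print(restrict(lr="lang_fr", cr="countryFR|countryCA"))
# lr=lang_fr&cr=countryFR|countryCA

# From Holland AND NOT written in English (country code is assumed).
print(restrict(lr="-lang_en", cr="countryNL"))
# lr=-lang_en&cr=countryNL
```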
Filtering Adult Content with SafeSearch
Many Google customers do not want to display search results for
sites
that contain adult content. Using our SafeSearch filter, you can screen
for search results that contain adult content and eliminate them.
Google's filters use proprietary technology to check keywords, phrases
and URLs. While no filters are 100 percent accurate, SafeSearch will
remove the overwhelming majority of adult content from your search
results.
Google strives to keep SafeSearch as current and comprehensive as
possible by continually crawling the Web and by incorporating updates
from user suggestions.
SafeSearch is available in the following languages:
- Dutch
- English
- French
- German
- Italian
- Portuguese (Brazilian)
- Spanish
- Traditional Chinese
You can adjust the degree to which Google filters your results for
adult content using the safe query parameter.
The following table explains Google's SafeSearch settings and how those
settings will affect your search results:
| SafeSearch Level | Description |
| high | Enables a stricter version of safe search. |
| medium | Blocks web pages containing pornography and other explicit sexual content. |
| off | Does not filter adult content from search results. |
Note: The default SafeSearch setting is off.
If you have SafeSearch activated and you find sites that contain
offensive content in your results, please email the site's URL to safesearch@google.com, and we
will investigate the site.
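The filtering options in this section map onto two request parameters, safe and filter. The following Python sketch shows one way an application might validate them before building a request; the function name filtering_params and its argument names are hypothetical.

```python
SAFE_LEVELS = ("high", "medium", "off")  # values from the SafeSearch table


def filtering_params(safe="off", automatic_filtering=True):
    """Return the safe/filter request parameters as a dict of strings."""
    if safe not in SAFE_LEVELS:
        raise ValueError(f"safe must be one of {SAFE_LEVELS}, got {safe!r}")
    params = {"safe": safe}
    if not automatic_filtering:
        # filter=0 bypasses duplicate-content and host-crowding filtering.
        params["filter"] = "0"
    return params


print(filtering_params("medium"))
# {'safe': 'medium'}
print(filtering_params("high", automatic_filtering=False))
# {'safe': 'high', 'filter': '0'}
```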
XML Results
Google XML Results DTD
Google uses the same DTD to describe the XML format for all types of
search results. Many of the tags and attributes are applicable for all
search types. Some tags, however, are applicable only for certain
search types. Consequently, the definitions in the DTD may be less
restrictive than the definitions given in this document.
This document describes those aspects of the DTD that are relevant
for WebSearch. When you look at the DTD, if you're working on
WebSearch, you can safely ignore tags and attributes that are not
documented here. If the definition differs between the DTD and the
documentation, that fact is noted in this document.
Google can return XML results either with or without a reference to
the most recent DTD. The DTD is a guide to help search administrators
and XML parsers understand Google's XML results. Because Google's XML
grammar may change from time to time, you should not configure your
parser to use the DTD to validate each XML result.
Additionally, you should not configure your XML parser to fetch the
DTD each time you submit a search request. Google updates the DTD
infrequently, and these requests create unnecessary delay and bandwidth
requirements.
Google recommends that you use the xml_no_dtd output format
to get XML results. If you specify the xml output
format in your search request, the only difference is the inclusion of
the following line in the XML results:
<!DOCTYPE GSP SYSTEM "google.dtd">
You can access the latest DTD at http://www.google.com/google.dtd.
Please note that not
all features in the DTD may be available or supported at this time.
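As an illustration of parsing without DTD validation, the Python sketch below reads a response that contains the DOCTYPE line. It assumes the standard library's xml.etree.ElementTree parser, which neither validates against nor downloads the external DTD by default; the response bytes are a shortened, made-up example.

```python
import xml.etree.ElementTree as ET

# Shortened example response in the "xml" output format (with DOCTYPE line).
RESPONSE = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>\n'
    b'<!DOCTYPE GSP SYSTEM "google.dtd">\n'
    b'<GSP VER="3.2"><TM>0.100445</TM><Q>pizza</Q></GSP>'
)

# The DOCTYPE declaration is read but google.dtd is never requested.
root = ET.fromstring(RESPONSE)
print(root.tag, root.get("VER"), root.findtext("Q"))
# GSP 3.2 pizza
```

With the xml_no_dtd output format the DOCTYPE line is simply absent and the same code applies unchanged.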
About the XML Response
- All element values are valid HTML suitable for display unless
otherwise noted in the XML tag definitions.
- Some element values are URLs that need to be HTML-encoded before
they are displayed.
- Your XML parser should ignore undocumented attributes and tags.
This allows your application to continue working without modification
if Google adds more features to the XML output.
- Certain characters must be escaped when included as values in
XML tags. Your XML processor should convert these entities back to the
appropriate characters. If you do not convert entities properly, the
browser may, for example, render the & character as "&amp;".
The XML Standard documents these characters; they are
reproduced in the table below:
| Character | Symbol | Escaped Form (Entity) | Escaped Form (Character Code) |
| Ampersand | & | &amp; | &#38; |
| Single Quote | ' | &apos; | &#39; |
| Double Quote | " | &quot; | &#34; |
| Greater Than | > | &gt; | &#62; |
| Less Than | < | &lt; | &#60; |
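Most XML parsers perform this conversion automatically. The Python sketch below, which uses a made-up <T> element, shows the standard library parser turning both entity forms back into characters, and then re-escaping the value for the case where an application wants to show it as literal text rather than as HTML.

```python
import html
import xml.etree.ElementTree as ET

# Entities and character codes as they would appear inside an XML tag value.
elem = ET.fromstring(
    "<T>Tom &amp; Ann&apos;s &#60;b&#62;pizza&#60;/b&#62;</T>"
)

decoded = elem.text
print(decoded)
# Tom & Ann's <b>pizza</b>

# To display the value as literal text in a page, escape it again for HTML.
print(html.escape(decoded, quote=False))
# Tom &amp; Ann's &lt;b&gt;pizza&lt;/b&gt;
```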
XML Results for Regular and Advanced Search Queries
Regular/Advanced Search: Sample Query and XML Result
This sample WebSearch request asks for 10 results (num=10)
about the search term "socer" (q=socer). (Note that the word
"soccer" is spelled wrong in the example.)
http://www.google.com/search?
q=socer
&hl=en
&start=0
&num=10
&output=xml
&client=google-csbe
&cx=00255077836266642015:u-scht7a-8i
This request yields the XML result below. Note that there are
several comments in the XML result to indicate where certain tags not
included in the result would appear.
<?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?>
<GSP VER="3.2">
<TM>0.452923</TM>
<Q>socer</Q>
<PARAM name="cx" value="00255077836266642015:u-scht7a-8i" original_value="00255077836266642015%3Au-scht7a-8i"/>
<PARAM name="hl" value="en" original_value="en"/>
<PARAM name="q" value="socer" original_value="socer"/>
<PARAM name="output" value="xml" original_value="xml"/>
<PARAM name="client" value="google-csbe" original_value="google-csbe"/>
<PARAM name="num" value="10" original_value="10"/>
<Spelling>
<Suggestion q="soccer"><b><i>soccer</i></b></Suggestion>
</Spelling>
<Context>
<title>Sample Vacation CSE</title>
<Facet>
<FacetItem>
<label>restaurants</label>
<anchor_text>restaurants</anchor_text>
</FacetItem>
<FacetItem>
<label>wineries</label>
<anchor_text>wineries</anchor_text>
</FacetItem>
</Facet>
<Facet>
<FacetItem>
<label>golf_courses</label>
<anchor_text>golf courses</anchor_text>
</FacetItem>
</Facet>
<Facet>
<FacetItem>
<label>hotels</label>
<anchor_text>hotels</anchor_text>
</FacetItem>
</Facet>
<Facet>
<FacetItem>
<label>nightlife</label>
<anchor_text>nightlife</anchor_text>
</FacetItem>
</Facet>
<Facet>
<FacetItem>
<label>soccer_sites</label>
<anchor_text>soccer sites</anchor_text>
</FacetItem>
</Facet>
</Context>
<RES SN="1" EN="10">
<M>6080</M>
/*
* The FI tag after the comment indicates that the result
* set has been filtered. If the number of results were exact, the
* FI tag would be replaced by an XT tag in the same format.
*/
<FI />
<NB>
/*
* Since the request is for the first page of results, the PU tag,
* which contains a link to the previous page of search results,
* is not included in this XML result. If the sample result did include
* a previous page of results, it would be listed here, in the same format
* as the NU tag on the following line
*/
<NU>/search?q=socer&hl=en&lr=&ie=UTF-8&output=xml&client=test&start=10&sa=N</NU>
</NB>
<R N="1">
<U>http://www.soccerconnection.net/</U>
<UE>http://www.soccerconnection.net/</UE>
<T>SoccerConnection.net</T>
<CRAWLDATE>May 21, 2007</CRAWLDATE>
<S><b>soccer</b>; players; coaches; ball; world cup;<b>...</b></S>
<Label>transcodable_pages</Label>
<Label>accessible</Label>
<Label>soccer_sites</Label>
<LANG>en</LANG>
<HAS>
<DI>
<DT>SoccerConnection.net</DT>
<DS>Post your <b>soccer</b> resume directly on the Internet.</DS>
</DI>
<L/>
<C SZ="8k" CID="kWAPoYw1xIUJ"/>
<RT/>
</HAS>
</R>
/*
* The result includes nine more results, each enclosed by an R tag.
*/
</RES>
</GSP>
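A client application might extract the main fields from such a result along the following lines. This is a minimal Python sketch, not a reference implementation: SAMPLE is a shortened copy of the result above with the explanatory comments removed and markup characters escaped as they would be in a well-formed response, and the function name summarize and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<GSP VER="3.2">
<TM>0.452923</TM>
<Q>socer</Q>
<Spelling>
<Suggestion q="soccer">&lt;b&gt;&lt;i&gt;soccer&lt;/i&gt;&lt;/b&gt;</Suggestion>
</Spelling>
<Context>
<title>Sample Vacation CSE</title>
<Facet><FacetItem>
<label>golf_courses</label><anchor_text>golf courses</anchor_text>
</FacetItem></Facet>
</Context>
<RES SN="1" EN="10">
<M>6080</M>
<FI />
<NB><NU>/search?q=socer&amp;hl=en&amp;output=xml&amp;start=10</NU></NB>
<R N="1">
<U>http://www.soccerconnection.net/</U>
<UE>http://www.soccerconnection.net/</UE>
<T>SoccerConnection.net</T>
<LANG>en</LANG>
</R>
</RES>
</GSP>"""


def summarize(xml_text):
    """Collect the query, spelling suggestion, facets and results."""
    root = ET.fromstring(xml_text)
    res = root.find("RES")  # absent when there are no results
    suggestion = root.find("Spelling/Suggestion")
    results = res.findall("R") if res is not None else []
    return {
        "query": root.findtext("Q"),
        "suggested_query": suggestion.get("q") if suggestion is not None else None,
        "estimated_total": int(res.findtext("M")) if res is not None else 0,
        "filtered": res is not None and res.find("FI") is not None,
        "next_page": res.findtext("NB/NU") if res is not None else None,
        "results": [(int(r.get("N")), r.findtext("U"), r.findtext("T")) for r in results],
        "facets": [
            (item.findtext("label"), item.findtext("anchor_text"))
            for item in root.iterfind("Context/Facet/FacetItem")
        ],
    }


summary = summarize(SAMPLE)
print(summary["suggested_query"], summary["estimated_total"], summary["results"])
```

Because the code looks up only the tags it needs, undocumented or newly added tags are ignored, in line with the guidance in About the XML Response.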
Regular/Advanced Search: XML Tags
XML responses for regular search requests and advanced search
requests both use the same set of XML tags. These XML tags are shown in
the XML example above and explained in the tables below.
The XML tags below are listed alphabetically by tag name, and each
tag definition contains a description of the tag, an example showing
how the tag would appear in an XML result and the format of the tag's
content. If the tag is a subtag of another XML tag or if the tag has
subtags or attributes of its own, that information is also provided in
the tag's definition table.
Certain symbols may be displayed next to some subtags in the
definitions below. These symbols, and their meanings, are:
? = optional subtag
* = zero or more instances of the subtag
+ = one or more instances of the subtag
| Definition |
The <anchor_text> tag specifies the text that
you should display to users to identify a refinement
label associated with a search result set. Since refinement labels
replace nonalphanumeric characters with underscores, you should not
display the value of the <label>
tag in your user interface. Instead, you should display the value of
the <anchor_text> tag.
|
| Example |
<anchor_text>golf
courses</anchor_text> |
| Subtag of |
FacetItem |
| Content Format |
Text |
| Definition |
This tag encapsulates the contents of a block in a body line of a Subscribed Link result. Each block has subtags T, U, and L. A nonempty T tag denotes that the block contains text; nonempty U and L tags denote that the block contains a link (with URL given in the U subtag and anchor text in the L subtag).
|
| Subtags |
T, U, L |
| Subtag of |
BODY_LINE |
| Content Format |
Empty |
| Definition |
This tag encapsulates the contents of a line in the body of a Subscribed Link result. Each body line consists of several BLOCK tags, which either contain some text or a link with URL and anchor text.
|
| Subtags |
BLOCK* |
| Subtag of |
SL_MAIN |
| Content Format |
Empty |
| Definition |
The <C> tag indicates that the WebSearch service
can retrieve a cached version of this search result URL. You cannot
retrieve cached pages through the XML API, but you can redirect users
to www.google.com for this
content.
|
| Attributes |
| Name |
Format |
Description |
| SZ |
Text (Integer + "k") |
Provides the size of the cached version
of the search result in kilobytes ("k"). |
| CID |
Text |
Identifies a document in Google's cache.
To fetch the document from the cache, send a search term built as
follows:
cache:CIDtext:escapedURL
The escaped URL is available in the UE tag.
|
|
| Example |
<C SZ="6k" CID="kvOXK_cYSSgJ" /> |
| Subtag of |
HAS |
| Content Format |
Empty |
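The cache search term described above could be built as in the Python sketch below. The function name cache_query is hypothetical, and the sketch assumes that the pattern cache:CIDtext:escapedURL means the CID attribute value and the UE tag value are inserted verbatim, separated by colons; verify this reading against actual results before relying on it.

```python
from urllib.parse import quote_plus


def cache_query(cid, escaped_url):
    """Return (search term, URL-escaped value for the q parameter)."""
    term = f"cache:{cid}:{escaped_url}"  # assumed reading of the pattern
    return term, quote_plus(term)


term, q_value = cache_query("kWAPoYw1xIUJ", "http://www.soccerconnection.net/")
print(term)
# cache:kWAPoYw1xIUJ:http://www.soccerconnection.net/
print(q_value)
# cache%3AkWAPoYw1xIUJ%3Ahttp%3A%2F%2Fwww.soccerconnection.net%2F
```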
| Definition |
The <C2C> tag indicates that
the result refers to a Traditional Chinese language page. This tag
appears only when Simplified and Traditional
Chinese Search is enabled. See the c2coff
query parameter definition for more information about enabling and
disabling this feature. |
| Content Format |
Text |
| Definition |
The <Context> tag encapsulates a list of
refinement labels associated with a set of search results.
|
| Example |
<Context> |
| Subtags |
title, Facet+ |
| Content Format |
Container |
| Definition |
The <CRAWLDATE> tag identifies the date that the
page was last crawled.
|
| Example |
<CRAWLDATE>May 21,
2005</CRAWLDATE> |
| Subtag of |
R |
| Content Format |
Text |
| Definition |
The <DI> tag encapsulates Open Directory Project
(ODP) category information for a single search result.
|
| Example |
<DI> |
| Subtags |
DT?, DS? |
| Subtag of |
HAS |
| Content Format |
Empty |
| Definition |
The <DS> tag provides the summary listed for a
single category in the ODP directory.
|
| Example |
<DS>Post your
<b>soccer</b> resume directly on the
Internet.</DS> |
| Subtag of |
DI |
| Content Format |
Text (may contain HTML) |
| Definition |
The <DT> tag provides the title for a single
category listed in the ODP directory.
|
| Example |
<DT>SoccerConnection.net</DT> |
| Subtag of |
DI |
| Content Format |
Text (may contain HTML) |
| Definition |
The <Facet> tag contains a logical grouping of <FacetItem> tags. You can
create these groupings using the Custom Search
Engine XML Specification format. If you do not create these
groupings, the <Context> tag will contain up
to four <Facet> tags. The items within each <Facet>
tag will be grouped for display purposes but may not have a logical
relationship.
|
| Example |
<Facet> |
| Subtags |
FacetItem+, title+ |
| Subtag of |
Context |
| Content Format |
Container |
| Definition |
The <FacetItem> tag encapsulates information
about a refinement label associated with a set of search results.
|
| Example |
<FacetItem> |
| Subtags |
label, anchor_text+ |
| Subtag of |
Facet |
| Content Format |
Container |
| Definition |
The <FI> tag serves as a flag
that indicates whether document filtering was performed for the search.
See the Automatic Filtering section
of this document for more information about Google's search results
filters. |
| Example |
<FI /> |
| Subtag of |
RES |
| Content Format |
Empty |
| Definition |
The <GSP> tag
encapsulates all data returned in Google XML search results. "GSP" is
an abbreviation for "Google Search Protocol".
|
| Attributes |
| Name |
Format |
Description |
| VER |
Text (Integer) |
The VER
attribute specifies the version of the search results output. The
current output version is "3.2". |
|
| Example |
<GSP VER="3.2"> |
| Subtags |
PARAM+, Q,
RES?, TM |
| Content Format |
Empty |
| Definition |
The <HAS> tag encapsulates
information about any special search
request parameters supported for a particular URL.
Note: The definition of <HAS> for
WebSearch is more restrictive than in the DTD.
|
| Subtags |
DI?, L?,
C?, RT? |
| Subtag of |
R |
| Definition |
The <HN> tag indicates that host crowding has
occurred and that additional search results are available from the host
where a particular search result was found. The value is the host name
that should be used to retrieve the additional results.
|
| Attributes |
| Name |
Format |
Description |
| U |
Text |
HTML version of a web host name |
|
| Subtag of |
R |
| Content Format |
Text (URL-escaped web host) |
| Definition |
Google returns the <ISURL> tag
if the associated search query is a URL. |
| Subtag of |
GSP |
| Content Format |
Empty |
| Definition |
The presence of the <L> tag
indicates that the WebSearch service can find other sites that link to
this search result URL. To find such sites, you would use the link: special query term. |
| Subtag of |
HAS |
| Content Format |
Empty |
| Definition |
The <label> tag specifies a refinement label
that you can use to filter the search results that you receive. To use
a refinement label, add the string more:[[label tag value]] to
the value of the q parameter in your HTTP request to Google as
shown in the following example. Please note that this value must be
URL-escaped before you send the query to Google.
This example uses the refinement label golf_courses to filter search results about Palm Springs: q=Palm+Springs+more:golf_courses
The URL-escaped version of this query is: q=Palm+Springs+more%3Agolf_courses
Note: The <label> tag is not the same as
the <Label> tag, which identifies a refinement label
associated with a particular URL in your search results.
|
| Example |
<label>golf_courses</label> |
| Subtag of |
FacetItem |
| Content Format |
Text |
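The refinement query shown in the <label> definition could be produced as follows. This is a small Python sketch; the function name refine_query is hypothetical, and it uses the standard library's quote_plus to perform the URL-escaping.

```python
from urllib.parse import quote_plus


def refine_query(terms, label):
    """Append more:<label> to the query terms and URL-escape the result."""
    return "q=" + quote_plus(f"{terms} more:{label}")


print(refine_query("Palm Springs", "golf_courses"))
# q=Palm+Springs+more%3Agolf_courses
```

The label argument should be the value of the <label> tag, not the <anchor_text> value that is displayed to users.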
| Definition |
The <LANG> tag contains Google's best guess of
the language of the search result.
|
| Example |
<LANG>en</LANG> |
| Subtag of |
R |
| Content Format |
Text |
| Definition |
The <M> tag identifies the estimated total
number of results for the search.
Note: This estimate may not be accurate. Use it
at your own risk.
|
| Example |
<M>16200000</M> |
| Subtag of |
RES |
| Content Format |
Text |
| Definition |
The <NB> tag encapsulates navigation information
— links to the next page of search results or the previous page of
search results — for the result set.
Note: This tag is only present
if more results are available.
|
| Example |
<NB> |
| Subtags |
NU?, PU? |
| Subtag of |
RES |
| Content Format |
Empty |
| Definition |
The <NU> tag contains a relative link to the
next page of search results.
|
| Example |
<NU>/search?q=flowers&num=10&hl=en&ie=UTF-8
&output=xml&client=test&start=10</NU>
|
| Subtag of |
NB |
| Content Format |
Text (Relative URL) |
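Because the <NU> value is a relative link, it has to be resolved against the address of the search service before it can be requested. The Python sketch below assumes the base address http://www.google.com/search used in the sample query; the function name next_page_url is hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

BASE = "http://www.google.com/search"  # assumed from the sample request


def next_page_url(nu_value, base=BASE):
    """Resolve the relative link from an NU (or PU) tag to an absolute URL."""
    return urljoin(base, nu_value)


url = next_page_url(
    "/search?q=flowers&num=10&hl=en&ie=UTF-8&output=xml&client=test&start=10"
)
print(url)
# http://www.google.com/search?q=flowers&num=10&hl=en&ie=UTF-8&output=xml&client=test&start=10

# The start parameter of the resolved link identifies the next page.
start = parse_qs(urlsplit(url).query)["start"][0]
print(start)
# 10
```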
| Definition |
The <PARAM> tag
identifies an input parameter submitted in the HTTP request associated
with the XML result. Information about the parameter is contained in
the tag attributes — name, value, original_value — and there will be
one PARAM tag for each parameter submitted in the HTTP request.
|
| Attributes |
| Name |
Format |
Description |
| name |
Text |
Input parameter name. |
| value |
HTML |
HTML-formatted version of the input
parameter value. |
| original_value |
Text |
Original URL-escaped
version of the input parameter value. |
|
| Example |
<PARAM name="cr" value="countryNZ"
original_value="countryNZ" /> |
| Subtag of |
GSP |
| Content Format |
Complex |
| Definition |
The <PU> tag provides a relative link to the
previous page of search results.
|
| Example |
<PU>/search?q=flowers&num=10&hl=en&output=xml
&client=test&start=10</PU> |
| Subtag of |
NB |
| Content Format |
Text (Relative URL) |
| Definition |
The <Q> tag identifies the search query
submitted in the HTTP request associated with the XML result.
|
| Example |
<Q>pizza</Q>
|
| Subtag of |
GSP |
| Content Format |
Text |
| Definition |
The <R> tag encapsulates the details of an
individual search result.
Note: The definition of the <R> tag for
WebSearch is more restrictive than in the DTD.
|
| Attributes |
| Name |
Format |
Description |
| N |
Text (Integer) |
Indicates the index (1-based) of this
search result. |
| MIME |
Text |
Indicates the MIME type of the search
result. |
|
| Subtags |
U, UE,
T?, CRAWLDATE, S?, LANG?,
HAS, HN? |
| Subtag of |
RES |
| Definition |
The <RES> tag encapsulates the set of individual
search results and details about those results.
|
| Attributes |
| Name |
Format |
Description |
| SN |
Text (Integer) |
Indicates the index (1-based) of the
first search result returned in this result set. |
| EN |
Text (Integer) |
Indicates the index (1-based) of the last
search result returned in this result set. |
|
| Example |
<RES SN="1" EN="10"> |
| Subtags |
M, FI?,
XT?, NB?, R* |
| Subtag of |
GSP |
| Content Format |
Empty |
| Definition |
The presence of the <RT> tag
indicates that the WebSearch service can find a set of web pages that
are similar to this search result URL. To find this set of Web pages,
you would use the related: special query
term. |
| Subtag of |
HAS |
| Content Format |
Empty |
| Definition |
The <S> tag contains an excerpt for a search
result that shows query terms highlighted in bold. Line breaks are
included in the excerpt for proper text wrapping.
|
| Example |
<S>Washington (CNN) -- A bid to end the
Senate standoff over President
<b>Bush's</b> judicial picks would
let five nominees advance to a final vote while preserving the
<b>...</b></S> |
| Subtag of |
R |
| Content Format |
Text (HTML) |
| Definition |
This tag encapsulates the contents of a Subscribed Link result. The anchor text and URL of the title link are contained in T and U subtags respectively. The lines of body text and links are contained in BODY_LINE subtags.
|
| Subtags |
BODY_LINE*, T, U |
| Subtag of |
SL_RESULTS |
| Content Format |
Empty |
| Definition |
Subscribed Link results container tag. One of these will appear whenever you have a Subscribed Link in your search results. The SL_MAIN subtag contains the main result data.
|
| Subtags |
SL_MAIN* |
| Subtag of |
GSP |
| Content Format |
Empty |
| Definition |
The <Spelling> tag encapsulates an alternate
spelling suggestion for the submitted query. This tag only appears on
the first page of search results. Spelling suggestions are available in
English, Chinese, Japanese and Korean.
Note: Google will only return spelling suggestions for
queries where the gl parameter value is in
lowercase letters.
|
| Example |
<Spelling> |
| Subtags |
Suggestion |
| Subtag of |
GSP |
| Content Format |
Empty |
| Definition |
The <Suggestion> tag contains an
alternate spelling suggestion for the submitted query. You can use the
tag's content to suggest the alternate spelling to your search user.
The value of the q attribute is the
URL-escaped spelling suggestion that you can use as a query term. |
| Attributes |
| Name |
Format |
Description |
| q |
Text |
The q
attribute specifies the URL-escaped version
of the spelling suggestion. |
|
| Example |
<Suggestion
q="soccer"><b><i>soccer</i></b></Suggestion>
|
| Subtag of |
Spelling |
| Content Format |
Text (HTML) |
| Definition |
The <T> tag contains the title
of the result. |
| Example |
<T>Amici's East Coast
Pizzeria</T> |
| Subtag of |
R |
| Content Format |
Text (HTML) |
| Definition |
As a child of <Context>, the <title>
tag contains the name of your Custom Search Engine.
As a child of <Facet>, the <title>
tag provides a title for a set of facets.
|
| Example |
As a child of <Context>: <title>My Search
Engine</title>
As a child of <Facet>: <title>facet
title</title>
|
| Subtag of |
Context, Facet |
| Content Format |
Text |
| Definition |
The <TM> tag identifies the total server time
needed to return search results, measured in seconds.
|
| Example |
<TM>0.100445</TM> |
| Subtag of |
GSP |
| Content Format |
Text (Floating-point number) |
| Definition |
The <TT> tag provides a search
tip. |
| Example |
<TT><i>Tip: For most
browsers, pressing the Return key produces the same results as clicking
the Search button.</i></TT> |
| Subtag of |
GSP |
| Definition |
The <U> tag provides the URL
of the search result. |
| Example |
<U>http://www.dominos.com/</U> |
| Subtag of |
R |
| Content Format |
Text (Absolute URL) |
| Definition |
The <UD> tag provides the IDN-encoded
(International Domain Name) URL for the search result. The value allows
domains to be displayed using local languages. For example, the
IDN-encoded URL http://www.%E8%8A%B1%E4%BA%95.com
could be decoded and displayed as < |