Introduction to Baidu Search Scraper

This document describes the API endpoints, supported environments, and available GET parameters for NetNut's Baidu Scraper API, which retrieves search engine results pages (SERPs) from Baidu.

Authentication

To access the NetNut API, you must provide basic authentication credentials using the username and password provided by NetNut. Include these credentials in the HTTP request headers as follows:

Authorization: Basic base64(username:password)

Where base64(username:password) is the Base64-encoded string of your username and password concatenated with a colon (:) separator.
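This minimal Python sketch builds the header using only the standard library. The helper name basic_auth_header is hypothetical. The credentials shown are placeholders, not real NetNut values.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Return an HTTP Basic Authorization header for the given credentials (hypothetical helper)."""
    # Join with a colon, encode to bytes, then Base64-encode the result.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; substitute the ones NetNut issued to you.
headers = basic_auth_header("user", "pass")
print(headers)  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```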

Environments

Base URL: https://serp-api.netnut.io
Endpoint: /search/get-html

Request parameters are appended to the endpoint as a GET query string.

The following parameters are supported when retrieving Baidu SERPs via the API. To target Baidu specifically, you must include the parameter siteType=baidu in your request.
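The following sketch assembles a request URL from the base URL and endpoint above. It always sets siteType=baidu. The helper build_baidu_url is hypothetical. Python's urlencode percent-encodes non-ASCII queries as UTF-8, which produces the same %E6%B6%88%E6%81%AF form seen in the sample response below.

```python
from urllib.parse import urlencode

BASE_URL = "https://serp-api.netnut.io"
ENDPOINT = "/search/get-html"


def build_baidu_url(query: str, **extra: object) -> str:
    """Build a GET URL for a Baidu SERP request (hypothetical helper)."""
    params = {"siteType": "baidu", "wd": query}
    params.update(extra)  # optional parameters such as rn, pn, ct, device
    # urlencode percent-encodes each key and value (UTF-8 for non-ASCII text).
    return f"{BASE_URL}{ENDPOINT}?{urlencode(params)}"


url = build_baidu_url("消息", rn=20)
print(url)
# https://serp-api.netnut.io/search/get-html?siteType=baidu&wd=%E6%B6%88%E6%81%AF&rn=20
```

To send the request, pass this URL to any HTTP client along with the Authorization header described above, for example with urllib.request.Request(url, headers=...).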

API Parameters & Specifications

Search Query

wd (Required): The query you want to search for.

Pagination

rn (Optional): Number of results to return (maximum 50). Default: 10.

pn (Optional): Pagination offset, counted in results: 0 = first page, 10 = second page, 20 = third page, and so on (with the default rn=10).
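The following sketch converts a 1-based page number into rn/pn values. It assumes that pn is a result offset that advances by rn for each page. The table above only shows this for the default rn=10, so treat other page sizes as an assumption. page_params is a hypothetical helper.

```python
MAX_RESULTS_PER_PAGE = 50  # documented maximum for rn


def page_params(page: int, per_page: int = 10) -> dict[str, int]:
    """Return rn/pn values for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if not 1 <= per_page <= MAX_RESULTS_PER_PAGE:
        raise ValueError(f"rn must be between 1 and {MAX_RESULTS_PER_PAGE}")
    # Assumption: pn is a result offset, so it grows by rn for each page.
    return {"rn": per_page, "pn": (page - 1) * per_page}


print(page_params(3))               # {'rn': 10, 'pn': 20}
print(page_params(2, per_page=40))  # {'rn': 40, 'pn': 40}
```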

Localization

ct (Optional): Language filter: 0 = all languages, 1 = Simplified Chinese, 2 = Traditional Chinese.

Advanced Filters

rawHtml (Optional): Controls the output. Set it to 0 (false) to receive the parsed JSON only, or 1 (true) to also include the raw HTML of the results page. Set it to 2 (only) to receive the raw HTML alone, without parsing.

device (Optional): The device used to fetch the Baidu search results. Set it to desktop (default) to use a regular browser, or mobile to use a mobile browser (currently an iPhone).

f (Optional): Search type: 8 = normal search, 3 = suggestion list, 1 = related search.

q5 (Optional): Keyword location filter: 1 = match the keyword in the title only, 2 = match it in the URL only.

q6 (Optional): Restricts results to a specific domain.

gpc (Optional): Date range filter; the start and end of the range are given as Unix timestamps. Example: gpc=stf=START_TIMESTAMP,END_TIMESTAMP|stftype=1d
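The following sketch formats a gpc value from two dates. It assumes the timestamps are whole seconds since the Unix epoch. It reuses stftype=1d from the example above, because that is the only stftype value this page shows. The helper gpc_value is hypothetical. The value contains '=', ',' and '|', so it must be percent-encoded when placed in the query string, as the last line demonstrates.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def gpc_value(start: datetime, end: datetime, stftype: str = "1d") -> str:
    """Format a gpc date-range value from two timezone-aware datetimes (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    # Assumption: Unix timestamps in whole seconds.
    start_ts, end_ts = int(start.timestamp()), int(end.timestamp())
    return f"stf={start_ts},{end_ts}|stftype={stftype}"


value = gpc_value(
    datetime(2025, 8, 1, tzinfo=timezone.utc),
    datetime(2025, 8, 20, tzinfo=timezone.utc),
)
print(value)                      # stf=1754006400,1755648000|stftype=1d
print(urlencode({"gpc": value}))  # gpc=stf%3D1754006400%2C1755648000%7Cstftype%3D1d
```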

Encoding Link

bs (Optional): The previous query, used when navigating from a related search.

oq (Optional): The original query, used when navigating from related search results.
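The following sketch shows one way to fill in bs and oq when following a chain of related searches. It treats the first query in the chain as the original query (oq) and the second-to-last as the previous query (bs). This mapping is an assumption based on the descriptions above, not confirmed API behavior. navigation_params is a hypothetical helper.

```python
def navigation_params(queries: list[str]) -> dict[str, str]:
    """Build parameters for the last query in a chain of related searches (hypothetical helper).

    queries[0] is the original query; queries[-1] is the query being requested.
    """
    if not queries:
        raise ValueError("at least one query is required")
    params = {"siteType": "baidu", "wd": queries[-1]}
    if len(queries) > 1:
        params["bs"] = queries[-2]  # previous query: the page navigated from
        params["oq"] = queries[0]   # original query that started the chain
    return params


print(navigation_params(["消息", "消息 英文", "news 可数"]))
```

A single query produces no bs or oq, because there is no earlier search to refer back to.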

API Results HTTP Response

The API returns parsed results as JSON. Here is an example of the Baidu SERP API response schema:

{
  "url": "string",
  "engine": "baidu",
  "general": {
    "id": "b5f237ef-4cdf-4697-b363-12d3aff38681",
    "timestamp": "2025-08-20 12:48:24 UTC",
    "baidu_url": "https://www.baidu.com/s?wd=消息&ct=0",
    "resultscount": 760
  },
  "input": {
    "wd": "消息",
    "device": "desktop",
    "engine": "baidu"
  },
  "organic_results": [
    {
      "position": 1,
      "title": "消息用英语说,有哪些是可数的,哪些是不可数的 - 百度文库",
      "snippet": "答案:news 新闻,消息(不可数)information 信息(不可数)message 口信(可数名词)解析:本题考查英语中表示\"消息\"的常见名词的可数性区分。解析如下:1.news(新闻/消息):2.恒为不可数名词,形式上以-s结尾但实际不可数。使用时需遵循不可数名词规则:不能直接加a/an(错误:a news)表示单数时用\"a piece... ",
      "link": "https://wenku.baidu.com/view/d295380bfbc75fbfc77da26925c52cc58ad6903e.html",
      "displayed_link": "wenku.baidu.com",
      "tracking_link": "http://www.baidu.com/link?url=84TKhZfDTviZgKj8IGFdKYPkYtKrKIQUjwHMBG-8o_G-J2OGh_AGTFovBS5iv2nRyzPkYwACoBmpo8JXDuohcXjqkryabwTdTXczJOlsoXy-5BnYxMVqh5fzv_P5kq4D",
      "thumbnail": ""
    }
  ],
  "pagination": {
    "current": 1,
    "next": "https://www.baidu.com/s?wd=%E6%B6%88%E6%81%AF&base_query=消息&pn=40&oq=%E6%B6%88%E6%81%AF&rn=40&ie=utf-8&usm=3&rsv_pq=aff3064500162767&rsv_t=ac4cd08G4O%2BPonyq6yCdH%2FkI2aTch4g1JVOc6RL%2FUspqANO7kQKqgiymeiw&topic_pn=&rsv_page=1",
    "other_pages": {
      "2": "string",
      "3": "string",
      "4": "string"
    }
  },
  "html": "string"
}
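The following sketch reads organic results from a response in the shape shown above and orders them by position. The response string is a trimmed, hypothetical sample: its titles and links are placeholders, not real Baidu results. organic_links is a hypothetical helper.

```python
import json

# Trimmed, hypothetical response in the documented shape; titles and links
# are placeholders, not real Baidu results.
RAW = """
{
  "engine": "baidu",
  "general": {"resultscount": 760},
  "organic_results": [
    {"position": 2, "title": "Second result", "link": "https://example.com/b"},
    {"position": 1, "title": "First result", "link": "https://example.com/a"}
  ]
}
"""


def organic_links(response: dict) -> list[tuple[int, str, str]]:
    """Return (position, title, link) for each organic result, ordered by position."""
    results = response.get("organic_results", [])
    return sorted((r["position"], r["title"], r["link"]) for r in results)


data = json.loads(RAW)
for position, title, link in organic_links(data):
    print(f"{position}. {title} -> {link}")
# 1. First result -> https://example.com/a
# 2. Second result -> https://example.com/b
```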

Key Notes:

  • The top-level engine field will always be baidu when siteType=baidu is specified.

  • organic_results is an array of result objects. Each object has position, title, snippet, link, displayed_link, tracking_link (a Baidu redirect URL), and thumbnail (which may be an empty string).

  • The pagination object includes the current page number (current), the URL of the next page (next), and a map of further page numbers to URLs (other_pages).

  • If rawHtml=1 is set, the full HTML source of the search result page will be included under the html field.
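The following sketch handles the pagination and html fields described in these notes. The helpers other_page_urls and raw_html are hypothetical, and the sample response uses placeholder URLs. Page numbers in other_pages are JSON object keys, so they arrive as strings. The sketch converts them to integers before sorting, because otherwise "10" would sort before "2".

```python
from typing import Optional


def other_page_urls(response: dict) -> list[tuple[int, str]]:
    """Return (page_number, url) pairs from pagination.other_pages in numeric order."""
    other = response.get("pagination", {}).get("other_pages", {})
    # Keys are strings; convert so that page 10 sorts after page 2.
    return sorted((int(page), url) for page, url in other.items())


def raw_html(response: dict) -> Optional[str]:
    """Return the html field if present and non-empty, else None."""
    return response.get("html") or None


# Placeholder response; the URLs stand in for real Baidu page links.
sample = {
    "engine": "baidu",
    "pagination": {
        "current": 1,
        "next": "https://www.baidu.com/s?wd=x&pn=10",
        "other_pages": {"10": "url-10", "2": "url-2", "3": "url-3"},
    },
}
print(other_page_urls(sample))  # [(2, 'url-2'), (3, 'url-3'), (10, 'url-10')]
print(raw_html(sample))         # None
```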
