Scrape and search localized results from Google, Bing, Baidu, Yahoo, Yandex, eBay, Home Depot, YouTube at scale using SerpApi.com
This Python package is meant to scrape and parse search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay and more, using SerpApi.
The following services are provided:
SerpApi provides a script builder to get you started quickly.
Python 3.7+
pip install google-search-results
Link to the Python package page
from serpapi import GoogleSearch
search = GoogleSearch({
"q": "coffee",
"location": "Austin,Texas",
"api_key": "<your secret api key>"
})
result = search.get_dict()
This example runs a search for "coffee" using your secret API key.
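Under the hood, a search is an HTTP GET request to the SerpApi backend. The sketch below only builds the request URL, without sending it, to show how the parameter dictionary maps onto a query string. It assumes the documented https://serpapi.com/search.json endpoint and an explicit engine parameter; the package itself may add extra parameters of its own, so treat this as an illustration rather than the exact request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed public endpoint of the SerpApi search API (JSON output).
SEARCH_ENDPOINT = "https://serpapi.com/search.json"

def build_search_url(params: dict) -> str:
    """Return the GET URL for a search, without sending any request."""
    query = dict(params)
    # GoogleSearch targets the "google" engine; keep an explicit value if given.
    query.setdefault("engine", "google")
    return SEARCH_ENDPOINT + "?" + urlencode(query)

url = build_search_url({
    "q": "coffee",
    "location": "Austin,Texas",
    "api_key": "<your secret api key>",
})
print(url)
```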
The SerpApi service (backend)
Et voilà...
Alternatively, you can search:
See the playground to generate your code.
Source code.
params = {
"q": "coffee",
"location": "Location Requested",
"device": "desktop|mobile|tablet",
"hl": "Google UI Language",
"gl": "Google Country",
"safe": "Safe Search Flag",
"num": "Number of Results",
"start": "Pagination Offset",
"api_key": "Your SERP API Key",
# to be matched (search type)
"tbm": "nws|isch|shop",
# to be searched (advanced filters)
"tbs": "custom to be searched criteria",
# allow async request
"async": "true|false",
# output format
"output": "json|html"
}
# define the search
search = GoogleSearch(params)
# override an existing parameter
search.params_dict["location"] = "Portland"
# return the search result as raw HTML
html_results = search.get_html()
# parse results
# as python Dictionary
dict_results = search.get_dict()
# as JSON using json package
json_results = search.get_json()
# as dynamic Python object
object_result = search.get_object()
Link to the full documentation
See below for more hands-on examples.
You can get an API key here if you don't already have one: https://serpapi.com/users/sign_up
The SerpApi api_key can be set globally:
GoogleSearch.SERP_API_KEY = "Your Private Key"
The SerpApi api_key can also be provided for each search:
query = GoogleSearch({"q": "coffee", "serp_api_key": "Your Private Key"})
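When the key lives in an environment variable, such as the API_KEY variable used by the test suite and the async example, a small helper can inject it into each parameter dictionary. The helper name with_api_key is hypothetical and not part of the package; it leaves an explicitly provided api_key untouched.

```python
import os

def with_api_key(params: dict, env_var: str = "API_KEY") -> dict:
    """Hypothetical helper: return a copy of params with api_key filled in
    from the environment unless the caller already set one."""
    merged = dict(params)
    if "api_key" not in merged:
        key = os.environ.get(env_var)
        if key is None:
            raise RuntimeError(f"set {env_var} or pass api_key explicitly")
        merged["api_key"] = key
    return merged

# Usage: search = GoogleSearch(with_api_key({"q": "coffee"}))
```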
We love true open source, continuous integration and Test Driven Development (TDD). We are using RSpec to test our infrastructure around the clock to achieve the best Quality of Service (QoS).
The directory test/ includes specification/examples.
Set your API key.
export API_KEY="your secret key"
Run test
make test
from serpapi import GoogleSearch
search = GoogleSearch({})
location_list = search.get_location("Austin", 3)
print(location_list)
This prints the first 3 locations matching Austin (Texas, Texas, Rochester).
[ { 'canonical_name': 'Austin,TX,Texas,United States',
'country_code': 'US',
'google_id': 200635,
'google_parent_id': 21176,
'gps': [-97.7430608, 30.267153],
'id': '585069bdee19ad271e9bc072',
'keys': ['austin', 'tx', 'texas', 'united', 'states'],
'name': 'Austin, TX',
'reach': 5560000,
'target_type': 'DMA Region'},
...]
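The canonical_name field is the value to pass back as the location parameter of a search. The minimal sketch below uses a trimmed, hand-copied version of the first entry shown above (other fields omitted), picks the first match, and returns None when the list is empty. The helper name first_canonical_name is hypothetical.

```python
from typing import Optional

# Trimmed copy of the first entry printed above.
location_list = [
    {
        "canonical_name": "Austin,TX,Texas,United States",
        "country_code": "US",
        "name": "Austin, TX",
        "target_type": "DMA Region",
    },
]

def first_canonical_name(locations: list) -> Optional[str]:
    """Return the canonical_name of the best (first) match, if any."""
    if not locations:
        return None
    return locations[0].get("canonical_name")

location = first_canonical_name(location_list)
params = {"q": "coffee", "location": location}
print(params)
```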
The search results are stored in a temporary cache. The previous search can be retrieved from the cache for free.
from serpapi import GoogleSearch
search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas"})
search_result = search.get_dictionary()
assert search_result.get("error") is None
search_id = search_result.get("search_metadata").get("id")
print(search_id)
Now let's retrieve the previous search from the archive.
archived_search_result = GoogleSearch({}).get_search_archive(search_id, 'json')
print(archived_search_result.get("search_metadata").get("id"))
This prints the ID of the search retrieved from the archive.
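The archive lookup is keyed only by the search ID and the output format. Assuming the Search Archive endpoint has the form https://serpapi.com/searches/<search_id>.<format>, the URL that an archive lookup corresponds to can be sketched as below. The helper archive_url is hypothetical, and the ID in the usage line is a made-up placeholder, not a real search.

```python
from urllib.parse import urlencode

def archive_url(search_id: str, fmt: str = "json", api_key: str = "") -> str:
    """Build the (assumed) Search Archive URL for a previous search."""
    if fmt not in ("json", "html"):
        raise ValueError("format must be 'json' or 'html'")
    url = f"https://serpapi.com/searches/{search_id}.{fmt}"
    if api_key:
        url += "?" + urlencode({"api_key": api_key})
    return url

# Placeholder ID, not a real search.
print(archive_url("0123456789abcdef", "json", "secret"))
```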
from serpapi import GoogleSearch
search = GoogleSearch({})
account = search.get_account()
print(account)
This prints your account information.
from serpapi import BingSearch
search = BingSearch({"q": "Coffee", "location": "Austin,Texas"})
data = search.get_dict()
This code returns Bing search results for coffee as a dictionary.
https://serpapi.com/bing-search-api
from serpapi import BaiduSearch
search = BaiduSearch({"q": "Coffee"})
data = search.get_dict()
This code returns Baidu search results for coffee as a dictionary. https://serpapi.com/baidu-search-api
from serpapi import YandexSearch
search = YandexSearch({"text": "Coffee"})
data = search.get_dict()
This code returns Yandex search results for coffee as a dictionary.
https://serpapi.com/yandex-search-api
from serpapi import YahooSearch
search = YahooSearch({"p": "Coffee"})
data = search.get_dict()
This code returns Yahoo search results for coffee as a dictionary.
https://serpapi.com/yahoo-search-api
from serpapi import EbaySearch
search = EbaySearch({"_nkw": "Coffee"})
data = search.get_dict()
This code returns eBay search results for coffee as a dictionary.
https://serpapi.com/ebay-search-api
from serpapi import HomeDepotSearch
search = HomeDepotSearch({"q": "chair"})
data = search.get_dict()
This code returns Home Depot search results for chair as a dictionary.
https://serpapi.com/home-depot-search-api
from serpapi import YoutubeSearch
search = YoutubeSearch({"search_query": "chair"})
data = search.get_dict()
This code returns YouTube search results for chair as a dictionary.
https://serpapi.com/youtube-search-api
from serpapi import GoogleScholarSearch
search = GoogleScholarSearch({"q": "Coffee"})
data = search.get_dict()
This code returns Google Scholar search results as a dictionary.
from serpapi import WalmartSearch
search = WalmartSearch({"query": "chair"})
data = search.get_dict()
This code returns Walmart search results as a dictionary.
from serpapi import YoutubeSearch
search = YoutubeSearch({"search_query": "chair"})
data = search.get_dict()
This code returns YouTube search results as a dictionary.
from serpapi import AppleAppStoreSearch
search = AppleAppStoreSearch({"term": "Coffee"})
data = search.get_dict()
This code returns Apple App Store search results as a dictionary.
from serpapi import NaverSearch
search = NaverSearch({"query": "chair"})
data = search.get_dict()
This code returns Naver search results as a dictionary.
from serpapi import SerpApiClient
query = {"q": "Coffee", "location": "Austin,Texas", "engine": "google"}
search = SerpApiClient(query)
data = search.get_dict()
The SerpApiClient class enables interaction with any search engine supported by SerpApi.com.
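Each engine names its query field differently, as the examples above show (q, text, p, _nkw, search_query, query, term). A hypothetical helper can hide that difference when driving SerpApiClient directly. The query field names are taken from the examples in this README; the engine identifiers themselves are assumptions, so check each engine's documentation page before relying on them.

```python
# Query field per engine, taken from the per-engine examples above.
# The engine identifiers themselves are assumptions.
QUERY_FIELD = {
    "google": "q",
    "bing": "q",
    "baidu": "q",
    "yandex": "text",
    "yahoo": "p",
    "ebay": "_nkw",
    "home_depot": "q",
    "youtube": "search_query",
    "google_scholar": "q",
    "walmart": "query",
    "apple_app_store": "term",
    "naver": "query",
}

def build_query(engine: str, text: str, **extra) -> dict:
    """Hypothetical helper: build a SerpApiClient parameter dict."""
    try:
        field = QUERY_FIELD[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine}") from None
    params = {"engine": engine, field: text}
    params.update(extra)
    return params

query = build_query("yahoo", "Coffee")
# search = SerpApiClient(query); data = search.get_dict()
print(query)
```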
from serpapi import GoogleSearch
search = GoogleSearch({"q": "coffee", "tbm": "isch"})
for image_result in search.get_dict()['images_results']:
    link = image_result["original"]
    try:
        print("link: " + link)
        # wget.download(link, '.')
    except Exception:
        pass
This code prints all the image links, and downloads the images if you uncomment the wget.download line (this needs import wget from the third-party wget Python package, installed with pip install wget).
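Reading the original field with dict.get, rather than indexing, lets the loop skip entries that lack it without a blanket except. The sketch below runs on a hand-written stand-in for search.get_dict()['images_results'] (the URLs are placeholders) and also derives a local file name from each URL, which is what a downloader would need. Both helper names are hypothetical.

```python
import os
from urllib.parse import urlparse

# Hand-written stand-in for search.get_dict()["images_results"].
images_results = [
    {"position": 1, "original": "https://example.com/img/coffee-1.jpg"},
    {"position": 2},  # entry without an "original" link
    {"position": 3, "original": "https://example.com/img/coffee-3.png?size=large"},
]

def original_links(results: list) -> list:
    """Collect the 'original' URL of each image result, skipping gaps."""
    return [r["original"] for r in results if r.get("original")]

def local_filename(link: str, index: int) -> str:
    """Derive a file name from the URL path, with a numbered fallback."""
    name = os.path.basename(urlparse(link).path)
    return name or f"image-{index}"

for i, link in enumerate(original_links(images_results)):
    print("link: " + link + " -> " + local_filename(link, i))
```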
This tutorial covers more ground on this topic. https://github.com/serpapi/showcase-serpapi-tensorflow-keras-image-training
from serpapi import GoogleSearch
search = GoogleSearch({
    "q": "coffee",    # search query
    "tbm": "nws",     # news
    "tbs": "qdr:d",   # last 24h
    "num": 10
})
for offset in [0, 1, 2]:
    search.params_dict["start"] = offset * 10
    data = search.get_dict()
    for news_result in data['news_results']:
        print(str(news_result['position'] + offset * 10) + " - " + news_result['title'])
This script prints the first 3 pages of the news headlines for the last 24 hours.
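The offset arithmetic in that loop is worth spelling out: with num = 10, page p (counting from 0) is requested with start = p * 10, and a result's rank across pages is its on-page position plus that start. The sketch assumes, as the loop above does, that position restarts at 1 on each page; the headlines are placeholders.

```python
PAGE_SIZE = 10  # matches "num": 10 above

def start_for_page(page_index: int, page_size: int = PAGE_SIZE) -> int:
    """Offset to send as the "start" parameter for a 0-based page index."""
    return page_index * page_size

def absolute_rank(position: int, page_index: int, page_size: int = PAGE_SIZE) -> int:
    """Rank across all pages, assuming position restarts at 1 on every page."""
    return position + start_for_page(page_index, page_size)

# Placeholder results for the third page (page_index = 2).
page_results = [{"position": 1, "title": "Headline A"},
                {"position": 2, "title": "Headline B"}]
for item in page_results:
    print(str(absolute_rank(item["position"], 2)) + " - " + item["title"])
```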
from serpapi import GoogleSearch
search = GoogleSearch({
    "q": "coffee",       # search query
    "tbm": "shop",       # shopping
    "tbs": "p_ord:rv",   # order by review
    "num": 100
})
data = search.get_dict()
for shopping_result in data['shopping_results']:
    print(str(shopping_result['position']) + " - " + shopping_result['title'])
This script prints all the shopping results, ordered by review.
With SerpApi, we can build a Google search from anywhere in the world. This code looks for the best coffee shop for the given cities.
from serpapi import GoogleSearch
for city in ["new york", "paris", "berlin"]:
    location = GoogleSearch({}).get_location(city, 1)[0]["canonical_name"]
    search = GoogleSearch({
        "q": "best coffee shop",   # search query
        "location": location,
        "num": 1,
        "start": 0
    })
    data = search.get_dict()
    top_result = data["organic_results"][0]["title"]
    print(city + ": " + top_result)
We offer two ways to boost your searches thanks to the async parameter.
# Operating system
import os
# regular expression library
import re
# thread-safe queue (named Queue in Python 2)
from queue import Queue
# time utility
import time
# SerpApi search
from serpapi import GoogleSearch
# store searches
search_queue = Queue()
# SerpApi search
search = GoogleSearch({
    "location": "Austin,Texas",
    "async": True,
    "api_key": os.getenv("API_KEY")
})
# loop through a list of companies
for company in ['amd', 'nvidia', 'intel']:
    print("execute async search: q = " + company)
    search.params_dict["q"] = company
    result = search.get_dict()
    if "error" in result:
        print("oops error: ", result["error"])
        continue
    print("add search to the queue where id: ", result['search_metadata']['id'])
    # add search to the search_queue
    search_queue.put(result)
print("wait until all search statuses are cached or success")
# poll the archive until every queued search is done
while not search_queue.empty():
    result = search_queue.get()
    search_id = result['search_metadata']['id']
    # retrieve search from the archive - blocker
    print(search_id + ": get search from archive")
    search_archived = search.get_search_archive(search_id)
    print(search_id + ": status = " +
          search_archived['search_metadata']['status'])
    # check status
    if re.search('Cached|Success',
                 search_archived['search_metadata']['status']):
        print(search_id + ": search done with q = " +
              search_archived['search_parameters']['q'])
    else:
        # requeue search_queue
        print(search_id + ": requeue search")
        search_queue.put(result)
        # wait 1s
        time.sleep(1)
print('all searches completed')
This code shows how to run searches asynchronously. The search parameters must include {"async": True}, which tells the client not to wait for the search to complete. The thread that submits the search is therefore not blocked, so it can submit many searches quickly while the SerpApi backend does the processing work. The actual search result is fetched later from the search archive using get_search_archive(search_id). In this example, the pending searches are stored in a queue, search_queue, and a loop over search_queue fetches the individual results. This process can easily be multithreaded to allow a large number of concurrent search requests. To keep things simple, this example processes search results one at a time (single threaded).
The search results can be automatically wrapped in a dynamically generated Python object. This offers a more dynamic, fully object-oriented approach compared with the regular dictionary / JSON data structure.
from serpapi import GoogleSearch
search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas"})
r = search.get_object()
assert isinstance(r.organic_results, list)
assert r.organic_results[0].title
assert r.search_metadata.id
assert r.search_metadata.google_url
assert r.search_parameters.q == "Coffee"
assert r.search_parameters.engine == "google"
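The idea of turning nested dictionaries into attribute-style objects can be illustrated with the standard library's types.SimpleNamespace. This is a sketch of the general technique, not necessarily how get_object is implemented; the sample dictionary is hand-written.

```python
from types import SimpleNamespace

def to_object(value):
    """Recursively convert dicts to SimpleNamespace and walk into lists."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_object(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_object(v) for v in value]
    return value

sample = {
    "search_parameters": {"q": "Coffee", "engine": "google"},
    "organic_results": [{"position": 1, "title": "Example Coffee"}],
}
r = to_object(sample)
print(r.organic_results[0].title)
```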
Let's collect links across multiple search results pages.
import os
from serpapi import GoogleSearch
# to get 4 pages of 10 results
start = 0
end = 40
page_size = 10
# basic search parameters
parameter = {
    "q": "coca cola",
    "tbm": "nws",
    "api_key": os.getenv("API_KEY"),
    # optional pagination parameters
    # (the pagination method can also take them as arguments directly)
    "start": start,
    "end": end,
    "num": page_size
}
# as proof of concept,
# collect the URLs
urls = []
# initialize a search
search = GoogleSearch(parameter)
# create a Python generator using the parameters
pages = search.pagination()
# or set custom parameters
pages = search.pagination(start, end, page_size)
# fetch one search result page per iteration
# using a basic Python for loop,
# which invokes the Python iterator under the hood
for page in pages:
    print(f"Current page: {page['serpapi_pagination']['current']}")
    for news_result in page["news_results"]:
        print(f"Title: {news_result['title']}\nLink: {news_result['link']}\n")
        urls.append(news_result['link'])
# check if the total number of results is as expected
# note: the exact number is variable depending on the search engine backend
if len(urls) == (end - start):
    print("all search results count match!")
if len(urls) == len(set(urls)):
    print("all search results are unique!")
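A generator in the spirit of search.pagination() can be sketched in a few lines: it advances start by page_size until end is reached or a page comes back empty. This illustrates the pattern and is not the package's implementation; fetch_page is a hypothetical callable standing in for a real search, faked here with 25 numbered links so the sketch runs offline.

```python
def paginate(fetch_page, start=0, end=40, page_size=10):
    """Yield one page of results per iteration, stopping early on an empty page."""
    offset = start
    while offset < end:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield page
        offset += page_size

# Hypothetical fake backend with 25 results in total.
ALL_LINKS = [f"https://example.com/news/{i}" for i in range(25)]
def fetch_page(offset, size):
    return ALL_LINKS[offset:offset + size]

urls = []
for page in paginate(fetch_page):
    urls.extend(page)
print(len(urls))
```

Stopping on an empty page matters because the backend may run out of results before end, as the note about variable result counts above suggests.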
Examples to fetch links with pagination: test file, online IDE
SerpApi keeps error management simple.
If it's a backend error, a simple error message is returned as a string in the server response.
from serpapi import GoogleSearch
search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas", "api_key": "<secret_key>"})
data = search.get_json()
assert data.get("error") is None
In some cases, there are more details available in the data object.
If it's a client error, then a SerpApiClientException is raised.
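Because a backend error arrives as an error string inside an otherwise normal response, a small check keeps calling code tidy. The helper raise_for_backend_error and the SearchBackendError class below are hypothetical (the package's own SerpApiClientException covers client-side errors); the sample responses are hand-written.

```python
class SearchBackendError(Exception):
    """Hypothetical exception for errors reported by the SerpApi backend."""

def raise_for_backend_error(data: dict) -> dict:
    """Return data unchanged, or raise if the response carries an "error" field."""
    message = data.get("error")
    if message is not None:
        raise SearchBackendError(message)
    return data

ok = raise_for_backend_error({"search_metadata": {"status": "Success"}})
try:
    raise_for_backend_error({"error": "example backend error"})
except SearchBackendError as exc:
    print("backend error:", exc)
```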
2023-03-10 @ 2.4.2
2021-12-22 @ 2.4.1
2021-07-26 @ 2.4.0
2021-06-05 @ 2.3.0
2021-04-28 @ 2.2.0
2021-04-04 @ 2.1.0
2020-10-26 @ 2.0.0
2020-06-30 @ 1.8.3
2020-03-25 @ 1.8
2019-11-10 @ 1.7.1
2019-09-12 @ 1.7
2019-06-25 @ 1.6
SerpApi supports all the major search engines. Google has the most advanced support, with all the major services available: Images, News, Shopping and more. To enable a type of search, the field tbm (to be matched) must be set, for example, to isch (Images), nws (News) or shop (Shopping).
The field tbs allows you to customize the search even further.
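Putting tbm and tbs together, a hypothetical helper can translate a readable vertical name into the tbm code used in the examples above (isch for images, nws for news, shop for shopping) and attach an optional tbs filter such as qdr:d (last 24 hours) or p_ord:rv (order by review), both taken from this README. The helper name google_params and the vertical names are illustrative.

```python
from typing import Optional

# tbm codes as used in the examples in this README.
TBM_CODES = {"images": "isch", "news": "nws", "shopping": "shop"}

def google_params(query: str, vertical: Optional[str] = None,
                  tbs: Optional[str] = None) -> dict:
    """Hypothetical helper: build GoogleSearch parameters for a vertical."""
    params = {"q": query}
    if vertical is not None:
        if vertical not in TBM_CODES:
            raise ValueError(f"unsupported vertical: {vertical}")
        params["tbm"] = TBM_CODES[vertical]
    if tbs is not None:
        params["tbs"] = tbs
    return params

# e.g. search = GoogleSearch(google_params("coffee", "news", tbs="qdr:d"))
print(google_params("coffee", "news", tbs="qdr:d"))
```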