Microsoft Azure Cognitive Search Client Library for Python
Azure Cognitive Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
The Azure Cognitive Search service is well suited for the following application scenarios:
Use the Azure.Search.Documents client library to:
Source code | Package (PyPI) | Package (Conda) | API reference documentation | Product documentation | Samples
Install the Azure Cognitive Search client library for Python with pip:
pip install azure-search-documents
To create a new search service, you can use the Azure portal, Azure PowerShell, or the Azure CLI.
az search service create --name <mysearch> --resource-group <mysearch-rg> --sku free --location westus
See choosing a pricing tier for more information about available options.
To interact with the Search service, you'll need to create an instance of the appropriate client class: SearchClient
for searching indexed documents, SearchIndexClient
for managing indexes, or SearchIndexerClient
for crawling data sources and loading search documents into an index. To instantiate a client object, you'll need an endpoint and an API key. You can refer to the documentation for more information on supported authenticating approaches with the Search service.
You can get the endpoint and an API key from the Search service in the Azure Portal. Please refer the documentation for instructions on how to get an API key.
Alternatively, you can use the following Azure CLI command to retrieve the API key from the Search service:
az search admin-key show --service-name <mysearch> --resource-group <mysearch-rg>
There are two types of keys used to access your search service: admin (read-write) and query (read-only) keys. Restricting access and operations in client apps is essential to safeguarding the search assets on your service. Always use a query key rather than an admin key for any query originating from a client app.
Note: The example Azure CLI snippet above retrieves an admin key so it's easier to get started exploring APIs, but it should be managed carefully.
To instantiate the SearchClient
, you'll need the endpoint, API key and index name:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
service_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
index_name = os.environ["AZURE_SEARCH_INDEX_NAME"]
key = os.environ["AZURE_SEARCH_API_KEY"]
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
You can also create a SearchClient
, SearchIndexClient
, or SearchIndexerClient
using Azure Active Directory (AAD) authentication. Your user or service principal must be assigned the "Search Index Data Reader" role.
Using the DefaultAzureCredential you can authenticate a service using Managed Identity or a service principal, authenticate as a developer working on an application, and more all without changing code. Please refer the documentation for instructions on how to connect to Azure Cognitive Search using Azure role-based access control (Azure RBAC).
Before you can use the DefaultAzureCredential
, or any credential type from Azure.Identity, you'll first need to install the Azure.Identity package.
To use DefaultAzureCredential
with a client ID and secret, you'll need to set the AZURE_TENANT_ID
, AZURE_CLIENT_ID
, and AZURE_CLIENT_SECRET
environment variables; alternatively, you can pass those values
to the ClientSecretCredential
also in Azure.Identity.
Make sure you use the right namespace for DefaultAzureCredential
at the top of your source file:
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
index_name = os.getenv("AZURE_SEARCH_INDEX_NAME")
credential = DefaultAzureCredential()
search_client = SearchClient(service_endpoint, index_name, credential)
An Azure Cognitive Search service contains one or more indexes that provide persistent storage of searchable data in the form of JSON documents. (If you're brand new to search, you can make a very rough analogy between indexes and database tables.) The Azure.Search.Documents client library exposes operations on these resources through two main client types.
SearchClient
helps with:
SearchIndexClient
allows you to:
SearchServiceClient
functionality is not yet available in our current previewSearchIndexerClient
allows you to:
Azure Cognitive Search provides two powerful features: Semantic Search and Vector Search.
Semantic Search enhances the quality of search results for text-based queries. By enabling Semantic Search on your search service, you can improve the relevance of search results in two ways:
To learn more about Semantic Search, you can refer to the documentation.
Vector Search is an information retrieval technique that overcomes the limitations of traditional keyword-based search. Instead of relying solely on lexical analysis and matching individual query terms, Vector Search utilizes machine learning models to capture the contextual meaning of words and phrases. It represents documents and queries as vectors in a high-dimensional space called an embedding. By understanding the intent behind the query, Vector Search can deliver more relevant results that align with the user's requirements, even if the exact terms are not present in the document. Moreover, Vector Search can be applied to various types of content, including images and videos, not just text.
To learn how to index vector fields and perform vector search, you can refer to the sample. This sample provides detailed guidance on indexing vector fields and demonstrates how to perform vector search.
Additionally, for more comprehensive information about Vector Search, including its concepts and usage, you can refer to the documentation. The documentation provides in-depth explanations and guidance on leveraging the power of Vector Search in Azure Cognitive Search.
The Azure.Search.Documents
client library (v1) is a brand new offering for
Python developers who want to use search technology in their applications. There
is an older, fully featured Microsoft.Azure.Search
client library (v10) with
many similar looking APIs, so please be careful to avoid confusion when
exploring online resources.
The following examples all use a simple Hotel data set that you can import into your own index from the Azure portal. These are just a few of the basics - please check out our Samples for much more.
Let's start by importing our namespaces.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
We'll then create a SearchClient
to access our hotels search index.
index_name = "hotels"
# Get the service endpoint and API key from the environment
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]
# Create a client
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
index_name=index_name,
credential=credential)
Let's search for a "luxury" hotel.
results = client.search(search_text="luxury")
for result in results:
print("{}: {})".format(result["hotelId"], result["hotelName"]))
You can use the SearchIndexClient
to create a search index. Fields can be
defined using convenient SimpleField
, SearchableField
, or ComplexField
models. Indexes can also define suggesters, lexical analyzers, and more.
client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))
name = "hotels"
fields = [
SimpleField(name="hotelId", type=SearchFieldDataType.String, key=True),
SimpleField(name="baseRate", type=SearchFieldDataType.Double),
SearchableField(name="description", type=SearchFieldDataType.String, collection=True),
ComplexField(
name="address",
fields=[
SimpleField(name="streetAddress", type=SearchFieldDataType.String),
SimpleField(name="city", type=SearchFieldDataType.String),
],
collection=True,
),
]
cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)
scoring_profiles: List[ScoringProfile] = []
index = SearchIndex(name=name, fields=fields, scoring_profiles=scoring_profiles, cors_options=cors_options)
result = client.create_index(index)
You can Upload
, Merge
, MergeOrUpload
, and Delete
multiple documents from
an index in a single batched request. There are
a few special rules for merging
to be aware of.
DOCUMENT = {
"category": "Hotel",
"hotelId": "1000",
"rating": 4.0,
"rooms": [],
"hotelName": "Azure Inn",
}
result = search_client.upload_documents(documents=[DOCUMENT])
print("Upload of new document succeeded: {}".format(result[0].succeeded))
To authenticate in a National Cloud, you will need to make the following additions to your client configuration:
AuthorityHost
in the credential options or via the AZURE_AUTHORITY_HOST
environment variableaudience
in SearchClient
, SearchIndexClient
, or SearchIndexerClient
# Create a SearchClient that will authenticate through AAD in the China national cloud.
import os
from azure.identity import DefaultAzureCredential, AzureAuthorityHosts
from azure.search.documents import SearchClient
index_name = "hotels"
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]
credential = DefaultAzureCredential(authority=AzureAuthorityHosts.AZURE_CHINA)
search_client = SearchClient(endpoint, index_name, credential=credential, audience="https://search.azure.cn")
In addition to querying for documents using keywords and optional filters, you can retrieve a specific document from your index if you already know the key. You could get the key from a query, for example, and want to show more information about it or navigate your customer to that document.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
result = search_client.get_document(key="23")
print("Details for hotel '23' are:")
print(" Name: {}".format(result["hotelName"]))
print(" Rating: {}".format(result["rating"]))
print(" Category: {}".format(result["category"]))
This library includes a complete async API. To use it, you must first install an async transport, such as aiohttp. See azure-core documentation for more information.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))
async with search_client:
results = await search_client.search(search_text="spa")
print("Hotels containing 'spa' in the name (or other fields):")
async for result in results:
print(" Name: {} (rating {})".format(result["hotelName"], result["rating"]))
The Azure Cognitive Search client will raise exceptions defined in Azure Core.
This library uses the standard logging library for logging. Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted
headers, can be enabled on a client with the logging_enable
keyword argument:
import sys
import logging
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
# Create a logger for the 'azure' SDK
logger = logging.getLogger('azure')
logger.setLevel(logging.DEBUG)
# Configure a console output
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)
# This client will log detailed information about its HTTP sessions, at DEBUG level
client = SearchClient("<service endpoint>", "<index_name>", AzureKeyCredential("<api key>"), logging_enable=True)
Similarly, logging_enable
can enable detailed logging for a single operation,
even when it isn't enabled for the client:
result = client.search(search_text="spa", logging_enable=True)
See our Search CONTRIBUTING.md for details on building, testing, and contributing to this library.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.