Setting up an @Azure Search JSON blob Indexer with api-version=2015-02-28-Preview
I would like to thank Microsoft’s Eugene Shvets for helping me out with setting up Azure Search for JSON blobs. What I am going to write here should be available visually in the Azure Portal soon after June 2016. I am going to share a few RESTful OData-flavored calls using an old shoe in the .NET closet called HttpWebRequest
. To further reveal how old I am, kids, I am going to use Visual Studio Test (to “confuse” you) in all of my code samples.
Once we use these REST calls to get search working, we can use the Azure Portal to run test searches. This is what it looks like:
There are three ‘components’ to get Azure search working:
- Data Source (of type
azureblob
) - Index (without a default field key of
id
) - Indexer (with configuration parameter
useJsonParser = true
)
As of today, it is not possible to use the Azure Portal to generate an azureblob
Data Source. It is also not possible to use the Portal to get an Indexer—and, while it is possible to get an Index in the Portal, it will have a default key of id
which I cannot change in the UI. So, it’s best to make REST calls—likely the *same *calls made from the Portal.
Learn to DELETE and GET a search ‘component’ before generating it…
I am not a Test-Driven Development type of guy but I do have opinions and I like to be as clean and neat as possible. All of these quirks drive me to mention the need to DELETE
the things I POST
to Azure for the need to undo any mistake I might make. So here is my “confusing” way to DELETE
:
[TestCategory("Integration")]
[TestMethod]
[TestProperty("apiBase", "https://my-azure.search.windows.net")]
[TestProperty("apiKey", "[copy and paste from Portal]")]
[TestProperty("apiTemplate", "{componentName}/{itemName}?api-version=2015-02-28-Preview")]
[TestProperty("componentName", "indexers")]
[TestProperty("itemName", "songhayblog-indexer")]
public void ShouldDeleteAzureSearchServiceComponent()
{
var projectRoot = this.TestContext.ShouldGetProjectsFolder(this.GetType());
#region test properties:
var apiBase = this.TestContext.Properties["apiBase"].ToString();
var apiKey = this.TestContext.Properties["apiKey"].ToString();
var apiTemplate = new UriTemplate(this.TestContext.Properties["apiTemplate"].ToString());
var componentName = this.TestContext.Properties["componentName"].ToString();
var itemName = this.TestContext.Properties["itemName"].ToString();
#endregion
var uri = apiTemplate.BindByPosition(new Uri(apiBase, UriKind.Absolute), componentName, itemName);
this.TestContext.WriteLine("uri: {0}", uri);
var request = ((HttpWebRequest)WebRequest.Create(uri));
request.Method = "DELETE";
request.Accept = MimeTypes.ApplicationJson;
request.ContentType = MimeTypes.ApplicationJson;
request.Headers.Add("api-key", apiKey);
var code = request.ToHttpStatusCode();
this.TestContext.WriteLine("HttpStatusCode: {0}", code);
Assert.IsTrue(code == HttpStatusCode.NoContent, "The expected status code is not here.");
}
For details on where apiKey comes from, see “Query your Azure Search index using the REST API” by Ashish Makadia. So without the .NET ceremony a DELETE
looks like this:
https://my-azure.search.windows.net/{componentName}/{itemName}?api-version=2015-02-28-Preview
…where componentName
represents our three ‘components’, datasources
, indexers
and indexes
, and itemName
is your name of the ‘component.’
When we change this line:
request.Method = "DELETE";
…to this:
request.Method = "GET";
Our DELETE
changes to a GET
—so the URI above can be used for GET
operations to verify that our POST
operations are working. I am sure, by the way, that PUT
is supported here but I did not want to bother Eugene about this (see “Azure Search Service REST”—this might be of help).
POST of a new Azure-Blob Data Source
We have already seen that DELETE
and GET
operations can be shared. It should be no surprise that all of our POST
operations are the same—the only thing that changes is the JSON “body.” In the screenshot below, I have highlighted the json
variable—being passed to my not-required-at-all, custom extension method WithRequestBody()
:
So, the important piece is not shown above is the JSON in the POST
:
{
"name": "songhayblog-datasource",
"type": "azureblob",
"credentials": { "connectionString": "[copy and paste from Portal]" },
"container": {
"name": "songhayblog-azurewebsites-net",
"query": "BlogEntry"
}
}
For details on where connectionString
comes from, see “Windows Azure—Configuring Storage Accounts” by Biju Paulose. The rest of these JSON properties are covered by Eugene in “Indexing Documents in Azure Blob Storage with Azure Search.”
The response from the Azure Search API looks like this:
POST of a new Azure-Blob Index
This is the JSON payload for generating a new Index:
{
"name": "songhayblog-index",
"fields": [
{
"name": "Slug",
"type": "Edm.String",
"key": true,
"searchable": false
},
{
"name": "Content",
"type": "Edm.String",
"searchable": true
},
{
"name": "Title",
"type": "Edm.String",
"searchable": true
}
]
}
The fields
of this Index refer to the JSON shape that represents the BlogEntry
object that defines the Blog entries for the Blog you are reading now:
{
"Author": "Bryan Wilhite",
"Content": "<p>I would like to thank <a href=\"https://twitter.com/chaosrealm4\">Microsoft’s Eugene Shvets</a> for helping me [XHTML truncated]",
"InceptDate": "2016-06-13T21:42:54.1078686-07:00",
"IsPublished": true,
"ItemCategory": null,
"ModificationDate": "0001-01-01T00:00:00",
"Slug": "setting-up-an-azure-search-json-blob-indexer-with-api-version-2015-02-28-preview",
"SortOrdinal": 0,
"Tag": null,
"Title": "Setting up an @Azure Search JSON blob Indexer with api-version=2015-02-28-Preview"
}
POST of a new Azure-Blob Indexer
The Indexer is what ‘fills’ the Index, starting the “crawl” of the Azure Blob Container. In the POST
JSON payload, we see it targeting the index named above, using a schedule interval I copied from Eugene:
{
"name": "songhayblog-indexer",
"dataSourceName": "songhayblog-datasource",
"parameters": { "configuration": { "useJsonParser": true } },
"targetIndexName": "songhayblog-index",
"schedule": { "interval": "PT2H" }
}
In case you care about this HttpWebRequest stuff…
My HttpWebRequest
stuff here is not “confusing” it is more likely to be considered “old” (compared to the async-only HttpClient
)—but experience informs me that this “old” stuff is backwards compatible. So I have made investments in a few extension methods around HttpWebRequest
:
Related Links
- “Azure Search Service REST”
- “Indexing Documents in Azure Blob Storage with Azure Search” by Eugene Shvets
- “Get started with Azure Search in the portal” by Heidi Steen
- “Query your Azure Search index using the REST API” by Ashish Makadia