A few colleagues and I have been quietly working away at a little hackathon project to help drive consumers towards Irish retailers in the run-up to Christmas. The premise was simple: could we use our knowledge of product data acquisition and web crawling to create a meta-search engine that lets a consumer search for products and gift ideas (including price and availability) across lots of Irish retailers?
The result! https://buyirish.com. Follow us on Twitter for speedy updates about the website.
The first major challenge was acquiring the data. Typically we need to tailor our crawlers on a per-site basis. We would also need to consider:
Thankfully Alan, Technical Lead at ChannelSight, and our DAX (Data Acquisition) team came up with a clever solution using a combination of services to gather the data in a generic manner. They maintained a central "list of lists" of all the retailers, and a .NET Core application handled extracting the data from our crawler services and bulk loading it into the BuyIrish platform.
With the hackathon's time constraints, we didn't want the complexity of a SQL instance, data modelling and pushing migrations with Entity Framework. We use Azure Cosmos DB extensively in our core platform, and our production APIs serve millions of requests daily from it, so it was a great candidate for direct storage and could deal with requests at scale. But we also knew that result relevancy was going to be important, and the equivalent of a basic SQL SELECT statement wasn't going to cut it.
After a little research we had our answer. We could bulk load the data directly into Azure Cosmos DB and use Azure Cognitive Search (ACS) to connect to the container. ACS would keep its own index up to date based on an hourly check. This gave us result relevance scoring and the ability to tweak the scoring profiles if needed.
Getting the data into Azure Cosmos DB proved very easy with the new v3 Cosmos SDK. One of the first tests involved a bulk upsert of 50,000 items into the container, which took only fractions of a second. It's also very easy to scale the provisioned throughput up and down on the fly via code.
With the data safely in Azure Cosmos DB, our next step was to set up the Azure Cognitive Search (ACS) instance. Setting this up was a breeze. You can create a new instance directly from the Cosmos resource, and there's a walkthrough wizard to get the indexer set up using an hourly high-watermark check on the _ts timestamp.
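To make the high-watermark idea concrete, here is a minimal Python sketch of incremental indexing keyed on `_ts`. It is a simplified model of the behaviour described above, not the ACS indexer itself: the `run_indexer` function, the in-memory `container` list, the `index` dict and the sample products are all hypothetical stand-ins.

```python
def run_indexer(container, index, watermark):
    """Copy documents changed since the last run into the index.

    container: list of dicts, each with "id" and "_ts" (last-modified epoch seconds)
    index:     dict of id -> document, standing in for the search index
    watermark: highest _ts seen by the previous run (0 for the first run)
    Returns (number of documents processed, new watermark).
    """
    # >= rather than > so documents written in the same second as the
    # watermark are not missed; re-processing them is harmless because
    # the write below is an upsert keyed on id.
    changed = sorted(
        (doc for doc in container if doc["_ts"] >= watermark),
        key=lambda doc: doc["_ts"],
    )
    for doc in changed:
        index[doc["id"]] = dict(doc)  # upsert by id
        watermark = max(watermark, doc["_ts"])
    return len(changed), watermark


# Hypothetical sample data
container = [
    {"id": "1", "_ts": 100, "name": "Wool scarf"},
    {"id": "2", "_ts": 150, "name": "Tea set"},
    {"id": "3", "_ts": 200, "name": "Candle"},
]
index = {}

first_count, watermark = run_indexer(container, index, 0)

# Simulate an update: the store bumps _ts whenever a document is written
container[0] = {"id": "1", "_ts": 300, "name": "Wool scarf (red)"}
second_count, watermark = run_indexer(container, index, watermark)
```

The second run only touches documents at or above the previous watermark, which is why an hourly check stays cheap even as the container grows.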
One feature that ACS is sorely missing is the concept of consistently random results. This would have been nice in order to give a fair distribution of views to similar products across multiple retailers, or to build an Inspire Me function that returns completely random results from a * search.
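As a rough illustration of what "consistently random" means here, the Python sketch below orders items by a hash of a seed plus the item id, so the same seed always yields the same shuffled order while different seeds yield different ones. This is not something ACS provides; the `consistently_random` function, the seed strings and the product ids are hypothetical.

```python
import hashlib


def consistently_random(items, seed):
    """Order items pseudo-randomly, but identically for the same seed."""
    def sort_key(item):
        # Hash of seed + id: stable across calls, unrelated to the id's natural order
        digest = hashlib.sha256(f"{seed}:{item['id']}".encode("utf-8"))
        return digest.hexdigest()
    return sorted(items, key=sort_key)


products = [{"id": f"p{n}"} for n in range(10)]

# Same seed (for example one per visitor session) gives the same order,
# so paging through results stays stable.
page_one = consistently_random(products, seed="session-abc")[:5]
page_one_again = consistently_random(products, seed="session-abc")[:5]

# A different seed gives another ordering of the same items.
other_session = consistently_random(products, seed="session-xyz")
```

Sorting by a per-item hash rather than shuffling the whole list means each item's position depends only on the seed and its own id, so adding or removing a product does not reshuffle everything else.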
While I continued with Azure Cosmos DB and ACS, my colleague Daniel was busy getting the website up and running. The front end is built using a vanilla .NET Framework 4.8 MVC 5 project with a Bootstrap theme and some custom CSS/JS.
The use case is pretty simple. When the consumer arrives on the site they can search for a term, and ACS will return the most relevant products in order, based on its internal scoring algorithm and a tweaked scoring profile we've provided.
The consumer can also press Inspire Me, and a random keyword will be chosen to provide a selection of different products. Finally, they can browse a listing of retailers.
We did notice some discrepancies in the results. The problem was that some retailers provided extremely verbose product descriptions, which might repeat a search term multiple times, while another retailer with a more relevant product might only mention the term once, in the product title.
For example, if a user searched for ACME Phone, which is the more relevant product?
- ACME Phone X
- ACME Phone compatible Phone Cover. Description: Works with ACME Phone Model X ACME Phone Model Y ACME Phone Model Z ACME Phone Model Q
The solution was to provide a scoring profile that overrides the default weights, so that an occurrence of the term in the product name now counts for much more than an occurrence in the product description.
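The effect of field weighting on the ACME Phone example can be shown with a toy scorer in Python. This is only a sketch: it is not ACS's actual relevance algorithm, the per-field score (fraction of tokens matching the query) is a deliberate simplification, the weights of 10 and 1 are illustrative rather than the values in our profile, and the empty description for ACME Phone X is an assumption.

```python
def field_score(query_terms, text):
    """Fraction of the field's tokens that match a query term (0.0 if empty)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in query_terms)
    return hits / len(tokens)


def score(query, product, weights):
    """Weighted sum of per-field scores."""
    terms = set(query.lower().split())
    return sum(
        weight * field_score(terms, product.get(field, ""))
        for field, weight in weights.items()
    )


def rank(query, products, weights):
    """Products ordered from highest to lowest score."""
    return sorted(products, key=lambda p: score(query, p, weights), reverse=True)


products = [
    {"name": "ACME Phone X", "description": ""},  # description assumed empty
    {
        "name": "ACME Phone compatible Phone Cover",
        "description": "Works with ACME Phone Model X ACME Phone Model Y "
                       "ACME Phone Model Z ACME Phone Model Q",
    },
]

equal_weights = {"name": 1, "description": 1}
name_heavy = {"name": 10, "description": 1}  # illustrative values only

before = [p["name"] for p in rank("ACME Phone", products, equal_weights)]
after = [p["name"] for p in rank("ACME Phone", products, name_heavy)]
```

With equal weights the verbose phone cover wins because its description keeps repeating the search terms; once the name field is weighted heavily, the phone itself comes out on top.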
We wanted to get some very lightweight metrics on two things:
1. What are people searching for?
2. What are people clicking on?
To achieve this we created a very lightweight click handler that redirects to the retailer site. As the consumer is redirected, we capture some anonymous statistics about the retailer and product they've clicked, persisting the stats directly to App Insights. This is a nice quick solution, as it allows us to write some quick Kusto queries to see how things are performing. On the front end we also push some basic user analytics to Google Analytics.
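The shape of that click handler can be sketched in a few lines of Python. This is a hedged outline rather than our actual handler: `handle_click`, the event field names, the example URL and the `events` list (standing in for the telemetry store) are all hypothetical.

```python
from datetime import datetime, timezone


def handle_click(retailer, product_id, target_url, sink):
    """Record an anonymous click event, then tell the caller to redirect."""
    sink.append({
        "event": "ProductClick",
        "retailer": retailer,
        "productId": product_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Deliberately no IP address, cookie or user identifier
    })
    # 302 Found: the browser follows the Location header to the retailer
    return 302, {"Location": target_url}


events = []  # stand-in for the telemetry store
status, headers = handle_click(
    "example-retailer",
    "sku-123",
    "https://retailer.example/products/sku-123",
    events,
)
```

Because each event carries only the retailer and product, counting clicks per retailer or per product later is a simple grouping query over the stored events.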
One thing which is still causing some headaches is fuzzy search. Azure Cognitive Search supports the Lucene query syntax, so it should be possible to use keyword modifiers like ~ to specify fuzzy matching on certain words. This, however, led to spurious results. While beneficial for searches like tshirt~, which found results for t-shirts, it produced much poorer results for misspellings or for keywords that clearly weren't covered by any retailer. hurling~ led to hits for Haflinger horse related products, and attempting to supply numeric modifiers like hurling~1 tanked the results entirely.
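Part of the reason fuzzy matching gets noisy is that it works on edit distance, and as far as I know the ~ modifier defaults to a maximum distance of 2. The Python sketch below shows how many unrelated words sit within two edits of a seven-letter term. It uses plain Levenshtein distance rather than the Damerau-Levenshtein variant, it ignores whatever analysis the search service applies to indexed terms, and the `fuzzy_matches` helper and the vocabulary are made up for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_distance=2):
    """Words from the vocabulary within max_distance edits of the term."""
    return [word for word in vocabulary if levenshtein(term, word) <= max_distance]


# Hypothetical indexed terms
vocabulary = ["shirt", "curling", "hauling", "burning"]

tshirt_distance = levenshtein("tshirt", "shirt")               # one deletion
loose = fuzzy_matches("hurling", vocabulary)                   # distance <= 2
strict = fuzzy_matches("hurling", vocabulary, max_distance=1)  # distance <= 1
```

At distance 2, hurling matches curling, hauling and burning; tightening to distance 1 leaves only curling. That is the kind of trade-off between catching typos and pulling in unrelated products that we were running into.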
Overall this was a fun little project. It's always nice to get out of the day-to-day JIRA backlog and explore some new technology, and I can definitely see us having a use for Azure Cognitive Search at some point in the future on our product roadmap. Thanks to my colleagues Daniel P, Alan, Daniel G, Bogdan, Dorothy, Enda and John, who chipped in to help get this live.
ChannelSight has a team of experts that can help you to optimise your eCommerce strategy. Contact us today to learn about us.