Success Story

4x Faster Research by Leveraging Intelligent Search with Azure Machine Learning

Our client needed a way to consolidate all their research into one portal to simplify the process and reduce duplicated effort. They also wanted a common interface where they could enter a search query or a topic and quickly get the information they were looking for.


  • Quicker access to information
  • Faster research process
  • Fewer errors due to duplicated research

About the Client

The client is a global organization dedicated to safety, social good and sustainability, and a leader in standards development, inspection and certification around the world, including Canada, the U.S., Europe and Asia.


Adastra developed an intelligent search system based on Azure Machine Learning, which provides a single common interface, consolidating research findings from multiple disparate data sources.

Background and Challenge

Our client is an accredited membership-based association located in North America that develops standards and provides training on them, with the aim of ensuring quality and safety and reducing environmental impact.

The association was experiencing three main pain points around research and the generation of standards documentation that needed to be addressed.

Time and Effort of Finding Information

The client’s standards documents are stored in PDF format. To create new standards, the client was manually sifting through hundreds of historical PDF documents to gather and piece together information relevant to the new standards topics. This was a time-consuming and inefficient process, and the client needed a quicker and simpler way to search for and consolidate information.

Manual Process for Creating New Standards

Each new standards document was created from scratch: existing information was manually copied, pasted and reformatted to form a baseline, which was then further populated with new research. The client needed a way to speed up the process with automatically prepopulated templates for new standards.

Accessing Internal and External Research

When the client creates new standards, they search for information from both their existing internal standards and external sources, such as published journal articles and current events.

They needed a way to consolidate all their research into one portal to simplify the process and reduce duplicated effort. They also wanted a common interface where they could enter a search query or a topic and quickly get the information they were looking for.

Why Adastra?

The client had already leveraged some out-of-the-box tools that weren’t performing adequately, so they engaged Adastra to build customized machine learning (ML) solutions.

Adastra is a leader and award winner in the fields of data and artificial intelligence. Our team of experts leverages AI and ML to help organizations draw meaningful insights from their data and drive business forward.

We have received investments from SCALE AI, Canada’s AI supercluster, for two of our cutting-edge artificial intelligence projects in 2022: one on supply chain optimization and one on a smart platform for optimizing agricultural yield.

Adastra’s Solution

Ingesting and Structuring PDF Data with SQL Database

Adastra began by building the solution on a sample of around 150 PDF documents. The first step was ingesting the unstructured PDF documents and putting them into a structured, tabular format.

Adastra used a number of Python libraries to parse each PDF document into its individual clauses and tagged each clause to make searching easier (e.g., clause 1.1, clause 1.2, clause 1.3). Each clause carries information such as the standard it came from, the latest revision date, the contributing authors, and more. For a single PDF document, this could mean hundreds or even thousands of rows of data.
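The clause-level parsing described above can be illustrated with a minimal sketch. It assumes the PDF text has already been extracted to plain text and that each clause starts on a line beginning with a dotted number; the `Clause` class, the `split_into_clauses` function, the heading pattern and the sample text are all hypothetical and are not taken from the actual solution.

```python
import re
from dataclasses import dataclass

# Assumption: a clause heading is a line that starts with a dotted number
# such as "1.1" or "4.2.3", followed by the heading text.
CLAUSE_HEADING = re.compile(r"^(\d+(?:\.\d+)+)\s+(.*)$")


@dataclass
class Clause:
    standard: str   # standard the clause came from
    clause_id: str  # tag such as "1.1"
    text: str       # heading plus body, joined into one string


def split_into_clauses(standard: str, extracted_text: str) -> list[Clause]:
    """Split already-extracted text into one Clause per numbered heading."""
    clauses: list[Clause] = []
    current_id = None
    buffer: list[str] = []
    for raw_line in extracted_text.splitlines():
        line = raw_line.strip()
        match = CLAUSE_HEADING.match(line)
        if match:
            # A new heading closes the clause collected so far.
            if current_id is not None:
                clauses.append(Clause(standard, current_id, " ".join(buffer)))
            current_id = match.group(1)
            buffer = [match.group(2)]
        elif current_id is not None and line:
            buffer.append(line)
    if current_id is not None:
        clauses.append(Clause(standard, current_id, " ".join(buffer)))
    return clauses


# Invented sample text for illustration only.
sample = """1.1 Scope
This standard applies to widgets.
1.2 Definitions
A widget is a small device.
"""
clauses = split_into_clauses("STD-001", sample)
for clause in clauses:
    print(clause.clause_id, "->", clause.text)
```

Real standards rarely follow one heading pattern, which is consistent with the need for custom scripts on inconsistently formatted documents.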

To store the clause-level information, Adastra built structured input datasets in SQL Database, establishing a single source of truth for all of the client’s standards documents.
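As a rough illustration of what a clause-level table might look like, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for the SQL Database used in the project. The table name, column names and sample rows are assumptions made for this example, not the client's actual schema or data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real SQL Database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per clause, keyed by standard and clause tag.
conn.execute(
    """
    CREATE TABLE clauses (
        standard      TEXT NOT NULL,
        clause_id     TEXT NOT NULL,
        revision_date TEXT,
        authors       TEXT,
        body          TEXT NOT NULL,
        PRIMARY KEY (standard, clause_id)
    )
    """
)

# Invented sample rows for illustration only.
rows = [
    ("STD-001", "1.1", "2020-05-01", "A. Author", "Scope of the standard."),
    ("STD-001", "1.2", "2020-05-01", "A. Author", "Definitions used in the standard."),
    ("STD-002", "1.1", "2021-03-15", "B. Author", "Scope of a second standard."),
]
conn.executemany("INSERT INTO clauses VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def clauses_for(standard: str) -> list[tuple[str, str]]:
    """Return (clause_id, body) pairs for one standard, ordered by clause tag."""
    cursor = conn.execute(
        "SELECT clause_id, body FROM clauses WHERE standard = ? ORDER BY clause_id",
        (standard,),
    )
    return cursor.fetchall()


print(clauses_for("STD-001"))
```

Keeping one row per clause is what lets a search backend return individual clauses rather than whole documents.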

Many of the PDFs did not follow consistent formatting, so for high-priority standards, Adastra developed additional custom scripts.

Intelligent search solution - high-level architecture diagram.

Building an Intelligent Search System on Azure

After completing the structured data input, Adastra trained an intelligent search model and deployed it on Azure Machine Learning, leveraging state-of-the-art language models to provide the most accurate search results. Adastra then surfaced the intelligent search service through a chatbot built with Azure’s chatbot framework.

Users enter a query describing what they are searching for, and the intelligent search model uses it as input to produce a ranked list of the clauses most relevant to that query across all available standards.
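The idea of returning a ranked list of clauses can be sketched with a much simpler technique than the language models used in the project. The example below swaps in plain TF-IDF weighting with cosine similarity, written with the standard library only; the function names and sample clauses are hypothetical, and this is not the deployed model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_clauses(query: str, clauses: dict[str, str], top_k: int = 3):
    """Rank clauses (tag -> text) against a query by TF-IDF cosine similarity."""
    docs = {tag: Counter(tokenize(text)) for tag, text in clauses.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())

    def weights(counts: Counter) -> dict[str, float]:
        # Smoothed inverse document frequency, so unseen terms do not divide by zero.
        return {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vector = weights(Counter(tokenize(query)))
    scored = [(tag, cosine(query_vector, weights(c))) for tag, c in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented clauses for illustration only.
sample_clauses = {
    "STD-001 1.1": "Scope of electrical safety requirements for household appliances",
    "STD-001 1.2": "Definitions of terms used in this standard",
    "STD-002 3.4": "Labelling requirements for gas cylinders",
}
results = rank_clauses("electrical safety for appliances", sample_clauses)
for tag, score in results:
    print(f"{tag}: {score:.3f}")
```

A language-model-based ranker would replace the word-count vectors with learned embeddings, so that clauses can match a query even when they share no exact words with it.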

Adastra also integrated the intelligent search system with Azure QnA Maker so that it can answer direct user questions. A list of commonly asked questions (e.g., the client’s address and phone number) was prepopulated in Azure QnA Maker. When users enter a query, the chatbot first checks this knowledge base and, if it finds a match, gives users an actual answer to their question rather than just pointing them to the paragraph that contains the relevant information. If the system can’t find an appropriate answer in Azure QnA Maker, it calls the custom intelligent search backend to provide a response based on internal standards data.
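The "knowledge base first, search second" routing described above can be sketched as follows. A plain dictionary lookup stands in for the Azure QnA Maker knowledge base and a stub function stands in for the search backend; `normalise`, `answer`, `fake_search` and the sample FAQ entry are all hypothetical names and data, and no real Azure API is called.

```python
from typing import Callable


def normalise(question: str) -> str:
    """Lowercase, collapse whitespace and drop trailing question marks."""
    return " ".join(question.lower().split()).rstrip("? ")


def answer(question: str, faq: dict[str, str], search: Callable[[str], list[str]]) -> dict:
    """Check the FAQ knowledge base first; fall back to clause search if no match."""
    direct = faq.get(normalise(question))
    if direct is not None:
        return {"source": "faq", "answer": direct}
    return {"source": "search", "results": search(question)}


# Invented knowledge-base entry; the phone number is a placeholder.
faq = {normalise("What is your phone number?"): "555-0100 (placeholder)"}


def fake_search(query: str) -> list[str]:
    """Stub for the intelligent search backend; returns a fixed clause tag."""
    return ["STD-001 clause 1.1"]


print(answer("What is your phone number?", faq, fake_search))
print(answer("guard requirements for machinery", faq, fake_search))
```

A production knowledge base would typically match paraphrased questions with a confidence score rather than exact text, but the fallback order stays the same.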

Creating a Template Generator with Python

Next, Adastra built out a backend solution using Python that takes input from users at the frontend to prepopulate templates for creating new standards.

Users can select the sections of PDFs that they want to keep for a new standard, and the backend system prepopulates a new Word document from the selected sections, giving the client a starting point. The Word document can then be downloaded through the interface.
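A minimal sketch of the prepopulation step might look like the following. It assembles user-selected sections into a plain-text draft using only the standard library; producing an actual Word file would need a document-writing library, which is left out here. The `Section` class, the `build_template` function and the sample sections are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    standard: str   # standard the section was taken from
    clause_id: str  # clause tag within that standard
    title: str
    body: str


def build_template(title: str, selected: list[Section]) -> str:
    """Assemble the selected sections into a numbered plain-text draft."""
    lines = [title, "=" * len(title), ""]
    for number, section in enumerate(selected, start=1):
        lines.append(f"{number}. {section.title}")
        # Keep a pointer to the source clause so authors can trace reused text.
        lines.append(f"[Source: {section.standard}, clause {section.clause_id}]")
        lines.append(section.body)
        lines.append("")
    return "\n".join(lines)


# Invented selections for illustration only.
draft = build_template(
    "New Standard Draft",
    [
        Section("STD-001", "1.1", "Scope", "Applies to widgets."),
        Section("STD-002", "3.4", "Labelling", "Labels must be legible."),
    ],
)
print(draft)
```

Recording the source standard and clause next to each reused section is one way to keep the draft traceable as new research is added.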

Adastra also set up feedback logging for the client for future iterations of retraining the system.


  • Ease of information retrieval
  • Consolidated research from multiple disparate systems
  • Streamlined process for creating new documents
  • Reduced time and effort with automated processes

Adastra’s customized solution addressed all of the client’s pain points and saved them significant time and effort.

By structuring and ingesting their PDF documents and building an intelligent search system on Azure, the client is now able to search for and easily pull up the desired information from their historical standards documents. They now have quick access to information they previously had to look for manually, saving them time and effort in retrieving information from large, previously unstructured data sources.

Adastra created a single common interface, consolidating research findings from multiple disparate data sources. The client can now quickly pull up research from both external and internal sources in one place and have frequently asked questions answered directly, saving them time and eliminating duplicate research efforts.

The Python solution, in combination with Azure accelerators, allows the client to streamline the creation of new documents through prepopulated templates, saving them time and manual effort.


Need a similar solution? Get in touch with us. We can help.