4x Faster Research by Leveraging Intelligent Search with Azure Machine Learning
Our client needed a way to consolidate all of their research into one portal to simplify the process and reduce duplicated effort. They also wanted a common interface where they could enter a search query or topic and quickly get the information they were looking for.
- Quicker access to information
- Faster research process
- Fewer errors due to duplicated research
Background and Challenge
Our client is an accredited, membership-based association in North America that develops standards and provides training on them, with the aim of ensuring quality and safety and reducing environmental impact.
The association was facing three main pain points in its research and in the creation of standards documentation.
Time and Effort of Finding Information
The client’s standards documents are stored as PDFs. To create a new standard, staff manually sifted through hundreds of historical PDF documents to find and piece together information relevant to the new standard’s topic. This was time-consuming and inefficient, and the client needed a quicker, simpler way to search for and consolidate information.
Manual Process for Creating New Standards
Each new standards document was created from scratch: staff manually copied, pasted, and reformatted existing content to form a baseline, which was then populated with new research. The client needed automated, prepopulated templates to speed up the creation of new standards.
Accessing Internal and External Research
When creating new standards, the client draws on both its existing internal standards and external sources, such as articles published in journals and current events.
The client needed to bring all of this research together in one portal to simplify the process and reduce duplicated effort, with a common interface where staff could enter a search query or topic and quickly find the information they were looking for.
The client had already tried some out-of-the-box tools that were not performing adequately, so they engaged Adastra to build customized machine learning (ML) solutions.
Adastra is a leader and award winner in the fields of data and artificial intelligence. Our team of experts leverages AI and ML to help organizations draw meaningful insights from their data and drive business forward.
In 2022, we received investments from SCALE AI, Canada’s AI supercluster, for two of our cutting-edge artificial intelligence projects: one on supply chain optimization and one on a smart platform for optimizing agricultural yield.
Ingesting and Structuring PDF Data with SQL Database
Adastra began by building the solution on a sample of around 150 PDF documents. The first step was ingesting the unstructured PDF documents and putting them into a structured, tabular format.
Adastra used a number of Python libraries to parse the PDF documents into their individual clauses and tagged each clause (e.g., clause 1.1, clause 1.2, clause 1.3) to make searching easier. Each clause record includes information such as the standard it came from, the latest revision date, and the contributing authors. A single PDF document could yield hundreds or thousands of rows of data.
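The source doesn't name the Python libraries or parsing rules Adastra used. As a minimal sketch, assuming the text has already been extracted from each PDF and that every clause starts on its own line with a dotted number such as `1.1` or `4.2.3`, splitting a document into clause-level rows might look like this (`ClauseRow` and `split_into_clauses` are hypothetical names):

```python
import re
from dataclasses import dataclass

# Assumption: a clause begins at the start of a line with a dotted number
# such as "1.1" or "4.2.3", followed by whitespace.
CLAUSE_START = re.compile(r"^(\d+(?:\.\d+)+)\s+", re.MULTILINE)


@dataclass
class ClauseRow:
    standard_id: str
    clause_number: str
    text: str
    revision_date: str


def split_into_clauses(standard_id: str, revision_date: str, raw_text: str) -> list[ClauseRow]:
    """Split already-extracted PDF text into one row per numbered clause."""
    matches = list(CLAUSE_START.finditer(raw_text))
    rows = []
    for i, match in enumerate(matches):
        # The clause body runs from the end of its number to the start of the next clause.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        # Collapse the line breaks and extra spaces left over from PDF extraction.
        body = " ".join(raw_text[match.end():end].split())
        rows.append(ClauseRow(standard_id, match.group(1), body, revision_date))
    return rows
```

A pattern this simple misses top-level clauses numbered without a dot and can misfire on lines that happen to begin with a decimal value, which is the kind of formatting inconsistency that calls for more specialized scripts.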
To store the clause-level information, Adastra built structured input datasets in SQL Database, creating a single source of truth for all of the client’s standards documents.
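A clause table along these lines could serve as that single source of truth. This sketch uses Python's built-in `sqlite3` module as a local stand-in for SQL Database; the `clauses` table, its columns, and `load_clauses` are illustrative assumptions, not the client's actual schema:

```python
import sqlite3

# Hypothetical schema: one row per clause, keyed by standard and clause number.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clauses (
    standard_id   TEXT NOT NULL,
    clause_number TEXT NOT NULL,
    revision_date TEXT,
    authors       TEXT,
    clause_text   TEXT NOT NULL,
    PRIMARY KEY (standard_id, clause_number)
)
"""


def load_clauses(conn: sqlite3.Connection, rows: list[tuple[str, str, str, str, str]]) -> None:
    """Insert clause rows, replacing any existing row for the same clause."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT OR REPLACE INTO clauses VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_clauses(conn, [
    ("STD-001", "1.1", "2021-06-01", "A. Author", "Scope of the standard."),
    ("STD-001", "1.2", "2021-06-01", "A. Author", "Definitions."),
])
count = conn.execute(
    "SELECT COUNT(*) FROM clauses WHERE standard_id = ?", ("STD-001",)
).fetchone()[0]
```

The composite primary key on `(standard_id, clause_number)` means re-ingesting a revised standard overwrites the old clause text instead of duplicating it. `INSERT OR REPLACE` is SQLite syntax; on SQL Server-family databases a `MERGE` statement plays the same role.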
Because many of the PDFs did not follow consistent formatting, Adastra developed additional custom parsing scripts for high-priority standards.
Building an Intelligent Search System on Azure
With the structured data in place, Adastra trained an intelligent search model and deployed it on Azure Machine Learning, using state-of-the-art language models to make search results as accurate as possible. Adastra surfaced the intelligent search service to users through a chatbot built with Azure’s chatbot framework.
Users enter a query describing what they are searching for, and the intelligent search model uses it as input to produce a ranked list of the clauses most relevant to that query across all available standards.
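The source says only that state-of-the-art language models power the ranking. The sketch below swaps in a much simpler lexical technique, TF-IDF weighting with cosine similarity, to show the shape of the step: score every clause against the query and return the clauses sorted by score. `rank_clauses` and the sample clause IDs are hypothetical; in a language-model version, the TF-IDF vectors would be replaced by embeddings while the score-and-sort logic stays the same.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_clauses(query: str, clauses: dict[str, str], top_k: int = 5) -> list[tuple[str, float]]:
    """Return (clause_id, score) pairs sorted by TF-IDF cosine similarity to the query."""
    docs = {cid: Counter(tokenize(text)) for cid, text in clauses.items()}
    n_docs = len(docs)
    # Document frequency: the number of clauses that contain each term.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency; rarer terms get higher weight.
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1 for term, freq in df.items()}

    def vectorize(counts: Counter) -> dict[str, float]:
        # Query terms that appear in no clause carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q_vec = vectorize(Counter(tokenize(query)))
    scores = [(cid, cosine(q_vec, vectorize(counts))) for cid, counts in docs.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


clauses = {
    "STD-001/1.1": "Fire extinguishers must be inspected monthly.",
    "STD-002/3.4": "Electrical panels require clear access.",
    "STD-003/2.1": "Fire exits must remain unlocked.",
}
for clause_id, score in rank_clauses("fire exits", clauses):
    print(clause_id, round(score, 3))
```

With this sample data, the clause about fire exits ranks first, the clause about fire extinguishers second (it shares only the word "fire"), and the unrelated electrical clause scores zero.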
Adastra also integrated the intelligent search system with Azure QnA Maker so that it could answer direct user questions. A list of frequently asked questions (e.g., the client’s address and phone number) was prepopulated in QnA Maker. When a user enters a query, the chatbot first checks this knowledge base and, if it finds a match, gives the user an actual answer rather than just pointing them to the paragraph that contains the relevant information. If no appropriate answer is found in QnA Maker, the chatbot calls the custom intelligent search backend to respond based on the internal standards data.
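QnA Maker performs its own question matching as a managed service; the sketch below only illustrates the routing logic described above: check the FAQ knowledge base first, and fall back to the search backend when nothing matches closely enough. The in-memory `FAQ` dictionary, the `difflib` string-similarity matching, the 0.8 cutoff, and the `answer` function are all assumptions standing in for the real service.

```python
import difflib
from typing import Callable

# Hypothetical FAQ knowledge base standing in for Azure QnA Maker.
# The answers are placeholders, not real client data.
FAQ = {
    "what is your address": "[client address]",
    "what is your phone number": "[client phone number]",
}


def answer(query: str, search: Callable[[str], str], cutoff: float = 0.8) -> str:
    """Answer from the FAQ if a close match exists; otherwise fall back to search."""
    normalized = query.lower().strip(" ?!.")
    # Best FAQ question whose similarity ratio is at least `cutoff`, if any.
    match = difflib.get_close_matches(normalized, list(FAQ), n=1, cutoff=cutoff)
    if match:
        return FAQ[match[0]]
    # No confident FAQ hit: defer to the custom search backend.
    return search(query)
```

Passing the search backend in as a function keeps the routing independent of how search is implemented, and the cutoff controls how readily the chatbot prefers a canned answer over a search result.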
Creating a Template Generator with Python
Next, Adastra built a Python backend that takes user input from the frontend and uses it to prepopulate templates for new standards.
Users select the sections of existing PDFs they want to keep for a new standard, and the backend prepopulates a new Word document with those sections as a starting point. The Word document can then be downloaded through the interface.
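The source doesn't say how the Word documents are generated. A library such as python-docx would be the usual choice; to stay dependency-free, this sketch instead writes a minimal WordprocessingML (.docx) package directly with `zipfile`, containing a title plus a heading and body paragraph for each selected section. `build_template` and `paragraph` are hypothetical names, and because the package has no styles part, headings are simply bold text.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Minimal package parts: content types, root relationship, and the main document.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def paragraph(text: str, bold: bool = False) -> str:
    """Build one paragraph; escape() keeps characters like & and < valid XML."""
    run_props = "<w:rPr><w:b/></w:rPr>" if bold else ""
    return f'<w:p><w:r>{run_props}<w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'


def build_template(title: str, sections: list[tuple[str, str]]) -> bytes:
    """Return .docx bytes with a title and one heading plus body per selected section."""
    body = [paragraph(title, bold=True)]
    for heading, text in sections:
        body.append(paragraph(heading, bold=True))
        body.append(paragraph(text))
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{"".join(body)}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as docx:
        docx.writestr("[Content_Types].xml", CONTENT_TYPES)
        docx.writestr("_rels/.rels", ROOT_RELS)
        docx.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Returning bytes rather than writing to disk suits a download endpoint, which can stream the result straight back to the user's browser.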
Adastra also set up feedback logging so that the client can use it to retrain the system in future iterations.
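The source doesn't describe the logging format. One simple option, sketched here under that assumption, is to append each piece of feedback as a JSON line (the query, the clause shown, and whether the user found it helpful) and later read the file back as labelled pairs for retraining. `log_feedback` and `load_training_pairs` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_feedback(log_path: Path, query: str, clause_id: str, helpful: bool) -> dict:
    """Append one feedback record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "clause_id": clause_id,
        "helpful": helpful,
    }
    # Append mode keeps earlier feedback intact across sessions.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_training_pairs(log_path: Path) -> list[tuple[str, str, int]]:
    """Read the log back as (query, clause_id, label) pairs, with label 1 = helpful."""
    pairs = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            pairs.append((rec["query"], rec["clause_id"], int(rec["helpful"])))
    return pairs
```

JSON Lines keeps each record independent, so a partially written final line never corrupts earlier entries, and the file is easy to load into a training pipeline.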
- Ease of information retrieval
- Consolidated research from multiple disparate systems
- Streamlined process for creating new documents
- Reduced time and effort with automated processes
Adastra’s customized solution addressed all of the client’s pain points and saved them significant time and effort.
With its PDF documents ingested and structured and an intelligent search system running on Azure, the client can now search for and easily pull up the information it needs from its historical standards PDFs. Information that staff once had to find manually is now quickly accessible, saving time and effort when retrieving it from large, previously unstructured data sources.
Adastra created a single common interface, consolidating research findings from multiple disparate data sources. The client can now quickly pull up research from both external and internal sources in one place and have frequently asked questions answered directly, saving them time and eliminating duplicate research efforts.
The Python solution, in combination with Azure accelerators, allows the client to streamline the creation of new documents through prepopulated templates, saving them time and manual effort.