Elevating Enterprise Search With AI, ML, NLP, and LLMs
Information retrieval, now more commonly known simply as search, is one of the earliest applications of AI. Consequently, enterprise search is inseparable from AI and one of the most meaningful beneficiaries of the exponential advancements recently achieved by machine learning (ML), the most accomplished expression of statistical AI.
Language models—including large language models (LLMs), smaller open source models, and closed source models—typify the newfound natural language processing (NLP) prowess exhibited by ML. Applying this NLP approach to enterprise search means more than elevating the discipline to real-time question-answering via natural language queries and responses. It means enterprise search is as readily applicable to business intelligence, knowledge management, and analytics in general as it is to anything else related to enterprise data.
Some searches are constant across companies and industries. “I want a model that’s generic enough that I can do any search,” explained Abhishek Das, head of engineering at Acante. “I can do a search such as: ‘Tell me all the customers that recently signed up,’ or ‘Tell me all the customers who were previously served by a sales guy who has left.’ Those are easy retrieval queries.”
There are numerous considerations for applying language models to make enterprise search a natural language, question-answering experience. Organizations must select the proper architecture for implementing those models, correctly engineer prompts, evaluate fine-tuning and model training approaches, and choose a prompt augmentation strategy.
None of these points of concern, however, overshadows one simple, irrevocable truth that’s almost certain to last as long as, if not longer than, the age of AI itself. Keyword search, also termed lexical search, will remain relevant, if not necessary, to the discipline of enterprise search. That may sound counterintuitive in today’s AI-first world. “In terms of enterprise search, we see hybrid search, which is a combination of vector search and simple lexical search,” posited Mikayel Harutyunyan, Activeloop CMO. “Vector search does not kill lexical search.”
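To make the hybrid idea concrete, here is a minimal sketch that blends a simple keyword-overlap score with a cosine similarity score. The toy documents, the stand-in embed() function, and the even weighting are illustrative assumptions, not any vendor’s implementation.

```python
# A minimal hybrid-search sketch: blend a lexical (keyword overlap) score
# with a vector (cosine similarity) score. The documents, the toy embed()
# function, and the 0.5/0.5 weighting are illustrative assumptions only.
import numpy as np

documents = [
    "List all customers that signed up in the last quarter",
    "Sales territory reassignments after a rep departure",
    "Kitchen remodel ideas with granite countertops",
]

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a hashed bag-of-words vector. A real system
    would call an embedding model instead."""
    vec = np.zeros(64)
    for term in text.lower().split():
        vec[hash(term) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def hybrid_search(query: str, alpha: float = 0.5):
    """Rank documents by a weighted blend of lexical and vector scores."""
    q_vec = embed(query)
    results = []
    for doc in documents:
        vector_score = float(np.dot(q_vec, embed(doc)))
        score = alpha * lexical_score(query, doc) + (1 - alpha) * vector_score
        results.append((score, doc))
    return sorted(results, reverse=True)

print(hybrid_search("customers that recently signed up"))
```

The blend weight is the design choice that keeps lexical search alive: exact product names, SKUs, and legal terms still match best on keywords, while the vector score catches paraphrases.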
MODEL FINE-TUNING AND TRAINING DATA
Vector search involves embedding data as numerical representations termed vectors (typically in a vector store) and finding similarities between those embeddings and the embedded versions of users’ questions, or prompts. For example, Harutyunyan described a use case in which vector search techniques enable a company specializing in designing 3D tours of real estate properties “to, say, find me kitchens similar to this one, or spaces similar to this.” Since vector search is central to most language model adaptations of enterprise search, it’s significant that numerous vector databases still support keyword-based search. Language model selection is one of the foremost considerations for utilizing vector search. Organizations can choose among open source models, such as the Llama iterations; smaller models found in open source frameworks such as Hugging Face; and closed source models, such as ChatGPT and Claude.
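As a rough sketch of that embed-then-retrieve flow: the sentence-transformers library and the “all-MiniLM-L6-v2” model below are illustrative assumptions; any open or closed source embedding model could fill the same role, and in production the document vectors would live in a vector database rather than in memory.

```python
# A minimal vector-search sketch: embed enterprise content and a query with
# the same model, then rank by cosine similarity. Library and model choice
# are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Quarterly revenue grew 12% on new enterprise customer signups.",
    "This listing's kitchen features granite countertops and a center island.",
    "Customer churn declined after the support team was expanded.",
]

# Embed the enterprise content once; these vectors play the role of a vector store.
doc_vectors = model.encode(documents, normalize_embeddings=True)

def vector_search(query: str, k: int = 2):
    """Embed the query and return the k nearest documents by cosine similarity."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity on normalized vectors
    top = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in top]

print(vector_search("find me kitchens similar to this one"))
```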
According to Das, open source models “are more suited for fine-tuning.” Fine-tuning a model makes it more domain-specific for a particular use case or for an organization’s own data. “Fine-tuning is only retraining the last layer of the neural network for the weights,” Das explained. Fine-tuning a language model allows organizations to get more relevant and accurate responses from enterprise search.
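As a rough illustration of the “retrain only the last layer” idea Das describes, the sketch below freezes a small PyTorch network and unfreezes just its final layer. The network itself is a stand-in; production LLM fine-tuning (full fine-tuning, adapters, LoRA) differs in detail.

```python
# A minimal "last layer only" fine-tuning sketch in PyTorch. The small
# feed-forward network stands in for a pretrained model.
import torch
import torch.nn as nn

pretrained = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Linear(256, 10),  # the "last layer" to be retrained
)

# Freeze every parameter...
for param in pretrained.parameters():
    param.requires_grad = False

# ...then unfreeze only the final layer so training updates just its weights.
for param in pretrained[-1].parameters():
    param.requires_grad = True

# The optimizer sees only the trainable (last-layer) parameters.
optimizer = torch.optim.AdamW(
    (p for p in pretrained.parameters() if p.requires_grad), lr=1e-4
)
```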
Prompt engineering strengthens this advantage, allowing organizations to limit the scope of the model’s response to the input prompt or question. Users must account for the cost of this additional training and the effort required to maintain the training datasets. “It’s not like you train it once and you’re done,” Das pointed out. “Your high-quality dataset needs to be maintained, and you need to ensure there is no manipulation of the data. You better have really good data scientists to be able to do this.”
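A minimal sketch of that scoping idea follows; the template wording is an assumption, but the point is that the instruction confines the model’s answer to the supplied enterprise context.

```python
# A minimal prompt-engineering sketch: a template that instructs the model
# to answer only from the supplied enterprise context.
def build_scoped_prompt(question: str, context: str) -> str:
    return (
        "Answer the question using only the enterprise context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_scoped_prompt(
    question="Which customers signed up last quarter?",
    context="Signups, Q3: Acme Corp, Globex, Initech.",
)
print(prompt)  # the assembled prompt would then be sent to the chosen model
```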
PROMPT AUGMENTATION AND RAG
Before a user’s prompt is sent to a language model, there’s usually a form of prompt augmentation that “augments the prompt with different enterprise data,” mentioned Aaron Fulkerson, Opaque Systems CEO. The retrieval-augmented generation (RAG) architecture is a type of prompt augmentation that supplements the user’s prompt with information from what can be multiple sources, which typically include a vector database. The augmented information supplies the language model with additional context from which to retrieve the relevant information for natural language queries and generate a response. This approach helps constrain the model’s responses to enterprise data and counteracts models’ propensity to “hallucinate,” or fabricate, responses. Vector databases are usually employed in RAG because “You can represent everything in a vector,” Das commented.
Unstructured, semi-structured, and structured data can all be vectorized, in addition to text, images, video, and other data types. The enterprise content in the vector database and users’ prompts are embedded by a language model. “That model that determines the placement of those embeddings determines what gets put in the prompt by the proximity in the database,” remarked Fulkerson. When the prompt and its augmented enterprise data are sent to the language model to answer the query, the model also applies facets of its own training to generate a response.
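Putting the pieces together, the following sketch shows the basic RAG flow: embed the prompt, retrieve the closest enterprise content, and augment the prompt before generation. The in-memory store, the stand-in embed() function, and the hypothetical generate() call are assumptions for illustration.

```python
# A minimal RAG sketch: retrieve the nearest enterprise content by embedding
# proximity, then augment the prompt before generation.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real pipeline would use a language model here."""
    vec = np.zeros(64)
    for term in text.lower().split():
        vec[hash(term) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Enterprise content embedded ahead of time, as it would be in a vector database.
corpus = [
    "Acme Corp signed an enterprise contract on June 3.",
    "The onboarding checklist covers SSO, billing, and data import.",
    "Globex churned after its account manager left the company.",
]
store = [(doc, embed(doc)) for doc in corpus]

def retrieve(prompt: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings sit closest to the prompt's."""
    p_vec = embed(prompt)
    ranked = sorted(store, key=lambda item: float(np.dot(p_vec, item[1])), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(prompt: str) -> str:
    context = "\n".join(retrieve(prompt))
    augmented = f"Use only this context:\n{context}\n\nQuestion: {prompt}\nAnswer:"
    return generate(augmented)  # hypothetical call to the chosen language model

# answer("Which customers recently signed up?")  # requires a real generate()
```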
Moreover, “In every RAG process, there’s a reasoning component, where the LLM is effectively making a judgment of ‘You’re giving me information; am I going to use this data to inform my answer?’” Harutyunyan said.