Elevating Enterprise Search With AI, ML, NLP, and LLMs
RAG CONSIDERATIONS
RAG and vector databases aren’t a panacea for implementing enterprise search with language models. Organizations must account for several concerns, including these:
- Similarity Search: According to Harutyunyan, vector similarity search doesn’t always “solve what we call the retrieval problem. That something’s similar doesn’t mean it’s relevant and accurate.” Approximate nearest neighbor (ANN) is a popular vector database index and search algorithm. “The way it’s implemented, it’s an approximation,” Harutyunyan observed. “If you search the average of the query or the way things are structured, it doesn’t necessarily mean that’s what you actually need.”
- Embedding Techniques: How organizations embed the content in the database, and the prompts used to mine it via RAG, are other considerations. For example, multimodal embeddings, enabled by vector databases with support for tensors (which contain multiple vectors, data structures, types, and trajectories in a single location), are more appropriate for certain use cases. “How you embed the documents is not a completely solved problem,” Harutyunyan admitted.
- Performance, Accuracy, and Cost Tradeoffs: There’s also a delicate tradeoff between performance, accuracy, and cost that requires deliberate decision-making. “If you assume that you can retrieve relevant documents to answer the questions, how many can you retrieve to be accurate enough to answer the question, but not be too costly?” Harutyunyan asked. “This is where a lot of companies struggle.” (A short retrieval sketch after this list illustrates how the number of chunks retrieved drives this tradeoff.)
- Chunking: The scope of the individual embeddings also influences how effective vector search is. Organizations must decide the level of granularity of their embeddings, referred to as chunks, which could encompass entire documents, pages, clauses, or any other span of text. “The mixed chunking strategy, when you combine the individual aspects of paragraphs, or sentences, or words, and a wider context, offers a better rounded response,” Harutyunyan advised. “But, the strategy should be adaptive to the task at hand.” (A chunking sketch after this list shows one reading of that mixed approach.)
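To make the retrieval and cost considerations above concrete, here is a minimal Python sketch. It assumes a toy, hash-based embed() function standing in for a real embedding model and a handful of in-memory chunks standing in for a vector database; none of these names come from the vendors quoted here. The point is that the choice of k determines how many chunks reach the prompt, and therefore how much recall is bought at what token cost, and that a high similarity score still isn’t a guarantee of relevance.

```python
# Minimal sketch of top-k vector retrieval and the k-vs-cost tradeoff.
# embed() is a stand-in for a real embedding model; here it just hashes
# tokens into a fixed-size bag-of-words vector so the example runs on its own.
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Pretend these are chunks already stored in a vector database.
chunks = [
    "Commission rates are tiered by product line and region.",
    "The returns policy allows refunds within 30 days of purchase.",
    "Accelerators apply once a rep exceeds 110% of quota.",
    "Office hours are 9 to 5, Monday through Friday.",
]
index = np.stack([embed(c) for c in chunks])

def retrieve(query: str, k: int) -> list[str]:
    """Return the k most similar chunks by cosine similarity.

    A larger k improves the odds that the relevant chunk is included,
    but every extra chunk adds prompt tokens -- and therefore cost.
    And similar is not the same as relevant, so results still need
    to be judged against the actual question.
    """
    scores = index @ embed(query)            # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

print(retrieve("How is my sales commission calculated?", k=2))
```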
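And here is one possible reading of the mixed chunking strategy Harutyunyan describes, assuming simple paragraph and sentence splitting: each sentence is the unit that gets embedded and matched, while the enclosing paragraph rides along as the wider context handed to the model. The splitting rules and chunk sizes are illustrative, not prescriptive.

```python
# Minimal sketch of a mixed chunking strategy: each sentence becomes its
# own retrievable chunk, but it carries the enclosing paragraph as wider
# context, so a hit on a narrow chunk can still hand the model a fuller passage.
import re

def mixed_chunks(document: str) -> list[dict]:
    chunks = []
    for paragraph in document.split("\n\n"):
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        for sentence in sentences:
            if sentence:
                chunks.append({
                    "text": sentence,       # what gets embedded and matched
                    "context": paragraph,   # what gets handed to the LLM
                })
    return chunks

doc = ("Commissions are paid quarterly. Accelerators apply above quota.\n\n"
       "Refunds are processed within 30 days. Shipping fees are excluded.")
for chunk in mixed_chunks(doc):
    print(chunk["text"], "->", len(chunk["context"]), "chars of context")
```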
GRAPH RAG
Although RAG is the most pervasive form of prompt augmentation, there are multiple prompt augmentation techniques, some of which try to address RAG’s challenges. Graph RAG is a variation of RAG in which a semantic knowledge graph allows organizations to “just choose, based on anything in the knowledge graph, and say, ‘Hey, take these elements of my knowledge graph, index them, and then use that to answer this query,’” commented Franz CEO Jans Aasman. This approach constrains natural language queries to vetted enterprise knowledge for a particular domain. Another graph RAG advantage addresses a common critique: although vector databases are adept at determining the similarity between a prompt and vectorized data, they’re less adept at capturing the context (i.e., the similarities and relationships) within the vectorized data.
Many vector databases provide metadata-based pre-filtering and post-filtering to enhance search responses. However, that’s typically based on what Aasman called “a limited list of metadata elements.” With graph RAG, “you can literally use the entire graph for metadata filtering,” Aasman noted.
With this approach, users can ask more complicated questions, obtain what Aasman termed more specific answers, increase performance, and reduce cost (partly by asking fewer questions). Some semantic graph databases contain vector stores, allowing organizations to implement graph RAG.
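As an illustration of using the graph itself as the metadata filter, the following sketch models a knowledge graph as plain Python triples and indexes only the text attached to entities in a chosen domain. The entities and predicates are hypothetical; a production deployment would use a semantic graph database and a query language such as SPARQL rather than this toy structure.

```python
# Minimal sketch of the graph RAG idea: instead of indexing everything,
# select a vetted slice of a knowledge graph (here, plain triples) and
# index only the text attached to those nodes.
triples = [
    ("ProductX", "inDomain", "Pricing"),
    ("ProductX", "description", "ProductX lists at $120 with an 8% commission."),
    ("ProductY", "inDomain", "Support"),
    ("ProductY", "description", "ProductY tickets are handled by tier 2."),
]

def graph_filter(domain: str) -> list[str]:
    """Use the graph itself as the metadata filter: keep only the
    descriptions of entities that belong to the requested domain."""
    in_domain = {s for s, p, o in triples if p == "inDomain" and o == domain}
    return [o for s, p, o in triples if p == "description" and s in in_domain]

# Only the pricing slice of the graph gets embedded/indexed for this query.
candidate_texts = graph_filter("Pricing")
print(candidate_texts)
```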
CONFIDENTIAL AGENTS
One of the chief caveats about employing language models for enterprise search is control. “Now that you’re throwing all this data at an LLM, it’s hard to control what goes out and who should have visibility into what kind of data the model is outputting, because it’s not a model that you’ve fully trained, or you’re taking external models,” Das cautioned.
A prompt augmentation strategy described by Fulkerson as confidential agents directly addresses this concern. According to Fulkerson, with this paradigm, there are agents inside a confidential computing control plane, “So when somebody writes a prompt, the agents call to different data sources. It could be Workday, Microsoft Dynamics, or Snowflake. The agents take the credentials of the user who wrote the prompt, assemble the information, and it gets put on top of the prompt as context, then gets sent to the LLM.”
The key factor is the credentials Fulkerson mentioned, which are access control permissions specific to the user and adhere to enterprise mandates for data security, data governance, privacy, and regulatory compliance. For example, a salesman working for a company with more than 100 products can run a search to determine his commission, which entails complex math pertaining to different product prices, commission rates, accelerators, customer data, and more. The agents can securely compile the relevant information for the model to answer this query based on the salesman’s permissions, since “commission payment is sensitive information,” Fulkerson indicated. “You don’t want a data breach around what companies are paying me.”
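A minimal sketch of this pattern, with hypothetical data sources and a deliberately simplified entitlement model (no real Workday, Dynamics, or Snowflake APIs are used), might look like the following: the agent checks what the prompting user is allowed to read, gathers only those records, and prepends them to the prompt as context before anything reaches the LLM.

```python
# Minimal sketch of the confidential-agent pattern: fetch context only from
# sources the prompting user is entitled to, then assemble it on top of the
# prompt. The sources and entitlements below are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    entitlements: set[str]   # which data scopes this user may read

SOURCES = {
    "hr_system": {
        "scope": "compensation",
        "fetch": lambda u: f"Commission plan for {u.name}: 6% base, 1.5x accelerator over quota.",
    },
    "crm": {
        "scope": "sales",
        "fetch": lambda u: f"{u.name} closed $410k in bookings this quarter.",
    },
    "finance_ledger": {
        "scope": "finance",
        "fetch": lambda u: "Company-wide payout totals (restricted).",
    },
}

def build_prompt(user: User, question: str) -> str:
    context = [
        src["fetch"](user)
        for src in SOURCES.values()
        if src["scope"] in user.entitlements   # access control enforced here
    ]
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

rep = User(name="Alex", entitlements={"compensation", "sales"})
print(build_prompt(rep, "What commission will I earn this quarter?"))
# The finance ledger never reaches the prompt, because Alex lacks that entitlement.
```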
CHAIN-OF-THOUGHT
Chain-of-thought is another form of prompt augmentation that’s lauded for its capacity to enable users to guide models through the steps required to answer complicated prompts. As such, it expands the utility of enterprise search beyond finding documents and answering basic questions to providing predictive capabilities pertinent to analytics and BI. Das articulated a use case in which a user prompts a language model to identify “which of the customers today should I pay attention to, because they may churn next quarter.”
Instead of building ML models from scratch to answer this question, users can simply explain the indicators of customer churn as a way of augmenting their prompt. “I can say that the way someone churns is usually Step One, Step Two, and Step Three,” Das revealed. “Now that I’ve told those steps to the model, whenever I ask a question like this, it can use chain-of-thought.”
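A minimal sketch of this kind of augmentation, using illustrative churn indicators rather than any vendor-specific API, might simply write the steps into the prompt so the model reasons through them in order instead of answering in one leap:

```python
# Minimal sketch of chain-of-thought prompt augmentation for the churn
# example above. The steps and account summaries are placeholders; the
# resulting string would be sent to whatever LLM the organization uses.
CHURN_STEPS = [
    "Step One: check whether product usage dropped for two consecutive months.",
    "Step Two: check whether open support tickets are aging past the SLA.",
    "Step Three: check whether the renewal date falls within the next quarter.",
]

def chain_of_thought_prompt(question: str, account_summaries: list[str]) -> str:
    return (
        "Reason through the following steps, in order, for each account, "
        "and show your reasoning before answering:\n"
        + "\n".join(CHURN_STEPS)
        + "\n\nAccounts:\n" + "\n".join(account_summaries)
        + f"\n\nQuestion: {question}"
    )

prompt = chain_of_thought_prompt(
    "Which customers should I pay attention to because they may churn next quarter?",
    ["Acme: usage down 30% in Aug and Sep; renewal due in Q1.",
     "Globex: usage steady; no open tickets; renewal due in Q4."],
)
print(prompt)
```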