Anonymize RAG data in IBM Granite and Ollama using HCP Vault
You can use retrieval augmented generation (RAG) to refine and improve the output of a large language model (LLM) without retraining the model. However, many data sources include sensitive information, such as personally identifiable information (PII), that the LLM and its applications should not require or disclose — but sometimes they do. Sensitive information disclosure appears in the OWASP 2025 Top 10 Risks & Mitigations for LLMs and Gen AI Apps. Consider how a leak can happen: the retrieval engine fetches documents containing sensitive information and passes them to the LLM as context. For a general question, the LLM may generate a response without disclosing that information. A more specific question, however, can prompt the LLM to repeat the sensitive details from its context in its answer. To mitigate this concern, OWASP recommends data sanitization, access control, and tokenization. ...
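To make the sanitization recommendation concrete, here is a minimal sketch of masking PII in documents before they are embedded and stored for retrieval. The regex patterns and placeholder labels are illustrative assumptions, not part of HCP Vault's API — later sections replace this ad hoc approach with Vault's tokenization.

```python
import re

# Illustrative PII patterns (assumptions, not exhaustive): mask common
# identifiers before documents enter the RAG vector store.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def sanitize(text: str) -> str:
    """Replace each PII match with a typed placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

doc = "Contact Ann at ann@example.com or 555-867-5309; SSN 123-45-6789."
print(sanitize(doc))
```

Unlike this one-way redaction, tokenization (covered below) substitutes reversible tokens, so an authorized application can recover the original values when needed.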