Publications
With the widespread adoption of Large Language Models (LLMs), concerns about the leakage of private information have grown significantly. Although substantial work on preserving the privacy of sensitive text has been carried out in English, low-resource languages like Bangla remain largely unexplored. To address this gap, we employed two human-rewriting approaches that have achieved notable success in English: (i) deleting sensitive expressions and (ii) obscuring sensitive details by abstracting them. We first developed a corpus, named PriBan, through translation, crowd-sourcing, and the use of LLMs. As the first work of its kind for Bangla, we achieved a satisfactory level of privacy preservation and demonstrated strong results in maintaining a natural tone in the generated text. Automated evaluation metrics show that our approach preserved privacy with an average accuracy of 91.43% for the obscure-rewrite method and 94.75% for the delete-rewrite method. We believe this work lays the foundation for privacy-preserving text generation in Bangla and can be extended to other low-resource languages in future research.
@inproceedings{chakma2025priban,
  title={PriBan: A Benchmark Dataset and Modeling Framework for Privacy Preservation in Bengali Texts},
  author={Md Anowarul Faruk Shishir and Bishaw Kirti Chakma and Indrajit Gupta},
  booktitle={International Conference on Computer and Information Technology (ICCIT)},
  year={2025},
  note={Under Review}
}