Publications
With the widespread adoption of Large Language Models (LLMs), concerns about the leakage of private information have grown significantly. Although substantial work on preserving the privacy of sensitive text has been carried out in English, low-resource languages like Bangla remain largely unexplored. To address this gap, we employed two human-rewriting approaches that have achieved notable success in English: (i) deleting sensitive expressions and (ii) obscuring sensitive details by abstracting them. We first developed a corpus, named PriBan, through translation, crowd-sourcing, and the use of LLMs. As the first work of its kind for Bangla, we achieved a satisfactory level of privacy preservation and demonstrated strong results in maintaining a natural tone in the generated text. Automated evaluation metrics show that our approach preserved privacy with an average accuracy of 91.43% for the obscure-rewrite method and 94.75% for the delete-rewrite method. We believe this work lays the foundation for privacy-preserving text generation in Bangla and can be extended to other low-resource languages in future research.
@inproceedings{chakma2025priban,
  title={PriBan: A Benchmark Dataset and Modeling Framework for Privacy Preservation in Bengali Texts},
  author={Md Anowarul Faruk Shishir and Bishaw Kirti Chakma and Indrajit Gupta},
  booktitle={International Conference on Computer and Information Technology (ICCIT)},
  year={2025},
  note={Under Review}
}