Welcome to my homepage

My name is Md. Mahadi Hasan Sibat. I am a second-year PhD student in the Department of Computer Science at the University of Central Florida (UCF), advised by Dr. Shubhra Kanti Karmaker (Santu). I am a member of the Bridge-AI Lab at UCF.

My research focuses on the evaluation and trustworthiness of large language models. In particular, I am interested in building rigorous benchmarks that expose failure modes in LLMs on real-world tasks, and in developing methods to detect and mitigate benchmark contamination — a pervasive problem that causes reported model performance to be systematically overestimated.

Before joining UCF, I completed my MS in Computer Science (Software Engineering) at Auburn University, where I also worked as a Graduate Research and Teaching Assistant. Before graduate school, I worked as a Senior Software Engineer at Reve Systems in Bangladesh for over three years.

You can download my full CV here.

Research Interests

  • LLM Evaluation & Benchmarking
  • Benchmark Contamination Detection and Mitigation
  • IT Automation and Infrastructure as Code (Ansible, IaC)
  • Code Generation for Real-World Tasks
  • Natural Language Processing and its Applications
  • Conversational Data Science

News and Announcements

  • [April 2026] ACL 2026: Our paper "Large Language Models for IT Automation Tasks: Are We There Yet?" has been accepted to ACL 2026 Findings. We present ITABench, a benchmark of 126 real-world Ansible tasks, on which the best model achieves only 23.9% pass@10.
  • [April 2026] ACL 2026: Our paper "The Path Not Taken: Duality in Reasoning about Program Execution" has been accepted to the ACL 2026 Main Conference (co-authored with Eshgin Hasanov, Santu Karmaker, and Aashish Yadavally).
  • [Aug 2024] Joined the University of Central Florida as a PhD student, advised by Dr. Santu Karmaker.
  • [2024] FSE 2024: Our paper "State Reconciliation Defects in Infrastructure as Code" has been accepted at FSE 2024.
  • [2024] TMLR: Our paper "Introducing 'Forecast Utterance' for Conversational Data Science" has been accepted at TMLR 2024.

Publications

  • ACL 2026 Findings · 2026
    Large Language Models for IT Automation Tasks: Are We There Yet?
    Md. Mahadi Hasan Sibat, John Salvador, Akond Ashfaque Ur Rahman, Shubhra Kanti Karmaker Santu
  • ACL 2026 Main · 2026
    The Path Not Taken: Duality in Reasoning about Program Execution
    Eshgin Hasanov, Md. Mahadi Hasan Sibat, Santu Karmaker, Aashish Yadavally
  • FSE 2024 · 2024
    State Reconciliation Defects in Infrastructure as Code
    Md. Mahadi Hasan Sibat, John Salvador, Shubhra Kanti Karmaker Santu, Akond Ashfaque Ur Rahman
  • TMLR 2024 · 2024
    Introducing "Forecast Utterance" for Conversational Data Science
    Md. Mahadi Hasan Sibat, R. Alexander Knipper, Shubhra Kanti Karmaker Santu
  • ACM CSUR 2022 · 2022
    Shubhra Kanti Karmaker Santu, Md. Mahadi Hasan Sibat, Micah J. Smith, Lei Xu, Chengxiang Zhai, Kalyan Veeramachaneni
  • Under Review · ARR 2026
    FinTradeBench: A Financial Reasoning Benchmark for LLMs
    Yogesh Agrawal, Aniruddha Dutta, Md. Mahadi Hasan Sibat, Santu Karmaker, Aritra Dutta

Experience & Education

  • Fall 2024 — Present
    PhD in Computer Science
    University of Central Florida, Orlando FL  ·  Advisor: Dr. Santu Karmaker
  • Aug 2021 — Aug 2024
    MS in Computer Science (Software Engineering)
    Auburn University, Auburn AL  ·  Woltosz Fellowship (2021–2024)
  • Sep 2017 — Dec 2020
    Senior Software Engineer
    Reve Systems, Dhaka, Bangladesh
  • 2012 — 2017
    B.Sc. in Computer Science & Engineering
    Bangladesh University of Engineering and Technology (BUET)

Academic Service

  • 2023
    Track Committee Member — ACL 2023
  • 2022
    Track Committee Member — EMNLP 2022
  • 2023, 2025, 2026
    Reviewer — ACL Rolling Review (ARR)
  • 2024, 2025
    Reviewer — Transactions on Machine Learning Research (TMLR)
  • 2023
    Reviewer — EMNLP 2023, Workshop BLP