Welcome to my homepage

My name is Md. Mahadi Hasan Sibat. I am a second-year PhD student in the Department of Computer Science at the University of Central Florida (UCF), advised by Dr. Shubhra Kanti Karmaker (Santu). I am a member of the Bridge-AI Lab at UCF.

My research focuses on the evaluation and trustworthiness of large language models. In particular, I am interested in building rigorous benchmarks that expose failure modes in LLMs on real-world tasks, and in developing methods to detect and mitigate benchmark contamination — a pervasive problem that causes reported model performance to be systematically overestimated.

Before joining UCF, I completed my MS in Computer Science (Software Engineering) at Auburn University, where I also worked as a Graduate Research and Teaching Assistant. Prior to academia, I worked as a Senior Software Engineer at Reve Systems in Bangladesh for over three years.

You can download my full CV here.

Research Interests

LLM Evaluation & Benchmarking
Benchmark Contamination Detection and Mitigation
IT Automation and Infrastructure as Code (Ansible, IaC)
Code Generation for Real-World Tasks
Natural Language Processing and its Applications
Conversational Data Science

News and Announcements

[April 2026] ACL 2026: Our paper "Large Language Models for IT Automation Tasks: Are We There Yet?" has been accepted to ACL 2026 Findings. We present ITABench, a benchmark of 126 real-world Ansible tasks. The best model achieves only 23.9% pass@10.
[April 2026] ACL 2026: Our paper "The Path Not Taken: Duality in Reasoning about Program Execution" has been accepted to the ACL 2026 Main Conference (co-authored with Eshgin Hasanov, Santu Karmaker, and Aashish Yadavally).
[Aug 2024] Joined the University of Central Florida as a PhD student, advised by Dr. Santu Karmaker.
[2024] FSE 2024: Our paper "State Reconciliation Defects in Infrastructure as Code" has been accepted to FSE 2024.
[2024] TMLR: Our paper "Introducing 'Forecast Utterance' for Conversational Data Science" has been accepted to TMLR 2024.

Publications

ACL 2026 Findings · 2026

Large Language Models for IT Automation Tasks: Are We There Yet?

Md. Mahadi Hasan Sibat, John Salvador, Akond Ashfaque Ur Rahman, Shubhra Kanti Karmaker Santu

Paper

ACL 2026 Main · 2026

The Path Not Taken: Duality in Reasoning about Program Execution

Eshgin Hasanov, Md. Mahadi Hasan Sibat, Santu Karmaker, Aashish Yadavally

FSE 2024 · 2024

State Reconciliation Defects in Infrastructure as Code

Md. Mahadi Hasan Sibat, John Salvador, Shubhra Kanti Karmaker Santu, Akond Ashfaque Ur Rahman

Paper

TMLR 2024 · 2024

Introducing "Forecast Utterance" for Conversational Data Science

Md. Mahadi Hasan Sibat, R. Alexander Knipper, Shubhra Kanti Karmaker Santu

Paper

ACM CSUR 2022 · 2022

AutoML to Date and Beyond: Challenges and Opportunities

Shubhra Kanti Karmaker Santu, Md. Mahadi Hasan Sibat, Micah J. Smith, Lei Xu, Chengxiang Zhai, Kalyan Veeramachaneni

Paper

Under Review · ARR 2026

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Yogesh Agrawal, Aniruddha Dutta, Md. Mahadi Hasan Sibat, Santu Karmaker, Aritra Dutta

Experience & Education

Fall 2024 — Present

PhD in Computer Science

University of Central Florida, Orlando FL · Advisor: Dr. Santu Karmaker
Aug 2021 — Aug 2024

MS in Computer Science (Software Engineering)

Auburn University, Auburn AL · Woltosz Fellowship (2021–2024)
Sep 2017 — Dec 2020

Senior Software Engineer

Reve Systems, Dhaka, Bangladesh
2012 — 2017

B.Sc. in Computer Science & Engineering

Bangladesh University of Engineering and Technology (BUET)

Academic Service

2023

Track Committee Member — ACL 2023
2022

Track Committee Member — EMNLP 2022
2023, 2025, 2026

Reviewer — ACL Rolling Review (ARR)
2024, 2025

Reviewer — Transactions on Machine Learning Research (TMLR)
2023

Reviewer — EMNLP 2023, Workshop BLP