Hello! I am Aisha Khatun, a computer science graduate student at the University of Waterloo with interests in AI, ML, and everything Data!
As a graduate researcher, I analyze the capabilities and limitations of Large Language Models (LLMs): how they answer questions about sensitive topics, follow instructions, and respond to prompt variations.
I am passionate about AI, solving complex problems by applying ML techniques, and extracting valuable information through data analysis.
Let's connect and discuss exciting opportunities in the field of AI and computer science!
www.linkedin.com/in/tanny411 |
aysha.kamal7@gmail.com |
aisha.khatun@uwaterloo.ca
Languages: (Proficient) Python, SQL, SPARQL, JavaScript, C++; (Comfortable) Java, PHP, Scala
Frameworks and tools: NumPy, Pandas, Spark, PySpark, Matplotlib, Plotly, Seaborn, SciPy, Scikit-Learn, Keras, PyTorch, Fast.ai, TensorFlow 2, Hugging Face, Hadoop, Airflow, Tableau, Power BI
Databases: MongoDB, PostgreSQL
Version Control and Collaboration: Git, GitHub, GitLab, Gerrit, Phabricator
Web Technologies: HTML/CSS, jQuery, Node.js, React-Redux, Express.js, Flask
See details, volunteer experience, and awards on the experience and education page.
I am working with Professor Daniel G. Brown on the ability of LLMs to respond appropriately and consistently to sensitive topics under prompt variations. We analyzed over 30 models, both open and closed source, and found that most can barely understand the task at hand: they are sensitive to slight variations in prompt wording and respond differently in different settings. A minimal sketch of this kind of consistency check follows the paper links below.
Paper, Paper.
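A minimal, self-contained sketch of what such a consistency check can look like (not the actual research code; `ask_model` is a hypothetical stand-in for a real API or local-model call):

```python
# Sketch: measure how consistent a model's answer is across paraphrases
# of the same question.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns a dummy answer here.
    return "true"

def consistency(prompt_variants: list[str]) -> float:
    """Fraction of variants agreeing with the majority answer."""
    answers = [ask_model(p).strip().lower() for p in prompt_variants]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)

variants = [
    "Is the following claim true or false? The Earth orbits the Sun.",
    "True or false: the Earth orbits the Sun.",
    "Answer TRUE or FALSE. Claim: the Earth orbits the Sun.",
]
print(consistency(variants))  # 1.0 = stable across wording; lower = prompt-sensitive
```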
I worked with the Research Team as a Research Data Scientist (NLP) to develop Copyediting as a structured task. We used Wiktionary to curate lists of commonly misspelled words and detected misspellings in Wikipedia articles across all languages in an automated fashion (a minimal sketch of the matching step follows the links below).
Meta page, Report, Code.
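A minimal sketch of the detection step, assuming the misspelling list has already been curated (the toy set below stands in for the Wiktionary-derived list):

```python
# Sketch: flag tokens in an article that appear in a curated misspelling list.
import re

# Toy stand-in for the list curated from Wiktionary (one entry per word).
misspellings = {"teh", "recieve", "seperate"}

def find_misspellings(article_text: str) -> list[str]:
    """Return tokens from the article that match the curated list."""
    tokens = re.findall(r"\w+", article_text.lower())
    return [t for t in tokens if t in misspellings]

print(find_misspellings("Teh author did recieve the award."))  # ['teh', 'recieve']
```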
Currently, I am working on addressing deployment bottlenecks and improving the accuracy of the Wikipedia link recommendation system by creating a language-agnostic model to replace the 300+ individual language-dependent models (a sketch of the idea follows the links below).
Meta page, Report, Code.
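The gist of the language-agnostic idea, sketched with made-up feature values; the real feature set and model are described in the report linked above:

```python
# Sketch: one classifier over language-independent features (e.g. how often a
# phrase is linked when it appears), instead of one model per language.
# Feature names and values here are illustrative, not the production ones.
from sklearn.ensemble import GradientBoostingClassifier

X = [
    # [link_probability, anchor_word_count, candidate_rank]
    [0.92, 2, 1],
    [0.05, 1, 7],
    [0.61, 3, 2],
    [0.02, 1, 9],
]
y = [1, 0, 1, 0]  # 1 = this anchor/candidate pair should become a link

model = GradientBoostingClassifier().fit(X, y)
print(model.predict_proba([[0.75, 2, 1]]))  # probability the pair is a link
```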
As a Data Analyst, I worked with the Search and Analytics team to help scale the Wikidata Query Service.
We analyzed Wikidata dumps and combined the results with an analysis of users' SPARQL queries to identify the most frequently queried Wikidata subgraphs.
This informed decisions on how to split Wikidata to handle the ever-increasing size of the graph (Ticket). A minimal sketch of the query-analysis step follows the link below.
Analysis work: wikitech/User:AKhatun.
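A minimal sketch of the query-analysis side (toy queries; the real analysis processed full query logs at scale, e.g. with Spark):

```python
# Sketch: count which Wikidata properties (P-ids) appear most often in
# SPARQL query strings, as a rough signal of heavily queried subgraphs.
import re
from collections import Counter

queries = [
    "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 . ?x wdt:P106 wd:Q82955 }",
    "SELECT ?y WHERE { ?y wdt:P31 wd:Q11424 }",
]

property_counts = Counter(
    pid for q in queries for pid in re.findall(r"wdt:(P\d+)", q)
)
print(property_counts.most_common(5))  # [('P31', 2), ('P106', 1)]
```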
Selected as an Outreachy intern to work on the Abstract Wikipedia project under the Wikimedia Foundation.
Performed data analysis and source-code similarity analysis using unsupervised machine learning to identify important modules for centralization in Abstract Wikipedia (a sketch of the clustering idea follows the link below).
See more in Abstract_Wikipedia/Data
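A minimal sketch of the similarity analysis, with toy module contents standing in for real wiki modules:

```python
# Sketch: represent each module's source as character n-gram TF-IDF vectors
# and cluster, so near-duplicate modules across wikis land together.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

modules = [
    "function formatDate(d) return d end",
    "function formatDate(x) return x end",
    "function convertUnits(v, u) return v * factor[u] end",
]

X = TfidfVectorizer(analyzer="char", ngram_range=(3, 5)).fit_transform(modules)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X.toarray())
print(labels)  # the two formatDate variants should share a cluster label
```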
Performed data analysis and applied machine learning algorithms to computer vision and time-series data for pattern recognition and prediction. Used state-of-the-art face detection models, OCR, and sensor readings for fall detection. Analyzed sleep patterns from sleep mats to identify abnormal activity at night, and analyzed server logs to identify ideal downtime windows for application releases.
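A minimal sketch of the sleep-pattern idea, with made-up readings and thresholds (not the production pipeline):

```python
# Sketch: flag sensor readings that deviate strongly from a rolling baseline
# of the preceding samples.
import pandas as pd

readings = pd.Series([3, 4, 3, 5, 4, 3, 42, 4, 3, 5])  # motion/pressure samples
baseline = readings.shift(1).rolling(window=5, min_periods=3)  # exclude current sample
zscore = (readings - baseline.mean()) / baseline.std()
print(readings[zscore.abs() > 3])  # the spike (42) is flagged as abnormal
```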
See details and the online courses I have completed on the experience and education page.
Advisor: Daniel G. Brown
Courses taken:
CS848 F22: The Art and Science of Empirical Computer Science
CS848 F22: Knowledge Graphs
CS889 W23: InfoVis for AI Explainability
CS889 S23: Value-Driven Technology
Completed my undergraduate thesis on Authorship Attribution in Bangla Literature.
Applied deep learning NLP techniques to build a high-performing, scalable system.
Advisors: Md Saiful Islam,
Ayesha Tasnim
Thesis Report
Core Courses: Algorithm Design and Analysis, Data Structures, Database Systems, Object-Oriented Programming, Software Engineering and Design Patterns, Technical Writing and Presentation, Artificial Intelligence, Introduction to Data Science, Machine Learning
GitHub repositories that I've built.
Articles I've written. See all blog posts here.