I am working with Professor Daniel G. Brown on the ability of LLMs to respond appropriately and consistently to sensitive topics under prompt variations. We analyzed over 30 models, both open and closed source, and compared their performance in areas such as task understanding and response consistency. Surprisingly, our findings indicate that most models, even some large open-source ones, have a difficult time answering simple Yes/No questions and can barely understand the task at hand. They are prone to changing their responses with slight variations in prompt wording and give different responses in different settings. These findings warn against using LLMs for Q/A without proper planning and testing, given their limited instruction-following and task-understanding capacity.
I worked with the Research Team as a Research Data Scientist (NLP) to develop Copyediting as a structured task. Meta page, Report, Code.
As a contract data analyst, I worked with the Search and Analytics team to analyze SPARQL queries along with the Wikidata dump to help scale the Wikidata Query Service. The analysis was done using both Spark (Scala) and PySpark (Python) on Hadoop clusters. Analysis work: wikitech/User:AKhatun.
Selected as an Intern in Outreachy to work with the Abstract Wikipedia project under Wikimedia Foundation.
Performed data analysis and applied machine learning algorithms on computer vision and time-series data for pattern recognition and prediction generation.
Developed larger datasets and implemented transfer-learning-based deep learning approaches for Authorship Attribution in Bengali Literature, far surpassing the existing systems. Work available on GitHub. Datasets available on Mendeley.
Advisor: Daniel G. Brown
Courses taken:
CS848 F22: The Art and Science of Empirical Computer Science
CS848 F22: Knowledge Graphs
CS889 W23: InfoVis for AI Explainability
CS889 S23: Value-Driven Technology
Completed undergraduate thesis on Authorship Attribution in Bangla Literature.
Applied deep learning NLP techniques to achieve high performing scalable systems.
Advisor: Md Saiful Islam,
Ayesha Tasnim
Thesis Report
Core Courses: Algorithm Design and Analysis, Data Structure, Database System, Object Oriented Programming, Software Engineering and Design Patterns, Technical Writing and Presentation, Artificial Intelligence, Introduction to Data Science, Machine Learning
Received the Barbara Hayes-Roth Award for Women in Math and Computer Science for demonstrated academic excellence as a graduate student at the University of Waterloo.
Selected as one of 54 Outreachy Interns among 1000+ applicants through contributions in various Open Source projects.
Secure and Private AI scholarship from Facebook. One of 300 out of 6000 candidates selected worldwide.
2nd Place, Best Research Poster Award, ICBSLP (International Conference). Presented our work on Authorship Attribution in Bengali Literature using transfer learning and compared it to existing systems and character-level CNN architectures.
Education Board Scholarship during undergraduate (4 years long) for best performance nation-wide awarded by the Education Board, Government of Bangladesh.
Mentored a group at the Directed Reading Program (DRP), where undergraduate students are introduced to new topics in Math and Computer Science and, where possible, get a gentle introduction to research. My group learned about LLMs, ways to set up and use a personal LLM, and its applications in creative endeavors like storytelling.
Discussed my research and various opportunities in Computer Science with underrepresented students in Grade 9-10 across Canada. The aim was to bust the myths of Computer Science and invite inclusivity in the field, to show students how Computer Science is welcoming to all, whether you are math-savvy or not, tech-savvy or not, and what the recent career prospects look like for Computer Science students. Slides.
Helped set up several Women in CS Technovation events at the University of Waterloo and guided the participants, young girls in high school and below.
Conducted a hands-on beginner's AI workshop at the University of Waterloo for the WiCS (Women in CS) Conference 2023. Attendees included undergraduate and high school students. They learned about AI and were given a rundown of a simple ML problem using the Titanic Kaggle Competition.
Organized events and helped applicants, mentors, and interns in all steps pertaining to the internships.
Panelist for 2 sessions and a project presenter at Wikidata Conference (WikidataCon). Sessions, Video.
Since my work during Outreachy Internship with Abstract Wikipedia,
I have been working on improving and developing the abstract-wiki-ds tool
to better perform clustering on source code. UI development is also underway.
Taking small steps, but steps nonetheless.
Phabricator: T263678
Conducted a series of IEEE beginner Machine Learning workshops. Workshop materials available on GitHub.
Trained junior year undergraduate students for Competitive Programming.
This nanodegree was the 2nd phase of the Secure and Private AI Facebook Udacity Scholarship. Learned and applied Image Processing, Transfer Learning, Kalman filters, and the Graph SLAM algorithm. Completed projects include Day-Night Detection, Facial Keypoint Detection, Object Detection, Image Captioning, Sentiment Analysis, and Object Localization and Mapping.
Certificate
A bundle of DataCamp courses for Python, data cleaning, manipulation and analysis, pandas, data visualization, SQL, statistical thinking, and machine learning.
Certificate
Jeremy Howard
Machine learning course and Deep learning specialization by Andrew Ng, Coursera. These courses cover everything from the basics of machine learning and building neural networks from scratch to deep learning techniques in computer vision and NLP with CNNs, RNNs, GRUs and LSTMs, hyperparameter tuning, and structuring machine learning projects.
Machine Learning Certificate
DIY (Do It Yourself) Track.
This summer of code was my first ever introduction to Python and Machine Learning.
Tons of amazing volunteer mentors helped me a lot, from setting up Python to understanding support vector machines and random forests.
Setting up Python and Anaconda, on Windows(!), was quite messy. Multiple Python installations and too many incompatibility issues. I have come a long way since then.
Did a lot of assignments and solved ML problems in the completely hands-on workshops in this SoC.