Aisha Khatun

Toronto, Canada

About Me

Hello! I am Aisha Khatun, a Data Engineer at the Wikimedia Foundation with interests in AI, ML, and everything data! I build and manage ETL pipelines over terabytes of free and open source data, helping teams derive value for the foundation and the community around the Wikimedia Movement. I bring a background in AI, ML, data science, and data analytics from my previous work as a researcher and data scientist. Now, I have decided to go a step deeper and work on the fundamental layer of data storage, governance, lineage, and privacy. I completed my Master's in Computer Science at the University of Waterloo. As a graduate researcher, I analyzed the capabilities and limitations of Large Language Models (LLMs) in answering questions about sensitive topics, following instructions, and handling prompt variations. I am passionate about AI, solving complex problems by applying ML techniques, and extracting valuable information from data. Let's connect and discuss exciting opportunities in the field of AI and computer science!
www.linkedin.com/in/tanny411 | aysha.kamal7 AT gmail DOT com

Technical Skills

Languages: Python, Kotlin, Java, Scala, SQL, SPARQL, JavaScript, C++, PHP
Frameworks and tools: NumPy, Pandas, Spark, PySpark, Hadoop, Airflow, Kafka, Docker, Kubernetes
ML and Analysis tools: SciPy, Scikit-Learn, Keras, PyTorch, Fast.ai, TensorFlow 2, HuggingFace, LangChain, Tableau, Power BI, Matplotlib, Plotly, Seaborn
Databases: MongoDB, PostgreSQL, MySQL, Hive, Presto
Version Control and Collaboration: Git, GitHub, GitLab, Gerrit, Phabricator, Asana
Web Technologies: HTML/CSS, jQuery, Node.js, React-Redux, Express.js, Flask

Experience

These are the major full-time roles I have held. See more details, short stints, volunteer experience, and awards on the experience and education page.

Data Engineer III
Wikimedia Foundation

Managing and building ETL pipelines for various analytical use cases across the foundation, deriving metrics for the community, and helping assess the health of the foundation. Monitoring system health as part of Ops-Week and contributing to data governance and privacy initiatives. Modernizing the platform for streaming and batch processes. There is always something to improve, and there are always new use cases to build. Fun!

Software Dev Engineer I
Amazon Web Services

Worked on a highly durable, available, and scalable distributed write-ahead log system for AWS. Was responsible for end-to-end service development, improvement, deployment, and on-call rotations. Launched features to automatically manage high-volume customers, which included writing design docs and developing durable, low-latency APIs while ensuring a smooth customer experience.

Data Analyst
Wikimedia Foundation

As a Data Analyst, I worked with the Search and Analytics team to help scale the Wikidata Query Service. Performed data analysis on Wikidata dumps and combined it with analysis of users' SPARQL queries to identify the most frequently queried Wikidata subgraphs. This helped inform decisions to split Wikidata to handle the ever-increasing size of the graph (Ticket).
Analysis work: wikitech/User:AKhatun.

Machine Learning Engineer
Therap BD Ltd.

Performed data analysis and applied machine learning algorithms to computer vision and time-series data for pattern recognition and prediction. Used state-of-the-art face detection models, OCR, and sensor readings for fall detection. Analysed sleep patterns from sleep mats to identify abnormal activity at night, and analysed server logs to identify ideal downtime windows for application releases.

Education

See details and the online courses I have completed on the experience and education page.

Cheriton School of Computer Science, University of Waterloo

Master of Mathematics in Computer Science and Engineering
CGPA: 3.96/4.00

Completed Master's thesis research on the ability of LLMs to respond appropriately and consistently to sensitive topics under prompt variations. Advisor: Daniel G. Brown (Thesis Report)

Shahjalal University of Science and Technology

B.Sc. in Computer Science and Engineering
CGPA: 3.89/4.00 (2nd in Class)

Completed undergraduate thesis on Authorship Attribution in Bangla Literature. Applied deep learning NLP techniques to build a high-performing, scalable system to identify the authors of texts.
Advisors: Md Saiful Islam, Ayesha Tasnim (Thesis Report)

Publications

Media

Projects

GitHub repositories that I've built.

Systematic analysis of the responses of GPT-3 to different categories of statements and its potential vulnerabilities to simple prompting changes. We analyze what confuses GPT-3: how the model responds to certain sensitive topics and what effects the prompt wording has on the model response.
Analysis of Wikidata and the Wikidata Query Service to help figure out ways to scale the service. The repository contains analysis code, written articles on the findings, and visualizations.
Transfer learning for authorship attribution: unsupervised training of a language model that learns the workings and structure of the Bangla language, followed by fine-tuning and classification specific to authorship attribution. The effects of various tokenization methods are analyzed as well.
Contains my code for various programming competitions and practice, including learning data structures and algorithms.
This repository contains a collection of my machine learning, deep learning, and AI projects, including Kaggle, course, and personal projects.

Blog posts

Articles I've written. See all blog posts here.