Hello! I am Aisha Khatun, working as a Data Engineer at the Wikimedia Foundation with interests in AI, ML, and everything Data! I work on
building and managing ETL pipelines, managing terabytes of free and open source data, and helping teams derive value for the foundation and
the community around the Wikimedia Movement. My previous work as a researcher and data scientist gave me direct experience with AI, ML,
data science, and data analytics. Now I have decided to go a step deeper and work on the fundamental
layer of data storage, governance, lineage, and privacy. I completed my Master's in Computer Science at the University of Waterloo.
As a graduate researcher, I analyzed the capabilities and limitations of Large Language Models (LLMs) in answering questions about
sensitive topics, following instructions, and handling prompt variations.
I am passionate about AI, solving complex problems by applying ML techniques, and extracting valuable information from data.
Let's connect and discuss exciting opportunities in the field of AI and computer science!
www.linkedin.com/in/tanny411 |
aysha.kamal7 AT gmail DOT com
Languages: Python, Kotlin, Java, Scala, SQL, SPARQL, JavaScript, C++, PHP
Frameworks and tools: NumPy, Pandas, Spark, PySpark, Hadoop, Airflow, Kafka, Docker, Kubernetes
ML and Analysis tools: SciPy, scikit-learn, Keras, PyTorch, Fast.ai, TensorFlow 2, HuggingFace, LangChain, Tableau, Power BI, Matplotlib, Plotly, Seaborn
Databases: MongoDB, PostgreSQL, MySQL, Hive, Presto
Version Control and Collaboration: Git, GitHub, GitLab, Gerrit, Phabricator, Asana
Web Technologies: HTML/CSS, jQuery, Node.js, React-Redux, Express.js, Flask
This section lists the major full-time roles I have held. See more details, short stints, volunteer experience, and awards in the experience and education page.
Managing and building ETL pipelines for various analytical use cases across the foundation, deriving metrics for the community, and helping assess the health of the foundation. Monitoring system health as part of Ops-Week and contributing to data governance and privacy initiatives. Modernizing the platform for streaming and batch processes. There is always something to improve, and there are always new use cases to build. Fun!
Worked on a highly durable, available, and scalable distributed write-ahead log system for AWS. Was responsible for end-to-end service development, improvement, deployment, and on-call rotations. Launched features to automatically manage high-volume customers, which included writing design docs and developing durable, low-latency APIs while ensuring a smooth customer experience.
As a data analyst, I worked with the Search and Analytics team to help scale the Wikidata query service.
Performed data analysis on Wikidata dumps and combined it with an analysis of users' SPARQL queries to identify the most frequently queried Wikidata subgraphs.
This helped inform decisions to split Wikidata to handle the ever-increasing size of the graph (Ticket).
Analysis work: wikitech/User:AKhatun.
Performed data analysis and applied machine learning algorithms to computer vision and time-series data for pattern recognition and prediction. Used state-of-the-art face detection models, OCRs, and sensor readings for fall detection. Analyzed sleep patterns from sleep mats to identify abnormal activities at night, and analyzed server logs to identify ideal downtime for application releases.
See more details and the online courses I have taken in the experience and education page.
Completed Master's thesis research on the ability of LLMs to respond appropriately and consistently to sensitive topics with prompt variations. Advisor: Daniel G. Brown (Thesis Report)
Completed undergraduate thesis on Authorship Attribution in Bangla Literature.
Applied deep learning NLP techniques to build a high-performing, scalable system to identify the authors of texts.
Advisors: Md Saiful Islam, Ayesha Tasnim (Thesis Report)
GitHub repositories that I've built.
Articles I've written. See all blog posts here.