Aisha Khatun

Struggle and Grow

Published Dec 21, 2020

Struggles are a part of everything, and my journey into open source with Wikimedia is no different. In this post I’d like to highlight some things I overcame in the first 3 weeks of my internship. BTW, writing this post is also part of the struggle. It’s hard to shine a light on one’s flaws, but trust me, it’s liberating! Because “Everybody Struggles”!

I am an Outreachy intern @ Wikimedia, analyzing community-authored functions to help take some initial steps towards the much greater Abstract Wikipedia project.

Learn about Abstract Wikipedia from this paper and check out our work on GitHub and Phabricator. To keep myself up and running, I document everything I do in my blog about internship progress and on my Wikimedia user page.

My internship began smoothly, with lots of reading to learn more about our project and WMF principles, setting up accounts and user pages, etc. Then it was time to get to work. I started learning about the various Wikimedia tools we would be using. Of course, I hadn’t worked with these before, and now that I look back, they are not hard at all. But when I started out, things seemed intimidating. I spent 2 whole days just riffling through wiki links, getting a smattering of understanding here and there. My mentor did give us some really good starting points to work with. Slowly I started getting the whole picture, I started testing out the tools, and THEN I finally understood things well enough to get started on my own code. In the middle of these half-got-it, half-frustrated situations, it’s easy to break down and feel like I know nothing, it’s too hard for me, or maybe I’m not made for this. But the magic is in patience and sticking to it. Patience is often not my thing, so I compensated by taking breaks and getting back to things in intervals. Things always work out in the end if we stick to it.

In my 3rd week, I started writing code to fetch Lua module contents across all wikis. It had to be done through API calls, something I had already worked with during the contribution phase. So I thought it was going to be a breeze! Max 2 days, and then I could continue with data analysis, the most exciting part. Lo and behold, I got stuck and it took me a whole week. There was much more to fetching and checking large amounts (~2GB) of data than I could have anticipated. First of all, it is not possible to run the scripts normally; I had to run them on the Grid, which ensures that a job runs in a suitable place with sufficient resources. So every time I got an error, I had to change the code on my computer, send it to Toolforge, set it to run on the Grid, check the .err file, fix my code again, and repeat the loop. It was time-consuming, especially given that I got ALL kinds of errors. Finding the right balance between how often I write to file, how much memory I have access to, and how many jobs I can run simultaneously was also troublesome. Some trial and error later, here I am! I am still stuck with race conditions though, as multiple jobs keep writing to the same file, which I will be addressing later this week.
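
One common way to sidestep this kind of race condition is to give every job its own output file and merge the files once all jobs have finished. The sketch below illustrates that idea in Python. It is not necessarily the fix used in this project, and the function names (`job_output_path`, `write_rows`, `merge_outputs`), the `modules_<job_id>.csv` file pattern, and the CSV layout are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def job_output_path(out_dir, job_id):
    """Return a path that only the job with this id writes to (hypothetical naming)."""
    return Path(out_dir) / f"modules_{job_id}.csv"


def write_rows(path, rows, header):
    """Append a batch of rows to this job's own file, writing the header only once."""
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def merge_outputs(out_dir, merged_path, header):
    """Combine all per-job files into one file after every job has finished.

    merged_path must not itself match the modules_*.csv pattern.
    """
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for part in sorted(Path(out_dir).glob("modules_*.csv")):
            with part.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the per-job header row
                writer.writerows(reader)
```

Because no two jobs ever open the same file, no locking is needed. The cost is one extra merge step at the end, which streams row by row and so stays light on memory.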

I had to learn ways to manage memory within my code - lots of Stack Overflow - lots of pandas docs. I tried multiple approaches and kept running out of memory in various places in my code. I kept changing those parts to be more memory efficient, sometimes in not-so-obvious ways. In one place I looped over one file, loading 500 rows at a time, and checked those rows against the other file for duplicates. This was a simple aha-moment solution that did not occur to me until later on. I was finally able to run my code on the Grid and get my outputs WITHOUT any errors! Ahh… that empty error file still brings me peace!
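
The 500-rows-at-a-time idea can be sketched with the `chunksize` option of pandas `read_csv`. This is a minimal illustration under assumptions, not the project's actual code: the function name `drop_known_rows`, the `page_id` key column, and the CSV file format are all hypothetical, and it assumes that the key column of the other file fits in memory on its own.

```python
import pandas as pd


def drop_known_rows(new_path, known_path, out_path, key="page_id", chunksize=500):
    """Stream new_path in chunks and keep only rows whose key is absent from known_path."""
    # Only the key column of the reference file is held in memory (assumed to fit).
    known = set(pd.read_csv(known_path, usecols=[key])[key])
    first = True
    for chunk in pd.read_csv(new_path, chunksize=chunksize):
        fresh = chunk[~chunk[key].isin(known)]
        # Overwrite on the first chunk, then append without repeating the header.
        fresh.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
        first = False
```

At any moment only one chunk plus the set of known keys is in memory, so the peak usage depends on `chunksize` rather than on the size of the whole file. Note that this sketch does not remove duplicates that occur within the new file itself.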

Sometimes I got weird errors I could not understand. I was also having trouble using scp to copy files over to my PC. I did some googling, but things were not clearing up. I decided to reach out to people on IRC, which made me a bit insecure. I saw people there talking about specific issues and things I had no idea about. I felt like people would think I knew so little that I was asking about such simple things. But I could not progress solo and needed help. So I dared to ask on IRC. And multiple people replied and solved my issues. I thought to myself - “okay…soo…that’s it? No one thought anything, no one even cares how much I know. I was just overthinking”. This part was honestly liberating, and I encourage everyone to STOP overthinking and feeling low. There’s always a start for everything. As my mentor said - “pretty much any question is a question worth asking”. Do your research, get the obvious solutions or resources out of the way, and if you are still stuck, just ask!

Thanks for bearing with my verbose blog. I hope it helps people stick to their problems and get out of the bubble!