STEM equity @ Science for America | formerly: data science @ Biobot Analytics, PhD @ MIT, Luce Scholar in Cambodia | public health data, microbiome, personalized medicine | gender and racial equity in STEM | she/hers
As the 2019 Research Parasite Junior Parasite award recipient, GigaScience asked me to share some thoughts on life as a research parasite - tips and philosophies on doing open and reproducible science as a PhD student. The resulting piece is now published! ... Read more
While the focus of my PhD was my scientific research, I also grew significantly as a leader and person through the extracurriculars I did. And even though none of those things will show up in my google scholar, I’ve been proud to contribute as “co-first-author” to many initiatives. I want to share them here, both as a reminder to myself and so that they can survive somewhere on the internet even if things in my department change. ... Read more
In general, I’ve come to believe that writing down standards and expectations is an absolutely necessary first step to making any improvements. It’s a basic tenet of conflict management: “what would it look like for you to succeed?” In my last year or so as a BE Ref, I really ramped down the amount of effort and labor I put into improving the BE student experience. It became so apparent to me that our department’s lack of consensus on what it means to be a successful BE graduate student meant that anything we tried to do could easily lead to us running around in circles: how on earth do you expect a student group to improve the graduation process for students if no one knows, or is willing to describe, what is even required for graduation? ... Read more
A few weeks ago, I got to give a really fun talk to my lab and friends. A postdoc asked me to present at our lab summit (which is like a day-long group meeting), and I said I didn’t want to talk about my science because I was going to have just defended! Instead, I decided to use the time to get on my soapbox and talk about some of the things I learned and approaches I took during my computational PhD. tl;dr - love yourself, treat the PhD as an act of radical self-love and you will reap the benefits. ... Read more
ZOMG you can upload files to the SRA automatically via Amazon S3!! Here are notes for when I inevitably need to do this again or help someone in my lab do it. ... Read more
Like I’ve written about before, we use our lab’s AWS account to share data with collaborators. This post explains the different options for getting access to and interacting with S3 buckets. ... Read more
I’ve recently become our lab’s AWS sysadmin, and my first task is to give our collaborators access to some of our data. In this post, I’ll briefly go over how I set that up and explain the different options that our collaborator has to access the data. ... Read more
On my road trip, I kept track of (almost) all the money I spent. I was already fairly surprised with some of my quick calculations about how little I ended up spending (just around $4000!), and I also wanted to dive a bit more into how much I spent, where, and on what. So here we go! ... Read more
Today, I figured out an answer to a question that I didn’t find asked anywhere on the internet. In case someone else (or me) asks this question later, I wanted to write up my solution for reference. This post goes over how to access and manipulate the right y-axis labels on a seaborn FacetGrid plot which was made with margin_titles = True. ... Read more
This morning, my roommates and I were discussing our bus-taking strategies: it was around 9 am, and one of them was about to go catch the 64 bus going to Kendall/MIT whereas the other one was planning to wait for the next bus, which goes to University Park. This got us talking about which route was faster: the Kendall/MIT route, which gets you closer to campus but seems to take a longer route there, or University Park, which drops you off farther from campus but gets there more directly. I had actually meant to look into this question in my previous commute blog post, so felt this was a great opportunity to do so! ... Read more
About a year ago, I moved from my lovely Beacon Hill apartment (300 yards from the subway) to a house full of my friends (a 20-25 minute walk from the nearest subway stop). I’m super happy in my new house (we have chickens!) and it was totally the right decision, but at the time my new commute felt daunting - and many of my friends told me I’d regret giving up the convenience of my amazing Beacon Hill location. So, I did what any aspiring data scientist would do and started gathering data to prove them wrong. (See a theme in my data collection posts yet? XD) ... Read more
A couple of months ago, I was having dinner with a friend who was trying to convince me to start online dating - he’s a hopeless romantic, and perhaps the only person on this earth who genuinely enjoys it. I really dislike online dating for many reasons and we’d had this conversation many times before, so I wasn’t interested in his arguments. But as he was telling me about the new app he was using, an idea started to form… Because of the way the app is set up, I realized I could test one of my longtime hypotheses, and in the process get some much-needed validation for why online dating sucks and definitively win our debate about whether or not I should sign up. ... Read more
Once you’ve made your first qiime2 plugin, you’ll need to build it into a conda package and upload it to anaconda.org so others can easily install it. This tutorial is intended for first-time python developers trying to put their package into conda, and specifically targeted toward people developing plugins for QIIME 2. ... Read more
Last week, I attended a workshop focused on developing software for a popular bioinformatics platform in my field, which is a space that is much more skewed toward men than I’m used to (as a bio*engineer, I’ve been mostly spared from situations with extreme gender imbalance). It was an interesting experience, and overall incredibly positive. However, we live in an imperfect world and I had an interesting gendered experience that I want to reflect on here. ... Read more
As a side project from the meta-analysis, we developed a method to correct for batch effects in microbiome case-control studies. When we posted the preprint on biorxiv, Greg Caporaso emailed Sean and asked him if he’d like to put our method into qiime2. I happily volunteered - I’d heard a presentation about qiime2 and was super pumped about their plugin setup, where anyone can incorporate their method into qiime’s suite of tools, and I was excited to see how doable it was. The learning curve was a little steep at first, but not as bad as I expected! Here, I’ve cleaned up my notes into a guide through my development process. I hope this is helpful to others like me, who aren’t trained computer scientists/developers, but who are keen and able to learn the programming stuff to make their tools more useful to more people. ... Read more
Slopegraphs are always described as having been introduced by this Edward Tufte post, though this page is my top Google hit for “slopegraph.” I’m not sure if the kind of plot I’m talking about is technically a slopegraph, but in my academic circles that’s usually the term we end up settling on after a conversation that almost always sounds like, “you know, those plots which are kind of like boxplots except the paired points are connected with lines.” ... Read more
One of the last parts before my full-fledged transition to github pages from wordpress was figuring out how to post nicely formatted jupyter notebooks. This was actually the reason I wanted to switch in the first place, but it turns out it wasn’t as straightforward as I’d hoped! I think I’ve found an acceptable, though imperfect, way to do this: here’s the general process I’ve settled on. ... Read more
In the past year or so, I’ve become a full-fledged tidy data convert. I use pandas and seaborn for almost everything that I do, and any time I figure out a new cool groupby trick I feel like I’ve PhD-leveled up. ... Read more
As part of the Microbiome Club’s outreach activities, Tu and I are teaching a group of high school girls about the microbiome next week. We’re developing a 3-day curriculum on bacteria and the human microbiome as part of the Young Leaders in STEM summer program led by the Cambridge Science Club for Girls. ... Read more
I wrote about my experiences with conflict management and interfaith dialogue at MIT for the Graduate Admission Blog. Read on or click on the link to learn more about what conflict management, God, and Donald Trump have in common in my mind! ... Read more
I started tracking the time I spend on various “work-related” activities near the beginning of my third year of grad school. When I started, I hadn’t yet discovered the magic of tidy data so I kept putting off analyzing the data. Now that it’s been over a year since I converted to tidy data, it’s time to dig in and see how I really use my time! ... Read more
Soon, Netflix will be canceling its DVD-by-mail program, the original service that helped Netflix crush Blockbuster and got us used to watching movies on-demand from the comfort of our homes before streaming was a thing. Perhaps not coincidentally, my dad cancelled my family’s subscription to the DVD service this winter. As my brother wisely put it upon hearing my dad’s news, “Netflix can finally stop buying physical DVDs now that their last customer cancelled!” ... Read more
One of the keynotes discussing disparities in big data at this year’s Pacific Symposium on Biocomputing pointed to a Bloomberg article about Amazon’s same-day delivery areas: in Boston (at the time of writing), all neighborhoods surrounding Roxbury were eligible for same-day delivery but Roxbury was not. ... Read more
I came across this great blog post again today while doing some literature search for one of my projects. I remember really enjoying this post when I first encountered it, and it was as much of a joy to read the second time around! What I appreciate about this article is that it doesn’t try to refute the contentious claim that “most [biomedical] research findings are false” but instead puts a “yes and…” spin on it. ... Read more
I came across this FiveThirtyEight article looking at how men and women rate TV shows on IMDB. I love the creative analysis that the author did - it’s pretty simple data (just ratings and associated gender of the rater) but the story that comes out is really interesting. ... Read more
Yesterday, I attended the Women in Data Science Conference in Cambridge. I went in hoping to learn more about data science as a field, to identify career opportunities in data science for computational biologists interested in public impact, and to feel inspired by being in a room full of women doing science. I’d say the conference wasn’t well-structured enough (i.e. tied together by a common theme) for the first goal and not varied enough in topics for the second one. That third goal, though - nailed it. ... Read more
I just read two articles from my DataScienceWeekly email (so good! You should subscribe!) which do a really good job of humanizing data, talking so respectfully about its potential downfalls while also recognizing its tremendous opportunities for impact. ... Read more
Last weekend, I went to New York to attend Bloomberg’s Data for Good Exchange (and also hang out with my friends, eat bagels, and have brunch - obviously). I came in with pretty low expectations and left with a lot of excitement for finding myself a way into this field. ... Read more
I’ve been listening to a great podcast called Unladylike on my hikes, which has me angry and reflective (as good feminist content should). I’ve also been recounting stories on the phone while driving, and I’m noticing some commonalities. In my experiences, big disparities in sexism have come not in the actual sexist events themselves, but in their remnants, the emotions and impact after-the-fact. ... Read more
A Medium post by Eugenia Zuroski linked from Twitter, titled “Holding Patterns: On Academic Knowledge and Labor”, in combination with a brunch conversation in which my friend encouraged me to write down what I’ve learned from my advocacy work, motivated me to write this post. I honestly don’t know what I’m hoping to accomplish here. Maybe some part of me hopes that the faculty I’ve worked with read this and are spurred out of their complacency. Maybe being yet another voice calling out academia’s hypocrisies will magically tip the scale and lead to massive introspection campaigns from elite institutions. Or maybe I just want to be heard, and I want other student leaders reading this to know they’re seen and heard. ... Read more
I had a tense week with the internet a few weeks ago. I posted a tweet calling out a conference I was invited to for only having male speakers: ... Read more
Some reactions to a recent Inside Higher Ed article on “Hitting the [Diversity] Wall”. The tl;dr of my thoughts: (1) Yep, the wall is real. Finding other students working to remind ourselves that we’re not alone in this fight is critical. (2) The fight is for transformation beyond “diversity and inclusion” - it’s about transformation of power structures. That’s what makes it hard and inspiring. (3) The strategies we take are important: when do we work with our departments and when do we demand change? ... Read more
In conversations about improving diversity in STEM, I tend to run into “well-meaning” faculty who are resolutely against quotas for fear that they will only exacerbate impostor syndrome and other negative perceptions (e.g. “you only got in because you’re black”). This is such a frustrating position and although I haven’t quite found a time, place, or way to push back against it yet, in my deepest heart of hearts I know it’s fundamentally foolish. ... Read more
Our department is hosting an event called “Profscars” (like the Oscars, but for profs). The social chairs organizing this event emailed our entire student body asking for nominations for superlatives for each professor. A friend of mine pointed out that basically all of the women got superlatives based on their clothes/looks or mom status. I took a closer look and felt similarly appalled/taken aback, and then I did what any aspiring data scientist would do and decided to do some stats. ... Read more
My friend posted this great essay on her Facebook this morning, and I’d highly encourage everyone to read it. It’s titled “Obscuring the importance of race: the implication of making comparisons between racism and sexism (or other -isms)”. ... Read more
Saw this on my Facebook today, which links to this longer article about turmoil at Thinx. https://www.facebook.com/teenvogue/posts/10154553768466312?match=dGVlbiB2b2d1ZSx0aGlueA%3D%3D Some comments: ... Read more
Active listening is a hallmark of conflict management. One of the most important parts of active listening is reflecting, which means that you basically say what the other person just said. Sometimes you can also reframe what they said to a positive, future-focused message (e.g. reframing a complaint into what they would want to be the case). Reframing is actually quite difficult to sustain, but reflecting is really easy - and transformative! ... Read more
I’m sure Jon Eisen will have much more to say on this very shortly, but I just saw a post on Elisabeth Bik’s blog about Wageningen University’s Microbiology Centennial, which features ten keynote speakers who are all male. Womp womp. ... Read more
I dated (and very much liked!) someone who made similar points as in this National Review article. It wrenched my heart to hear his “arguments” then as it does to read them now. But this time, I’m not in an emotionally charged conversation, and I can respond: ... Read more
In my work as my department’s Diversity co-Chair, I’ve had the opportunity to think really critically about the language I (a white, cis, female) use when I talk about diversity. ... Read more
This is a guest post for 99 Days of Ultimate, a conversation held over 99 days that provides a platform for people to speak about a range of womxn’s experiences in ultimate frisbee. My friend Carolyn is this week’s moderator, and as an American living (and playing) in Scotland, she wanted to profile some of the differences between playing ultimate in the US and abroad. I spent a year in Cambodia before moving to Boston, which is also when I started playing frisbee more seriously. Part of the reason I became so passionate about frisbee that year is that I saw firsthand frisbee’s power to effect local change, specifically in the context of gender equity. After returning from Cambodia, I joined MIT’s women’s ultimate frisbee team as one of the few graduate students on the team. Playing and becoming friends with the undergrads solidified my belief in the power of frisbee to drive social change and push gender equity forward. ... Read more
When I was on my cross-country road trip last year, I kept track of a lot of things in the hopes of doing amazing analyses when I got back. Turns out having a job as a data scientist makes it a lot harder to find time to do data science on the side, and so I’ve only really gotten a chance to look into my expenses. One of the things I had really wanted to do was make a map of all the places I went, but I didn’t actually know how to work with geospatial data in Python yet. I had this grand idea that I’d learn spatial data techniques while on my trip, but turns out hiking, drinking beer while watching the sunset, and going to bed early were way more compelling ways to spend my time. Luckily for me, one of the most fun parts of my new job has been learning geospatial coding and plotting techniques. I’ve been having fun with it in my job, and I realized that it also means I get to finally make my map! ... Read more
My thesis proposal is on Tuesday, which of course means that I’ve been thinking a lot about impostor syndrome. The way I process difficult emotions is by talking about them to my friends, and in this process it’s crystallized to me that impostor syndrome comes in so many different flavors, some of which are much harder to address than others. ... Read more
An important but potentially confusing aspect of our percentile normalization method to correct for batch effects is that we add noise to the zeros. I’ve gotten a few questions about this, and wanted to write this short blog post to illustrate why we did this and why it improves the method. ... Read more
During my PhD, I performed a lot of exit interviews with graduating students and learned that finding a job is often the most stressful part of graduating, and among the most stressful in the entire PhD. After my own defense, however, I was able to avoid some of that stress by discovering a valuable post-PhD job option that I had rarely heard discussed: getting a part-time or consulting gig after graduating to hold you over while you figure out your next full-time non-academic career move. ... Read more
Like many young adults our age, my partner and I did the classic pandemic move of fleeing the city and moving in with his parents. That’s how I discovered I actually really enjoy living in rural New Hampshire, and last December we officially moved in to our own place in southern New Hampshire. ... Read more
When I was on my cross-country road trip last year, I kept track of a lot of things in the hopes of doing amazing analyses when I got back. Turns out having a job as a data scientist makes it a lot harder to find time to do data science on the side, and so I’ve only really gotten a chance to look into my expenses. One of the things I had really wanted to do was make a map of all the places I went, but I didn’t actually know how to work with geospatial data in Python yet. I had this grand idea that I’d learn spatial data techniques while on my trip, but turns out hiking, drinking beer while watching the sunset, and going to bed early were way more compelling ways to spend my time. Luckily for me, one of the most fun parts of my new job has been learning geospatial coding and plotting techniques. I’ve been having fun with it in my job, and I realized that it also means I get to finally make my map! ... Read more
On my road trip, I kept track of (almost) all the money I spent. I was already fairly surprised with some of my quick calculations about how little I ended up spending (just around $4000!), and I also wanted to dive a bit more into how much I spent, where, and on what. So here we go! ... Read more
Today, I figured out an answer to a question that I didn’t find asked anywhere on the internet. In case someone else (or me) asks this question later, I wanted to write up my solution for reference. This post goes over how to access and manipulate the right y-axis labels on a seaborn FacetGrid plot which was made with margin_titles = True. ... Read more
This morning, my roommates and I were discussing our bus-taking strategies: it was around 9 am, and one of them was about to go catch the 64 bus going to Kendall/MIT whereas the other one was planning to wait for the next bus, which goes to University Park. This got us talking about which route was faster: the Kendall/MIT route, which gets you closer to campus but seems to take a longer route there, or University Park, which drops you off farther from campus but gets there more directly. I had actually meant to look into this question in my previous commute blog post, so felt this was a great opportunity to do so! ... Read more
About a year ago, I moved from my lovely Beacon Hill apartment (300 yards from the subway) to a house full of my friends (a 20-25 minute walk from the nearest subway stop). I’m super happy in my new house (we have chickens!) and it was totally the right decision, but at the time my new commute felt daunting - and many of my friends told me I’d regret giving up the convenience of my amazing Beacon Hill location. So, I did what any aspiring data scientist would do and started gathering data to prove them wrong. (See a theme in my data collection posts yet? XD) ... Read more
A couple of months ago, I was having dinner with a friend who was trying to convince me to start online dating - he’s a hopeless romantic, and perhaps the only person on this earth who genuinely enjoys it. I really dislike online dating for many reasons and we’d had this conversation many times before, so I wasn’t interested in his arguments. But as he was telling me about the new app he was using, an idea started to form… Because of the way the app is set up, I realized I could test one of my longtime hypotheses, and in the process get some much-needed validation for why online dating sucks and definitively win our debate about whether or not I should sign up. ... Read more
Once you’ve made your first qiime2 plugin, you’ll need to build it into a conda package and upload it to anaconda.org so others can easily install it. This tutorial is intended for first-time python developers trying to put their package into conda, and specifically targeted toward people developing plugins for QIIME 2. ... Read more
I started tracking the time I spend on various “work-related” activities near the beginning of my third year of grad school. When I started, I hadn’t yet discovered the magic of tidy data so I kept putting off analyzing the data. Now that it’s been over a year since I converted to tidy data, it’s time to dig in and see how I really use my time! ... Read more
As I’ve written about before, we use our lab’s AWS account to share data with collaborators. This post explains the different options for getting access to and interacting with S3 buckets. ... Read more
As a side project from the meta-analysis, we developed a method to correct for batch effects in microbiome case-control studies. When we posted the preprint on biorxiv, Greg Caporaso emailed Sean and asked him if he’d like to put our method into qiime2. I happily volunteered - I’d heard a presentation about qiime2 and was super pumped about their plugin setup, where anyone can incorporate their method into qiime’s suite of tools, and I was excited to see how doable it was. The learning curve was a little steep at first, but not as bad as I expected! Here, I’ve cleaned up my notes into a guide through my development process. I hope this is helpful to others like me, who aren’t trained computer scientists/developers, but who are keen and able to learn the programming stuff to make their tools more useful to more people. ... Read more
Slopegraphs are always described as having been introduced by this Edward Tufte post, though this page is my top Google hit for “slopegraph.” I’m not sure if the kind of plot I’m talking about is technically a slopegraph, but in my academic circles that’s usually the term we end up settling on after a conversation that almost always sounds like, “you know, those plots which are kind of like boxplots except the paired points are connected with lines.” ... Read more
I’ve recently become our lab’s AWS sysadmin, and my first task is to give our collaborators access to some of our data. In this post, I’ll briefly go over how I set that up and explain the different options that our collaborator has to access the data. ... Read more
In the past year or so, I’ve become a full-fledged tidy data convert. I use pandas and seaborn for almost everything that I do, and any time I figure out a new cool groupby trick I feel like I’ve PhD-leveled up. ... Read more
As part of the Microbiome Club’s outreach activities, Tu and I are teaching a group of high school girls about the microbiome next week. We’re developing a 3-day curriculum on bacteria and the human microbiome as part of the Young Leaders in STEM summer program led by the Cambridge Science Club for Girls. ... Read more
The other thing I didn’t focus on as much in the Nature Microbiology blog post was all of the lessons I learned about data and doing reproducible science. ... Read more
I’ve gotten into the habit of live-tweeting the conferences I attend. I really like it, in part because I find it to be the best way to take notes, it really helps lower the barriers to meeting people in person, and it’s a great way to raise your profile even if you are a lowly PhD student. Fresh off a conference, and now that I’ve gotten pretty good at live-tweeting, I wanted to share a few thoughts on tweeting scientific talks. ... Read more