Alm lab summit
February 7, 2019
*future me, that is
:heart_eyes_cat:
Leave breadcrumbs everywhere: READMEs, comments, docs.
If you died in a bus crash tomorrow, how hard would it be for someone else to pick up the pieces?
:oncoming_bus:
If your computer dies the week of your defense, how long would it take you to get back up and running?
:scream:
Grad school is a time to build skills and grow
:deciduous_tree:
In my PhD, I said yes to things that benefited me.
Learning new skills, making connections, building good favor: all of it counts!
¯\_(ツ)_/¯
Just seems like the right thing to do…
Projects
Notes and files
Data
All of my repos have basically the same structure:
├── Makefile
├── README.md    <- If you don't have a README,
│                   did you even make a repo?
│
├── data         <- OTU tables (if small enough),
│                   QIIME 2 outputs, metadata Excel
│                   files, trees, etc.
│
├── src          <- All code: scripts, notebooks, etc.
│
└── final        <- Final figures, supp files, tables.
├── data
│   ├── raw        <- Raw data in all of its messy glory.
│   │                 NEVER CHANGE!
│   │                 Raw = direct outputs of upstream
│   │                 processing, e.g. the original OTU table.
│   ├── clean      <- Intermediate data that has been
│   │                 cleaned up, e.g. the OTU table with
│   │                 low-QC samples removed.
│   └── analysis   <- Outputs from analyses (e.g. beta
│                     diversity, p-values, etc.)
Some files will probably be too large to commit: keep these backed up somewhere else!
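One way to spot those files is a small Python helper that lists everything over a size cutoff. A sketch, assuming a 50 MB cutoff (an arbitrary choice; GitHub itself rejects individual files over 100 MB); the function name and demo file names are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import List


def large_files(root: str, max_mb: float = 50.0) -> List[str]:
    """List files under `root` larger than `max_mb` MiB, as paths relative to root.

    The 50 MB default is an assumption, not a rule from the slides.
    """
    base = Path(root)
    limit = max_mb * 1024 * 1024
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.stat().st_size > limit
    )


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "raw").mkdir()
    (Path(tmp) / "raw" / "otu_table.tsv").write_bytes(b"x" * 4096)
    (Path(tmp) / "metadata.tsv").write_bytes(b"x" * 10)
    too_big = large_files(tmp, max_mb=0.001)  # ~1 KB cutoff just for the demo

# Candidates for .gitignore (and for a backup somewhere else).
print("\n".join(too_big))
```

The output doubles as a ready-made list of lines for `.gitignore`.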
├── src
│   ├── data          <- Code used to wrangle and clean data.
│   ├── exploration   <- Jumble of IPython notebooks with
│   │                    preliminary work. Label these by
│   │                    date + brief description.
│   ├── analysis      <- Scripts used to produce files in
│   │                    data/analysis/. For the most part,
│   │                    the Makefile calls these.
│   ├── figures       <- Scripts to make figures.
│   └── util          <- (Optional) files with commonly
│                        reused functions.
It's an iterative process, moving back and forth between notebooks and scripts.
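On the script end of that loop, a cleaning step might live in src/data/ as a script with a small command-line interface, so a Makefile can call it. This is a hypothetical sketch, not the author's code: the flags and the 1000-read default threshold are illustrative assumptions. It reads a tidy OTU table (otu_id, sample_id, counts) and drops samples whose total counts fall below a cutoff, i.e. the "low-QC samples removed" step that feeds data/clean/.

```python
"""Drop low-depth samples from a tidy OTU table (hypothetical src/data/ script)."""
import argparse
import csv
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, str]


def remove_low_depth(rows: List[Row], min_reads: float) -> List[Row]:
    """Keep only rows whose sample has at least `min_reads` total counts."""
    depth: Dict[str, float] = defaultdict(float)
    for row in rows:
        depth[row["sample_id"]] += float(row["counts"])
    return [row for row in rows if depth[row["sample_id"]] >= min_reads]


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--otu", required=True, help="tidy OTU table (TSV), e.g. in data/raw/")
    parser.add_argument("--out", required=True, help="where to write the cleaned table, e.g. data/clean/")
    parser.add_argument("--min_reads", type=float, default=1000.0,
                        help="per-sample read-depth cutoff (hypothetical default)")
    args = parser.parse_args(argv)

    with open(args.otu, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    kept = remove_low_depth(rows, args.min_reads)

    with open(args.out, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(kept)


# Demo using the tidy OTU values from these slides. Saved as a real script,
# the file would instead end with:  if __name__ == "__main__": main()
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "otu.raw.tsv")
    dst = os.path.join(tmp, "otu.clean.tsv")
    with open(src, "w", newline="") as fh:
        fh.write("otu_id\tsample_id\tcounts\n"
                 "otu1\ts1\t0.0\notu1\ts2\t16.0\notu1\ts3\t0.0\n"
                 "otu2\ts1\t1.0\notu2\ts2\t0.0\notu2\ts3\t20.0\n")
    main(["--otu", src, "--out", dst, "--min_reads", "5"])
    with open(dst, newline="") as fh:
        cleaned = list(csv.DictReader(fh, delimiter="\t"))

print(len(cleaned), "rows kept")
```

Keeping the logic in a plain function (`remove_low_depth`) means a notebook can import and poke at it while the Makefile drives the same code through `main()`.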
└── final
    ├── figures      <- Where you save final PNGs (also
    │                   pushed to GitHub, if you want).
    ├── tables       <- If you're feeling ambitious,
    │                   Markdown versions of tables.
    └── supp_files   <- Files that would otherwise be
                        supplementary Excel files.
Mostly for you to organize outputs.
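If every repo starts from the same skeleton, creating it can be scripted. A minimal Python sketch using pathlib; the function name `scaffold` is hypothetical and the folder list mirrors the trees above:

```python
import tempfile
from pathlib import Path
from typing import List

# Sub-folders from the project layout above.
SUBDIRS = [
    "data/raw", "data/clean", "data/analysis",
    "src/data", "src/exploration", "src/analysis", "src/figures", "src/util",
    "final/figures", "final/tables", "final/supp_files",
]


def scaffold(root: str) -> List[str]:
    """Create the project skeleton under `root` and return everything in it.

    Safe to re-run: existing folders and files are left alone (existing files
    are not even touched, so make won't see them as newer).
    """
    base = Path(root)
    for sub in SUBDIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    for name in ("Makefile", "README.md"):
        path = base / name
        if not path.exists():
            path.touch()
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    layout = scaffold(tmp)
    rerun = scaffold(tmp)  # second call should change nothing

for path in layout:
    print(path)
```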
Read more at Cookiecutter Data Science.
Projects
Notes and files
Data
Anything “messy” starts with a date
Use delimiters creatively
grep is your best friend
Also, everything is on the cloud.
(Remember the potential :scream:)
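A sketch of one possible naming convention, written in Python rather than shell: ISO date first so names sort chronologically, underscores between fields, hyphens between words. The helper names and example descriptions are hypothetical, and `grep()` below is only a rough regex stand-in for piping a file listing through grep.

```python
import datetime as dt
import re
from typing import Iterable, List, Optional


def dated_name(description: str, ext: str, date: Optional[dt.date] = None) -> str:
    """Build a name like '2019-02-07_alm-lab-summit.md'.

    Assumed convention: ISO date first, '_' between fields,
    '-' between words inside a field.
    """
    date = date or dt.date.today()
    slug = "-".join(description.lower().split())
    return f"{date.isoformat()}_{slug}.{ext}"


def grep(pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names matching a regular expression, sorted."""
    return sorted(n for n in names if re.search(pattern, n))


notes = [
    dated_name("Alm lab summit", "md", dt.date(2019, 2, 7)),
    dated_name("OTU QC exploration", "ipynb", dt.date(2019, 1, 15)),
    dated_name("reviewer comments", "md", dt.date(2019, 3, 1)),
]
print(grep(r"^2019-0[12]", notes))  # everything from Jan-Feb 2019
print(grep(r"_otu-", notes))        # everything about OTUs
```

Because the delimiters are consistent, a single pattern can target the date field (`^2019-02`) or a topic (`_otu-`) without false matches from the other field.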
Projects
Notes and files
Data
Every data folder should have an associated README: who, what, when, why, how?
Google Drive and Dropbox are dangerous: who changed what, and when?
If files are small enough, put them under version control on GitHub.
Otherwise, keep versions … somehow?
(I haven’t figured out a great system for this one yet.)
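A small check can keep those READMEs from going missing. This hypothetical Python sketch lists folders under data/ that have no README and drops in a who/what/when/why/how skeleton; the template wording and function names are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical README skeleton answering who / what / when / why / how.
README_TEMPLATE = """# {name}

- Who:  who generated this data, and who to ask about it
- What: what the files are (formats, columns, units)
- When: when it was generated or downloaded
- Why:  what it was collected or computed for
- How:  how it was made (pipeline, parameters, source)
"""


def has_readme(folder: Path) -> bool:
    """True if `folder` directly contains a file whose name starts with 'readme'."""
    return any(f.is_file() and f.name.lower().startswith("readme") for f in folder.iterdir())


def missing_readmes(data_root: str) -> List[str]:
    """Folders under (and including) `data_root` with no README, relative to it."""
    root = Path(data_root)
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    return [d.relative_to(root).as_posix() for d in folders if not has_readme(d)]


def add_template(folder: Path) -> None:
    """Write a README skeleton into `folder` unless one already exists."""
    target = folder / "README.md"
    if not target.exists():
        target.write_text(README_TEMPLATE.format(name=folder.name))


# Demo on a throwaway data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    (data / "raw").mkdir(parents=True)
    (data / "clean").mkdir()
    (data / "raw" / "README.md").write_text("# raw\n")
    before = missing_readmes(str(data))
    for rel in before:
        add_template(data / rel)
    after = missing_readmes(str(data))

print("missing before:", before)
print("missing after:", after)
```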
Makefiles
Tidy data
Implementing these two concepts changed my life
To make a target, run the rule iff the target doesn't exist yet or any of the dependencies are newer than the target.
target: dependencies
	rule
(The rule line must start with a tab, not spaces.)
make figure3.png
figure3.png: figure3.py disease_meta.txt core_bugs.txt
	python figure3.py --in_meta disease_meta.txt ...

disease_meta.txt: disease_meta.py qvalues.txt
	python disease_meta.py --qvals qvalues.txt \
		--out disease_meta.txt

qvalues.txt: src/analysis/qvalues.py otu.clean meta.clean
	python src/analysis/qvalues.py \
		--otu otu.clean --meta meta.clean ...
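To make the rebuild rule concrete, here is a simplified Python sketch of make's decision logic, applied to a dependency graph adapted from the figure3.png example. It uses made-up in-memory timestamps instead of real files, and real make does much more (pattern rules, phony targets, parallel builds); `build()` is a hypothetical helper, not part of any tool.

```python
from typing import Dict, List


def build(target: str, rules: Dict[str, List[str]],
          mtimes: Dict[str, float], log: List[str]) -> None:
    """Bring `target` up to date the way make does (simplified sketch).

    rules:  target -> its dependencies (the "target: dependencies" lines)
    mtimes: file -> modification time; a missing key means the file doesn't exist
    log:    appended with each target whose rule gets run, in order
    """
    if target not in rules:
        # A plain input file: nothing to run, but it has to exist.
        if target not in mtimes:
            raise FileNotFoundError(f"No rule to make target {target!r}")
        return
    deps = rules[target]
    for dep in deps:
        build(dep, rules, mtimes, log)  # bring dependencies up to date first
    # Run the rule iff the target is missing or some dependency is newer.
    if target not in mtimes or any(mtimes[d] > mtimes[target] for d in deps):
        # "Running the rule" here just stamps the target as the newest file.
        mtimes[target] = max(mtimes.values(), default=0.0) + 1.0
        log.append(target)


rules = {
    "figure3.png": ["figure3.py", "disease_meta.txt", "core_bugs.txt"],
    "disease_meta.txt": ["disease_meta.py", "qvalues.txt"],
    "qvalues.txt": ["src/analysis/qvalues.py", "otu.clean", "meta.clean"],
}

# Pretend everything was built once, in dependency order...
mtimes = {
    "figure3.py": 1.0, "core_bugs.txt": 1.0, "disease_meta.py": 1.0,
    "src/analysis/qvalues.py": 1.0, "otu.clean": 1.0, "meta.clean": 1.0,
    "qvalues.txt": 2.0, "disease_meta.txt": 3.0, "figure3.png": 4.0,
}
# ...then the cleaned OTU table was regenerated.
mtimes["otu.clean"] = 5.0

log: List[str] = []
build("figure3.png", rules, mtimes, log)
print(log)    # the whole chain downstream of otu.clean is rebuilt

rerun: List[str] = []
build("figure3.png", rules, mtimes, rerun)
print(rerun)  # nothing left to do
```

This is why reviewer comments are less scary: change one input, run `make figure3.png`, and only the affected steps rerun.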
zomg reviewer comments zomg
The code is the documentation of what you did.
(Remember: make future you love current you. :heart_eyes_cat: )
Literally life-changing.
:panda_face: + = :nerd: :mortar_board:
Tidy OTU table (one row per observation):
otu_id   sample_id   counts
otu1     s1          0.0
otu1     s2          16.0
otu1     s3          0.0
...      ...         ...
otu2     s1          1.0
otu2     s2          0.0
otu2     s3          20.0
...      ...         ...
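With pandas, going from a wide OTU table (OTUs as rows, samples as columns) to the tidy form above is one `melt` call. A sketch, assuming the wide layout shown in the code; the numbers are the ones from the table:

```python
import pandas as pd

# Hypothetical wide OTU table: one row per OTU, one column per sample.
wide = pd.DataFrame(
    {"s1": [0.0, 1.0], "s2": [16.0, 0.0], "s3": [0.0, 20.0]},
    index=pd.Index(["otu1", "otu2"], name="otu_id"),
)

# Tidy it: one row per (OTU, sample) observation.
tidy = (
    wide.reset_index()
        .melt(id_vars="otu_id", var_name="sample_id", value_name="counts")
        .sort_values(["otu_id", "sample_id"])
        .reset_index(drop=True)
)
print(tidy)

# And back to a matrix, for tools that want one.
back = tidy.pivot(index="otu_id", columns="sample_id", values="counts")
```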
Query subsets of data
Merge data
Harness seaborn
Just trust me (and Nathaniel (and the #Rstats internet!))
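A sketch of those three payoffs on the tidy table, assuming a hypothetical metadata table with one `disease` label per sample. The seaborn call is left as a comment so the example runs with pandas alone.

```python
import pandas as pd

tidy = pd.DataFrame({
    "otu_id":    ["otu1"] * 3 + ["otu2"] * 3,
    "sample_id": ["s1", "s2", "s3"] * 2,
    "counts":    [0.0, 16.0, 0.0, 1.0, 0.0, 20.0],
})
# Hypothetical sample metadata, one row per sample.
meta = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "disease":   ["case", "control", "case"],
})

# Query subsets of data.
nonzero = tidy.query("counts > 0")
otu2_only = tidy[tidy["otu_id"] == "otu2"]

# Merge data: metadata lines up on sample_id, no index juggling.
merged = tidy.merge(meta, on="sample_id", how="left")
totals = merged.groupby(["disease", "otu_id"])["counts"].sum()
print(totals)

# Harness seaborn: tidy columns map straight onto plot aesthetics, e.g.
#   import seaborn as sns
#   sns.boxplot(data=merged, x="disease", y="counts", hue="otu_id")
```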
Claire Duvallet