Tuesday, August 25, 2015

Measuring my uBiome diversity

While there are few agreed-upon standards for what constitutes a good or bad microbiome, nearly everyone agrees that a diverse microbiome is better than one that is less diverse. How can we measure diversity?

Ecologists have been interested in this question for a long time, and they’ve developed several metrics to describe how diverse a particular ecosystem is compared with another. This is my very tentative first step toward adapting those metrics to my uBiome results.

You could simply count all the different species (or genera or phyla) in a sample and track that over time. The more unique species, the more diversity. Here’s what that looks like for me:

[Figure: Diversity Through Time (number of taxa found at each tax_rank, per sample)]

In this case, all I did was plot the raw number of taxa that uBiome found at each tax_rank: about 60 species in my latest sample. But since 16S sequencing can’t assign every organism down to species (or, for that matter, to genus or phylum), a simple count of the number of organisms detected is not terribly useful. In my case, uBiome identified between 90% and 97% of all the phyla in my samples, but only between 49% and 51% of the species. That makes apples-to-apples comparisons difficult to interpret.
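A minimal sketch of that raw count in R, assuming (as the code later in this post suggests) that each row is one taxon, the first two columns are a taxon label and its tax_rank, and every remaining column holds one sample’s counts. The toy data frame, its taxon names, and its counts are hypothetical, purely for illustration.

```r
# Hypothetical toy data laid out like the uBiome CSV used later in this post:
# one row per taxon, two label columns, then one count column per sample.
toy <- data.frame(
  tax_name = c("Firmicutes", "Bacteroidetes", "Bacteroides", "Faecalibacterium", "Roseburia"),
  tax_rank = c("phylum", "phylum", "genus", "genus", "genus"),
  sampleA  = c(600, 400, 300, 200, 0),
  sampleB  = c(500, 500, 100, 100, 100)
)

# Richness = number of taxa with a nonzero count, tallied per tax_rank.
present  <- (toy[, -(1:2)] > 0) * 1               # 1 if the taxon was detected, else 0
richness <- rowsum(present, group = toy$tax_rank) # one row per rank, one column per sample
print(richness)
```

Note that this count treats a taxon seen once the same as one that dominates the sample, which is exactly the weakness the metrics below try to address.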

Ecologists have long struggled with this problem, and they came up with a few metrics to get around it. They start by considering what it means to say one ecosystem is more diverse than another. Consider a forest with 1,000 trees in it. If all 1,000 trees are, say, aspens, then that forest is less diverse than another one that also has 1,000 trees but, say, 1,000 different species.


One such metric is the Shannon index (also called Shannon entropy), named after Claude Shannon, the mathematician who founded information theory. Shannon, whose World War II work included cryptography, wanted to measure how much information a signal carries: the harder it is to predict the next symbol, the more information each symbol conveys. The same formula carries over to an ecosystem if you treat each organism as a “symbol” and ask how hard it is to predict the species of the next one you pick. The index is 0 for an ecosystem made up of only one kind of organism, and it rises as the number of species grows and as their abundances become more even.
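For reference, the Shannon index is H' = -Σ p_i ln(p_i), where p_i is the fraction of individuals belonging to taxon i. Here is a minimal base-R sketch applied to forests like the ones described earlier; `shannon()` is a hypothetical helper written for this illustration, and vegan’s `diversity(x, index = "shannon")` computes the same quantity, using the natural log by default.

```r
# Shannon index H' = -sum(p_i * ln(p_i)), with p_i = share of individuals in taxon i.
# shannon() is a hypothetical helper written for this example.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # drop absent taxa (0 * log(0) is taken as 0)
  -sum(p * log(p))
}

all_aspen   <- c(aspen = 1000)  # 1,000 trees, one species
all_unique  <- rep(1, 1000)     # 1,000 trees, 1,000 species
even_pair   <- c(500, 500)      # two species, equally common
uneven_pair <- c(990, 10)       # two species, one dominant

shannon(all_aspen)    # 0: no uncertainty about the next tree
shannon(all_unique)   # log(1000), about 6.91
shannon(even_pair)    # log(2), about 0.69
shannon(uneven_pair)  # about 0.06: same richness, much less even
```

The two two-species forests have the same richness but very different Shannon values; that sensitivity to evenness is what a raw taxa count misses.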


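I use a related metric that I find a bit more ecologically interesting, called the Inverse Simpson index. Like the Shannon index, it reflects both how many distinct taxa an ecosystem contains and how evenly individuals are spread among them, but it gives more weight to the most abundant taxa. It is the reciprocal of the probability that two individuals picked at random (with replacement) belong to the same taxon, so it can be read as an “effective number” of equally common taxa. Conveniently, these functions are all available in the R “vegan” package. Here’s how I set up my R environment to do the calculations: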
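Concretely, the inverse Simpson index is 1 / Σ p_i², using the same proportions p_i as the Shannon index. A minimal base-R sketch on the same kind of hypothetical forests; `inv_simpson()` is a helper written for illustration, and the optional check at the end assumes vegan is installed, where `diversity(x, index = "invsimpson")` should give the same value.

```r
# Inverse Simpson index: 1 / sum(p_i^2).
# sum(p_i^2) is the chance that two individuals drawn at random (with replacement)
# belong to the same taxon, so its inverse acts as an "effective number" of taxa.
# inv_simpson() is a hypothetical helper written for this example.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

inv_simpson(c(aspen = 1000))  # 1: a single species
inv_simpson(rep(1, 1000))     # 1000: every tree a different species
inv_simpson(c(500, 500))      # 2: two equally common species
inv_simpson(c(990, 10))       # about 1.02: the dominant species swamps the rare one

# Optional cross-check against vegan, only if it is installed.
if (requireNamespace("vegan", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(
    as.numeric(vegan::diversity(c(990, 10), index = "invsimpson")),
    inv_simpson(c(990, 10))
  )))
}
```

The uneven two-species forest scores barely above 1, which shows how strongly this index is driven by the dominant taxon.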

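library(vegan)  # provides diversity() and fisher.alpha()
allSprague <- read.csv("spragueResultsThruJun2015.csv")
# keep only the genus-level rows
allGenus <- allSprague[allSprague$tax_rank == "genus", ]
# drop the first two (label) columns, leaving one column of counts per sample
allSamples <- allGenus[, -(1:2)]
# Fisher's alpha for each sample: another diversity index, not plotted below
dV <- sapply(allSamples, fisher.alpha)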
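The setup also computes Fisher’s alpha (`fisher.alpha`), a third diversity index that isn’t plotted in this post. It assumes abundances follow Fisher’s log-series, so that the number of taxa S and the number of individuals N are related by S = α ln(1 + N/α); alpha is the value that satisfies that equation. Below is a minimal base-R sketch that solves for alpha numerically with `uniroot()`. `fisher_alpha_sketch()` is a hypothetical helper, the counts are made up, and vegan’s own implementation may differ in details such as how it treats non-integer counts.

```r
# Fisher's alpha: solve S = alpha * log(1 + N / alpha) for alpha, where
# S = number of taxa observed and N = total number of individuals.
# fisher_alpha_sketch() is a hypothetical helper written for this example.
fisher_alpha_sketch <- function(counts) {
  S <- sum(counts > 0)
  N <- sum(counts)
  stopifnot(S < N)  # the equation has no finite root when every individual is a new taxon
  f <- function(a) a * log(1 + N / a) - S
  uniroot(f, lower = 1e-8, upper = 1e8, tol = 1e-12)$root
}

counts <- c(120, 60, 30, 15, 8, 4, 2, 1, 1, 1)  # hypothetical counts for one sample
alpha  <- fisher_alpha_sketch(counts)
alpha
```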


Here is my genus diversity as an Inverse Simpson number:

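# genus diversity: inverse Simpson index for each sample (column)
dG <- sapply(allSamples, diversity, index = "invsimpson")
# dPDates (defined elsewhere, not shown) holds one sample date per column;
# xaxt = "n" suppresses the x axis, so date labels must be added separately
plot(dG ~ dPDates, main = "Genus Diversity", xlab = "", ylab = "Inv Simpson Diversity", xaxt = "n")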


What does it mean? Apparently, my overall genus diversity has declined over the past year, though it bumps up and down so much that it’s hard to see a real pattern with so few data points. It’ll be interesting to compare my diversity with that of a few other people using this function.

My apologies for the super-technical nature of this post, especially the rough R source code. I’m just putting it on the blog so I can discuss it with people who are far more knowledgeable than I am and who can hopefully guide me to something better. I’ll have much more to say later, after I actually understand what I’m talking about.