Mapping the Epigenetic Basis of Kidney Disease – Katalin Susztak


Katalin Susztak:
Thank you. So, I thank the organizers for inviting me, and I am fully aware of the fact that
my talk, which is going to be half an hour exactly, is between you and getting lunch,
and I know it’s been a long session. So I want to tell you a little bit about the work
that we have been doing over the last five years. My lab was part of the Roadmap Project,
so, if you’ve been working on this, just by way of an introduction: I’m actually a physician
scientist, just like the first speaker, so I am a nephrologist. So, what I do on a daily
basis, I have patients who are on dialysis, and we have about half a million such patients in
the United States. And they spend an excessive amount of time over there, so it’s four
hours three times a week, and it’s not the best to have it. So, in general, just by way of introduction
to the kidney, the kidney is basically a, you know, a [unintelligible] organ, right,
and on a microscopic basis it consists of, I think you can see this, a structure
which is called the glomerulus, where you basically filter your blood.
You actually filter a lot, about 100 cc per minute. So, you filter about
one coffee every two minutes, and, as you probably noticed, you don’t pee
out 10 buckets, actually 18 buckets, of water every day. That’s because you have these
long and convoluted and different parts of a tubular system which basically reabsorbs the
water and electrolytes, and then there’s some form of secretion. So the function
of the kidney is measured by the filtering function of this glomerulus. And this is a
fibrotic kidney: in the kidney disease that we study, and of course in advanced renal disease,
you basically get this scarring of the organ, where you lose the epithelial cells
and then the glomerulus as well. So, the function is measured by how much you filter, and nephrologists
are really simple people, so you filter 100 cc per minute. You know we like that round
number, 100. That’s how we measure it. So I know many people have this
notion that, well, why care about kidney disease? You have dialysis and transplantation.
Indeed we do have them, but I just want to tell you that if you have end-stage kidney disease
and you are on dialysis, you have about a 20 to 25 percent chance of living through five years,
and that’s just a little bit better than getting lung cancer or AML, and it’s
actually largely worse than many of the common cancers. And just by way of putting renal cancer
on this graph as well, the survival of [unintelligible] cancer is
slightly better than being on dialysis. So it’s not a trivial problem, and also it
costs about $30 billion a year, which is actually 10 percent of the Medicare budget, despite
these patients making up, I think, only 1 percent of the total population of
it. So, it’s quite costly. You do better if you get a transplant, but very few people
are able to get a transplant.

So, why do people develop kidney disease and
how can we solve it? That’s what my lab is trying to understand. So, as Nancy
Cox kind of introduced for us, it’s a complex trait. We have a genetic
contribution, and then we have these numbers for heritability, and you can see that this
reaches .3 to .7. Right now, we believe the heritability of GFR among Europeans
is somewhere around .3 to .7, where the .7 comes in for African Americans for end-stage kidney disease,
and I’m going to show an example of what could explain that actually very high heritability.
And then there are a bunch of environmental factors: aging, which contributes
very strongly to kidney disease development, diabetes and smoking. And then, here you are
with kidney disease.

So, how to understand the genetics of kidney
disease? We have GWAS, and I think people have talked about this quite extensively.
This is the data from the most recent GWAS paper from CKDGen, with which my lab collaborates
quite significantly, and there is a new one in the pipeline. This one has about 67,000
participants in it, and the new one is going to have more than 100,000 cases of
European descent. What you see here is that some of the loci come out to
be significant, and then we were able to increase the significance. I actually don’t know
how many are on this graph, but right now we have about 67 curated loci that we
work on that have shown reproducible association, in people of European descent, with chronic
kidney disease development.

I will talk a little bit about this top locus
over here on chromosome 16, and, as you know, we all love geneticists: they already
gave it a name, so it looks as if we have nothing left to do after that and they know the genes
that cause kidney disease. Indeed, as was explained at the very beginning, we really
don’t know whether these are the actual genes that underlie the association or are causally
related to disease development. So, as for many other traits, for kidney disease also
these SNPs are in the non-coding area of the genome, so 80 percent are non-coding, and then
we have the questions that have been discussed before: how do these SNPs actually lead
to kidney disease development? So we would like to know which one is the causal
SNP and which one is the target cell type. And because I’m mostly a cell biologist, we
really would like to know the target genes, and then knowing the mode of this regulation
would not be bad as well.

So, this is the framework,
the way we think about it, and I think many of the people in the [unintelligible]
think about it, of how we could understand and make sense of these GWAS. So, we think
that the causal variants are somehow localized to a regulatory region in a disease-relevant
cell type. I’m going to give data, and there are papers from John Stem and Brad Bernstein
also looking at the kidney-associated traits, so we believe that these cell types
are localized somewhere in the kidney. It’s not really an [unintelligible] phenotype.
That’s what we talked about already as well. So, the variant should alter the target gene,
especially in this disease-relevant cell type, most likely via altering transcription factor
binding, although we could maybe accept other mechanisms.

What we add to this is that we believe that
the target should be expressed in the kidney, and then we also
think that the target expression should change in disease states, and then we would like
to have a correlation in how the genotype and the disease state change the target expression.
So if the risk allele increases the target expression, we hope to find the same
kind of correlation if we look at samples from patients with chronic kidney disease.
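
The direction check described here can be sketched in a few lines of Python. This is only an illustration: the gene names, the effect sizes, and the `Candidate` and `is_concordant` names are hypothetical and are not taken from the talk or from any dataset.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One putative target gene for a GWAS risk variant (all values hypothetical)."""
    gene: str
    risk_allele_effect: float  # eQTL slope per copy of the risk allele
    disease_change: float      # expression change, disease versus control


def is_concordant(c: Candidate) -> bool:
    # The risk allele and the disease state should push expression
    # in the same direction, so the two signs must agree.
    return c.risk_allele_effect * c.disease_change > 0


candidates = [
    Candidate("GENE_A", risk_allele_effect=-0.4, disease_change=-1.2),
    Candidate("GENE_B", risk_allele_effect=0.3, disease_change=-0.8),
]

concordant = [c.gene for c in candidates if is_concordant(c)]
print(concordant)
```

In this toy input only `GENE_A` passes, because its risk allele lowers expression and expression is also lower in disease.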
And obviously the target expression should somehow cause kidney disease and therefore
should be functional. So I will go through a couple of examples, and the first one is
that the variant should be localized in a regulatory region in the kidney.

So, to understand that, my lab physically
started to develop this fairly large kidney bank. We have more than 1,000 samples at
the moment, 1,200 at the last count, and what we have here is slightly similar to
other GWAS data, and it is actually updated with clinical data in real time. Mostly
these are collected from the unaffected part of tumor nephrectomies, and in those patients the
incidence of kidney disease is fairly high, 20 percent of them, since the common conditions
that cause kidney disease are diabetes and hypertension, and these are actually quite highly prevalent
conditions in people who are getting nephrectomies, who are, you know, the usual 58-year-old males
or females. And what we have built in is that this data updates itself, so we have not just
the static clinical data, but it updates over the years, so we have information
for functional decline.

We have done a fairly detailed histopathological
examination, which is not just whether you have disease or you don’t have disease,
but we use many parameters that we hope to use maybe as endophenotypes, as
we score different things that people can score under the microscope, such as the differentiation
of epithelial cells, the scarring, the inflammatory cells, and so on, just by visually looking.
We also have large efforts to do transcriptome analysis, and I think we are at about 500 samples
that we have done already. And because, as I’ve told you, there are two different segments
in the kidney, one is this glomerulus, which is the filter, and the other is the tubules that
process the filtrate, we micro-dissect all samples into glomeruli and tubules. We have
epigenome analysis, mostly methylation, and we are working, as I will show later, to
isolate different cell types out of the kidney and make ChIP-seq-based chromatin annotation
for them. And then we have genotyped all the samples that we have processed, using biobank
because it is much cheaper, and then obviously we try to integrate all of that together
to figure out what’s causing kidney disease. So, the causal variants should be somewhere
in the kidney, so to do that we get this kind of organ transplant of kidneys where we use
just the kidney cortex itself or we separate different cell types out of it, and we use
the ENCODE-based ChIP-seq marks, H3K27 acetylation and K4 monomethylation
as enhancer marks, K4 trimethylation as a promoter mark, and K36 trimethylation as a mark of
transcribed regions, to annotate regions in different cell types.

So, now if you look at the SNPs, so we could
look at it in the kidney, and this is just a so-called adult kidney. What we
find, and that’s fairly similar to what’s published, is that a large percentage of the
SNPs of the sixty-seven loci actually localize to enhancers. There are several ways to do
this. This is mapping just the lead SNP that
is published in the paper, and then we can increase this to about 65 percent if
you take all the tagging SNPs in the LD block and accept that if one of the SNPs in LD is
actually in an enhancer, then you call it as an enhancer, but not more than that for
the kidney. And there is a significant enrichment if you compare it to, like, an H1
stem cell and the fibroblast, and this is actually ENCODE data, and then we looked at multiple
ENCODE cell types, indicating that kidney-disease-associated polymorphisms are localized
to enhancer regions in the kidney.

So now we can do a little bit better than
that because we now have these multiple cell types that we isolate out of the kidney, and
we make the maps for these cell types as well, and then we can also say that this
is actually not just somewhere in the kidney, but maybe there is some enrichment. Although I
would take this with a grain of salt, you see an enrichment somewhere
in the tubule epithelial cells, out of all the places, when we compare it to other cell types that are
in the kidney, like glomerular epithelial cells, fibroblasts, and mesangial cells;
that seems to be the cell type where we see more clustering of these CKD-associated
polymorphisms. So, that’s very nice, but that’s computational, and obviously my
lab is very interested in the mechanism, so we have to actually do the hard work: we
have to screen through these enhancers and show that they are actually localized
to, and act as, a regulatory region in the kidney. To do that, we use
the zebrafish system and this very nice reporter system where you have an mCherry flanked by
two Tol2 sites, and you can do large-scale cloning into it, which we got via [unintelligible]
Fisher, who has helped us quite a bit. So we clone all these putative enhancers
over here, and then we use a transgenic fish where we
labeled the tubule. The zebrafish actually has just one filter with two little tubes
on the side, so we label this with green, and therefore, with the clone in the mCherry,
you can screen very efficiently whether you
see that.

So, here it is in real life, so this is the
tube, which is green, and this is the mCherry. This is actually that chromosome
16 locus which we are working on dissecting, which had the highest peak on the GWAS, and
we are dissecting it into multiple regions, and you see that it actually localizes again
to the tubules. So the histone-based annotation and now the validation coincide:
somewhere in this region is something able to drive expression in the kidney,
so it’s a kidney-specific regulatory element.

So, that’s very nice. The question, obviously,
because we are somewhat biology-based, is: what are the target genes of these variants?
It is nice that it’s in a regulatory region, but, you know, what are the target
transcripts? To do that we toyed a little bit with in vitro transfection and luciferase
[unintelligible], finding that many of these genes that are putative targets are actually
not expressed in the cell lines that we can easily transfect. So we mostly work
using eQTLs, which have been introduced before. Basically
you’re looking at the genetic variation and the transcript expression, and
because we have a lot of kidneys that are genotyped and we have transcript-level
data, we can now use kidney-specific data to annotate the variants: depending
on the genotype, you see variation in gene expression.

So, this is a result: this is 100 of
the kidneys that we have, because this is a more homogeneous group of CEU descent. We feel that’s
important, and then you find, you know, a large number of so-called eGenes, that is, genes
whose transcript levels are associated with SNPs in the kidney. I probably
should have introduced that somehow the kidney is left out of all these big
efforts. GTEx is not very good at collecting kidneys, and in that big Science paper that just
came out, they had three kidneys. Although, I have to say that they made a major conclusion
out of it that I’m not 100 percent sure about. And I think kidneys are being transplanted,
so it’s hard to collect them, so I think it’s a quite useful, unique resource. And
also in Roadmap, John and Brad Bernstein had some kidney data here and there, but it really
was not well represented even in the Roadmap data, and it’s not really part of
ENCODE, so maybe, by way of advertising, it should be included, and so I feel that these efforts
are actually quite important.

So, we have a number of eGenes, which is quite
consistent with what GTEx is finding, and many of them seem to be, quote-unquote,
shared genes, but about one-third of these are not among what’s published in GTEx. So this is
the cis-eQTL [unintelligible]; with 100 samples we cannot really do trans. This is the
SNP location, this is the transcript location, and a spot is shown here if that
SNP significantly regulates the target gene expression. And in real life it looks like
this. This is, I think, one of the best eQTL plots that we have. This particular variant
could be C/C, C/T, or T/T, and these are solute carriers; you know, the
tubules mainly express a high number of solute carriers, because that’s
their function: they have to reabsorb salt and water. And you see that this variant
has a very nice, strong effect on the transcript level of this particular solute carrier.

And then this is another one. I showed this
because this has been proposed by the CKDGen consortium. They did functional studies
indicating that this variant actually influences the level of this gene. They did not have
eQTL data in the paper; what they did is a morpholino-based knock-down of this
gene, and that showed a phenotype. Looking at the eQTL now, this effect is not
as great as the previous one, I grant you, but there is an association between the genotype
and the target gene, and that seems to validate what is in there.

So doing this, obviously, you see a very small
fraction of overlap with the CKD GWAS hits, and what you could do, obviously, is
just look at the GWAS SNPs and whether you can find an association with any type of
target gene. So to be very transparent, right now I think we have three or four where we
have good statistical significance, and hopefully we will have more, maybe by dissection
or other methods that we are using. Just by way of introduction, these eSNPs
are indeed enriched on enhancers. Specifically, this is an overlap of the
tubular cell line H3K4 monomethylation and the eSNP location, and this is eSNPs versus
control SNPs, and you see an enrichment. That is not there if you use other types of
regulatory marks, and it is also not there if you are looking at other
cell types, such as these glomerular epithelial cells and mesangial cells, so again
somehow indicating that the tubular epithelial cells may be the important cell type for the
kidney and [unintelligible] development.

So, I’m going to show you an example of
that. So, this is that [unintelligible] chromosome 16 locus, and what you see is the SNPs that
are showing the highest significance, and these are the genes under here, similar to what
has been shown previously by the other speakers, and, well, you probably saw it on the first
plot. It is something called UMOD. UMOD is a urinary gene; it has "urine" in its name,
so it has something to do with the kidney, and that’s why this spot
was labeled with a big UMOD sign. This
SNP seems to increase the expression of this gene, by some studies, and what we
know is that the gene expression actually decreases in disease development. So, the SNP should
increase the expression of this gene, but in disease the gene expression goes down.
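
The eQTL comparison used throughout this part of the talk (genotype C/C, C/T, or T/T against transcript level) is commonly tested as a regression of expression on allele dosage. The following is a minimal sketch of that idea, not the lab's actual pipeline: the function name `eqtl_slope` and the toy numbers are made up for illustration, and a real analysis would add covariates and a significance test.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    # Sum of squares of dosage and cross-product with expression.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Toy data (hypothetical): dosage counts copies of the T allele,
# so C/C = 0, C/T = 1, T/T = 2; expression is in arbitrary units.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope = eqtl_slope(dosages, expression)
print(f"expression change per T allele: {slope:.2f}")
```

A positive slope means each extra copy of the allele is associated with higher transcript levels; a slope near zero corresponds to the "did not show a change" cases.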
So, if we look at this locus again, because now we have eQTL data, you see it is actually
quite a bit broader, so there are a couple of other genes around it as well.

So, this is the locus again; these are
the SNPs here. This is UMOD. These are the other genes over here, and here is
how the eQTL looks. This is the transcript expression of the UMOD gene. There is a little
trend for increased expression, which has been described in the literature, but it didn’t
reach statistical significance in our data. Then you look at the next gene over
here, which is actually a gene family, ACSM, something to do with acyl-CoA medium-chain.
It’s not really well annotated in the literature, but there are five of them,
and they are right here together. This one did not show a change, but for this one, if
you look at it, there is a very nice relationship between the genotype and the expression of
this gene, and the RPKM values for this gene are fairly decent, showing it as
an eGene. This one did not, and this one again shows some association, but it is not
as nice as for this one, and expression of this gene is actually much lower. So this indicates
that, whereas this SNP had been associated with this gene as the target gene,
now maybe one gene away is where we find the significant effect on gene expression.

So, we included two additional criteria, that
the target should be expressed in the disease-relevant tissue, in the kidney. This is actually
Illumina Body Map RNA-seq data, and what you see is the expression of the genes of
that area in the kidney. What you see is that this gene, UMOD, that’s proposed to be the target, is highly
expressed, but our target is also fairly nicely expressed in the kidney. Maybe there is some expression
in the liver, but indeed it is very nicely expressed, and if you look at
the protein expression, again, it’s fairly nicely expressed in the kidney as well.
Now, we also added that target expression should change in kidney disease development.
So, because we have 1,000 samples, we can actually look at the correlation of the gene
with kidney function, because that’s how kidney function [unintelligible] changes, going
from 100 to zero, and you see that there is quite a nice R-squared and correlation. And
that’s not just RNA expression: you can pick random samples from the top and
the bottom, and the protein expression correlates with disease development as well.

So, alteration of the target should be able to cause kidney
disease, so the target should be functional in the kidney. For this again we use the
zebrafish system and morpholino knock-down. As I discussed, the function of the kidney
is to get rid of salt and water. If the kidney doesn’t function, you don’t get rid of
salt and water, and that shows up in the fish as edema: they puff
up, and then they have a lot of, it’s probably called [unintelligible], so they have salt
and water in excess. And that’s what you see if you knock down the orthologue of this
Acsm gene in zebrafish. And the proposed function of
this Acsm is something to do with acyl-CoA and fatty acid metabolism, though not much is
known in the literature.

So, in conclusion, we have this roadmap
to understand GWAS-associated hits. I think human tissue samples, and especially large
numbers of human tissue samples, are really critical to get to this. We used the epigenome
maps to identify regulatory regions, model organisms to validate the causal variants, and
eQTL maps for target gene identification. In addition to that,
we also look at the correlation of the genes with kidney function, because we feel that should
also be present, and then we use model organisms, and the zebrafish seems to be a fairly quick
screening tool to figure this out. I showed you this one out of the three that we
have as hits, but mainly this is limited by the eQTLs, because right now these identify,
I think, just very few variants with a significant effect, because our sample size is small. And
there are a couple of other issues, and the gene; maybe that has something to do
with fatty acid metabolism.

I don’t know how I am on time, but I
have a few other things that I wanted to share, so I will go through them quickly. So, you
know that the SNPs actually explain 2 percent of the heritability, and then we have about
30 to 70 percent, so what about the rest? So these variants, you know, explain very little.
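
The size of the gap can be made concrete with a small calculation. This sketch assumes that the 2 percent and the 30 to 70 percent are both expressed on the same scale (fraction of trait variance), which is one reading of the numbers quoted in the talk; the variable names are illustrative only.

```python
# Figures quoted in the talk, read as fractions of trait variance (assumption).
explained_by_gwas = 0.02
heritability_range = (0.30, 0.70)

# Heritability not accounted for by the GWAS loci, at each end of the range.
missing = tuple(round(h - explained_by_gwas, 2) for h in heritability_range)

# Share of the heritable component that the loci do account for.
share_explained = tuple(round(explained_by_gwas / h, 3) for h in heritability_range)

print("missing heritability:", missing)
print("share of heritability explained:", share_explained)
```

Under that assumption the known loci account for only a few percent of the heritable component, which is the gap the rest of the talk addresses.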
So, where is the missing heritability? There are several things to think about
here: more samples, deeper sequencing, ethnic groups, and epigenetics. I will show you an
example for two of these. One, I think, is absolutely tangential to the meeting, but
I think it’s a beautiful example of genetics, so I cannot skip it, and that’s
about different ethnic groups. So, the first GWAS slide that I showed you was in Europeans,
and you have the 67 regions, all of
them added together maybe explaining 2 percent
of heritability. Now, if you do the same, an admixture study, in a black population for
kidney disease, you get this one and only beautiful, big hit on chromosome 22, one hit,
and that turns out to be a coding-region variant in a gene called ApoL1. That’s
very, very rare for any kind of complex trait, and it turns out that there
was evolutionary pressure to maintain that coding-region variant, because that variant
protects people from trypanosomiasis, which is African sleeping sickness.

So, I guess it shows similarities to malaria
and sickle cell, so this is the same exact story. The heterozygous form of this variant
protects you from the trypanosome, and this is the lysis of the trypanosome by this G1
variant, but if you have two copies of this variant, you get kidney disease. And the [unintelligible]
ratios for kidney disease are not insignificant: they go from two to 100x, and if you actually get
HIV on top of having this variant, you are almost sure to develop this disease with
these two alleles. So, my lab contributed to this by making
a mouse model for the variant, and indeed if we express the variants in a specific cell
type in the kidney, these glomerular epithelial cells, you get disease development.
So, this indicates that indeed this coding-region variant is disease-causing. So that’s one
way of finding those rare variants with large effect size: going into a different population.
But as part of the Roadmap, for five years we were looking at whether epigenetic differences
could explain this missing heritability. So, this part of my talk
is pretty much published. We looked at
100 micro-dissected human kidney samples from patients with different conditions
of kidney disease, and this is what [unintelligible]
dissected, and we looked at changes in the tubular epithelial cells that we micro-dissected
from these 100 kidneys, with genome-wide methylation analysis using
a method, I would say something like an MRE-Chi, like a methylation-sensitive
[unintelligible] digestion that was developed by John Greally at Einstein, and of course
the Illumina 450K arrays. And what we find is that indeed you can identify
epigenetic changes between healthy and diseased kidneys that are able to cluster normal and
disease samples quite nicely and separately, and if you look at a validation cohort, again
you’ll see that these methylation differences cluster and differ between control samples and
disease samples. But I would just like to show some of the other things.

So, we got fantastic P values with even fairly
small samples, but what you see is that the difference in methylation on an absolute
scale is small. So what you see in kidney disease, and I think I see that in multiple
other disease conditions: there are changes, there are very consistent changes, we can
replicate the same changes in different samples, but the absolute difference in methylation
level is fairly small. Unlike in cancer, where you can see a difference going from zero methylation
to 100 percent methylation, these methylation differences are small, and of course the future
should tell whether they are actually significant going through that route. We looked at whether
these methylation differences are randomly distributed in the genome or whether they are maybe
on promoters. There is a lot of data on promoter methylation differences influencing gene expression,
but when we looked, by [unintelligible] mapping, these differentially methylated regions
were depleted on promoter regions. We could hardly find any methylation difference in
a promoter, and when we looked, by ChIP-seq-based annotation, at where they are, they were
actually on enhancers, and they were on kidney-specific enhancers when we
looked at the nine ENCODE cell lines again. So, these are small differences on enhancers;
therefore, we could look at whether they could potentially influence transcription
factor binding. We did the same computational analysis, and we find that they influence
several transcription factors. One of them was, for example, SIX2, and we found a bunch
of others. Very few of you in the audience are probably nephrologists,
but this is actually a very important kidney developmental transcription factor, and so are
these two others. So, it seems that there was some sort of enrichment on these enhancers,
in that they can computationally bind kidney-specific developmental transcription factors
over here.

Now, looking the other way at whether this
differential methylation is actually functional, we looked at gene expression by mapping the regions
to the nearby genes, and indeed we find correlation between differential methylation and transcript-level
differences. So maybe this differential methylation actually drives gene expression,
and if it drives gene expression, maybe it is of course important in disease development.
About 40 percent of them were correlating with gene expression,
and this is going to be my last slide. And they were also again enriched for developmental
processes. You find the same when you do enhancers by H3K4 monomethylation; again,
they are enriched for developmental processes. So, that correlates with some of the data
in the literature that kidney disease may be developmentally programmed. This is a slide
I borrowed from Francine Einstein from Einstein: if you feed rats a control diet
and look at the pups, versus if you feed rats a calorie-restricted diet and then look
at those pups, what you see is that the pups from the calorie-restricted diet developed one
measure of kidney disease, which [unintelligible] in there, and that correlates with differences
in their epigenome and cytosine methylation levels, indicating that maybe indeed they
are programmed somewhere early on.

So, this second set of conclusions is that
you find small but highly consistent cytosine methylation changes in kidney disease. Tubular
samples were isolated, the methylation changes are enriched on kidney-specific enhancers,
and fibrosis and developmental genes are affected more commonly,
and maybe that’s consistent with kidney disease somehow having some sort of developmental
origin, which has been proposed in the literature in the past. And I would like to say that
most of the work has been done by really talented graduate students: Yi-An Ko, who will be here
tomorrow, and Huigang Yi, who is an informatics person in the lab. The second half of
the project is published, and that was part of this Roadmap Epigenomics project, and we
have lots of collaborators who helped us with the GWAS studies or eQTL analysis and much
of the other work we have been doing. Thanks so much. [applause] [end of transcript]
