The newly released IEEE 2791-2020 Standard for Bioinformatics Analyses Generated by High-Throughput Sequencing (HTS) facilitates an expedited bioinformatics communications exchange protocol amongst critical stakeholders in the clinical research phase.
Maria Palombini, Director of Emerging Communities & Opportunities Development and Healthcare & Life Sciences (HLS) Practice Lead at the IEEE Standards Association (IEEE SA), interviews Dr. Raja Mazumder and Jonathon Keeney to shine a spotlight on the many different use cases from submissions for FDA regulatory review, to potential COVID-19 antibody research and other vaccine and infectious disease development applications.
Re-Think Health Podcast Series is part of the IEEE SA Voice program. IEEE SA Voice shares insights and perspectives from the IEEE SA community, subject matter experts, and industry leaders that are working to raise the world’s standards, drive market solutions, and much more, keeping you at the forefront of technological innovation for the benefit of humanity.
About This Episode’s Guests
As a Biochemistry and Molecular Medicine professor and co-director of The McCormick Genomic & Proteomic Center at The George Washington University (GW), and while working at the National Center for Biotechnology Information (NCBI) at NIH and at UniProt, Dr. Mazumder has worked closely with national and international colleagues in developing international molecular biology resources and using these resources to identify therapeutic, diagnostic, and vaccine targets. His research focus is on developing novel methods for data-to-knowledge discovery through national and international initiatives in biomedical sciences such as GlyGen and OncoMX, and community-driven bioinformatics projects such as the BioCompute initiative. He has experience in scientific coordination and bioinformatics infrastructure building, and through NCI, NSF, NIGMS, NIAID, pharmaceutical, non-profit, and FDA funding he has been involved in genomic and bioinformatics research associated with cancer biology, glycobiology, and metagenomics. Dr. Mazumder is also the co-developer of the High-performance Integrated Virtual Environment (HIVE), which is approved for use in a regulatory environment at the US FDA. In addition to his research activities, he mentors faculty and graduate students, directs the Bioinformatics M.S. graduate program track, and co-directs the Ph.D. Bioinformatics and Genomics program at GW.
Jonathon Keeney is a Research Assistant Professor, Managing Director for the Executive Steering Committee for the BioCompute Public Private Partnership, and lead for the High-performance Integrated Virtual Environment (HIVE) bioinformatics platform. He was Secretary for IEEE 2791-2020, the BioCompute standard, which is now a published standard adopted by the FDA and others. His work develops novel approaches to research questions that comprise both strategy and bioinformatic framework, and which have included neuroscience, microbiome, and virus research. He has contributed heavily to the development of the BioCompute standard for the communication of computational analyses, genomic copy number variation in the developing human brain, and strategies for adventitious virus detection.
Follow Jonathon Keeney on LinkedIn.
Hello everyone. I’m Maria Palombini with the IEEE Standards Association. I lead the Healthcare and Life Sciences practice. So much is changing in the world of health. We have new technologies, tools, and applications, all of which should make us think: how can we rethink the approach to health so that we as patients, you and me, end up with better health?
I’m very excited to have with me the chairs of IEEE 2791, Raja Mazumder and Jonathon Keeney. For all of you who don’t know what P2791 is, we actually just released that standard last month, very fresh off the presses. It is the IEEE standard for bioinformatics analyses generated by high-throughput sequencing to facilitate communication. Yes, it is quite a mouthful, but it is a very important, very cutting-edge application of this technology.
And you will see, it is important today even in the middle of this global pandemic. Just to let you know, Raja is a professor of biochemistry and molecular medicine and co-director of the McCormick Genomic & Proteomic Center at the George Washington University. And Jonathon is an assistant research professor in bioinformatics in the department of biochemistry and molecular medicine, also at the George Washington University, and he’s a member of the executive steering committee for BioCompute. Let’s get to the great stuff. Raja, we hear genome sequencing seems to be the talk within the science community about solving the precision medicine puzzle. We know genomes generate great insight, but a lot of it sits in many different places. So what exactly is BioCompute, and how will it help address this growing challenge?
Thank you, Maria, and that’s a great question. Genome sequencing, in my mind, has revolutionized the way we do biomedical research worldwide. When sequencing started happening, it became really easy to generate a lot of data and analyze it. Now, the problem that happened was that the data was there, but it was not being well documented. So BioCompute helps organize this information in a way that is human and machine readable.
I would add to that. I think you’re right that genome research can generate really great insights, but exactly how may sometimes not be clear. For example, the degree of variation in a certain spot in the genome, and whether or not it contributes to disease or is just normal variance, is an ongoing question. Genome research is still fairly new, so there are still thousands of questions like this. And the way that genome data gets turned into useful information depends on the question that’s being asked, the way that the researcher is asking that question, and the way that they’re trying to answer it. Because of all that variability, it can be very hard to follow what someone did. And so there’s been a real need for some way of communicating that information in a clear and articulate way.
And some labs have tried to standardize the way that they report that sort of information on their own. That’s really great, but it’s often specific to the way that they do things and not widely adopted. BioCompute has been great because it really abstracts away the process of a computational analysis from any specific way of doing things. It doesn’t matter which software you use, or which platform, or which strategy, and so on. You keep doing things exactly the way that you have been. And because there’s been such a big community investment in building the standard, it will help meet the needs of the greatest number of people, so different groups will know exactly what to expect when communicating their work to each other.
Fascinating. I know I’ve talked to your colleague Dr. Vahan Simonyan many times, and he tells me we’re just starting to scratch the surface of the amount of data we can get from human genomes alone. He said that a year ago and it really holds true. This is fascinating work. So, Jonathon, we know that BioCompute is a public-private partnership, and we would like to know how it came about with the FDA. Besides the university and the FDA, who are some of the other partners involved, and what are the motivating factors to join forces? I’m sure this was not born overnight.
No, it wasn’t. Well, there’s a big difference between a standard and the application of that standard. Maybe you standardize a cigarette lighter for your car, and they all have the same dimensions and the same power and the same safety, et cetera. And then people go and start using it to charge their electronic devices. So there’s a big difference between the way that something is standardized and the application of that standard, and the public-private partnership is meant to tackle issues like that. It’s a great vehicle for a federal agency that’s considering using the standard as a means of having next-generation sequencing information communicated to them.
So they say, you know, this is what we’re considering doing as a means to apply that standard. And the partnership will facilitate the development, evolution, and use of the standard. For example, that could be in terms of joint projects with a common goal, or formal integration of the standard into institutions, or building extension domains that have their own consensus. One of the cool things about BioCompute is that it’s got this user-defined extension domain. BioCompute will work for probably 99% of all use cases because it was built on this consensus, but there are going to be some specific applications where it may not. In order to deal with that, there’s this extension domain built in so that different groups can modify it in their own way. The partnership can help by building some of those, and things like that. The partnership is actually brand new. As you mentioned, the standard was just published, and we’re recruiting for it right now.
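To make the extension domain idea a little more concrete, here is a minimal Python sketch of how a group might attach its own block to an object without touching the core domains. It is an illustration under assumptions, not the standard's reference implementation: the field names `extension_domain` and `extension_schema` follow our reading of the published schema, while the `add_extension` helper, the `lab_notes_extension` name, the schema URL, and every value are hypothetical.

```python
import json

# Hypothetical BioCompute Object fragment; every value here is made up.
bco = {
    "object_id": "https://example.org/XYZ_000001/1.0",
    "parametric_domain": [{"param": "min_coverage", "value": "10", "step": "1"}],
    "extension_domain": [],
}


def add_extension(obj, schema_url, name, payload):
    """Append a group-specific extension entry; core domains are left untouched."""
    entry = {"extension_schema": schema_url, name: payload}
    obj.setdefault("extension_domain", []).append(entry)
    return obj


add_extension(
    bco,
    "https://example.org/schemas/lab_notes.json",  # hypothetical schema URL
    "lab_notes_extension",  # hypothetical extension name
    {"notebook_page": "42", "reviewer": "internal QA"},
)

print(json.dumps(bco["extension_domain"], indent=2))
```

The point of the design is that a reader who does not know the extension can ignore it and still read the rest of the object.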
In addition to that, the partners that we have been working with for several years now, who contributed to the development of the standard, are industry partners, pharmaceutical companies, bioinformatics platform companies, and so on, and also academic institutions. A standard is only useful when different groups use it to communicate with each other. For example, an academic institution may develop this really amazing protocol for detecting viruses or detecting rapid mutation in viruses. Then an industrial partner can take it to the next level and make a diagnostic product. And then they submit the product. It’s a whole ecosystem. At every step, there could be dozens of people, sometimes more than that, involved in developing this product. It’s critical that when something like this is being developed, every step of the process is correctly recorded and also standardized, and BioCompute helps achieve that. So in the whole process, all the way from the bench until it reaches the bedside, BioCompute has a very important role to play. Last year we published a paper in PLOS Biology with several of our collaborators on how BioCompute helps precision medicine.
I can totally see that, Raja. I’m so glad that you highlighted the point, because there are not many standards out there that I’m aware of that can really give you the gamut from bench to bedside. This is one of the unique applications of the BioCompute standard. And the other interesting thing about this, Jonathon, is that I noticed you guys had a heavy focus on making it an open source standard. Do you want to explain a little bit why there was such a commitment or dedication to that concept?
Yeah, sure. I think there are a couple of answers to that. One is that we’ve taken great pains to make the entire process adhere to what are called the FAIR standards: findable, accessible, interoperable, and reusable. This is a big part of that effort, and it’s a very big effort in academia right now, and in research generally, to try to do research that conforms to that FAIR standard. The other thing is, like I said, individual labs have standardized the way that they’ve done things, and it’s great, but the real power of a language of communication like this is when lots and lots of people use it.
And we really needed to go through a formal standardization process. It’s well recognized and has a far reach, but we wanted to do it in a way that still empowers the individual researchers, who are very independent minded. Having an open source repository allows different groups to build off of it in their own ways that might be integrated into their own systems. So, for example, if there’s a private company that has some sort of proprietary process that they don’t want to expose, and there’s something about it they want to keep to themselves, but they still want to build an internal way of handling it that is compatible with BioCompute, we’ve made it very easy to do that. They can fork off a branch of the repository and build off of it in their own way.
Excellent. P2791 is actually one of the first projects of the IEEE SA open source program. Everything just fell into the right place. Speaking of that, Raja, we’re focusing on the standard and the publishing of the standard, but there’s a little bit more to BioCompute in the full suite of opportunities and services that it can provide beyond the actual standard. Maybe we could talk a little bit about them and how they all work together and help in the entire process.
Yes. So there is using a BioCompute, creating a BioCompute, reading a BioCompute, and so on. To get there, we do realize that it makes sense to have demos and training. Right now, for example, we are providing training to FDA regulatory scientists on how to evaluate and use BioCompute. These types of trainings we are also recording, and we are going to make them available through biocomputeobject.org and other places, really at a level where people can look at the recording, YouTube video kind of things. On top of that, there is another thing that we have already started doing, which is registering the domain space for BioCompute. For example, say you are a pharmaceutical company, a big company, and you have multiple products and multiple groups working on many, many projects, and you want to register a particular space for you. Let’s say you are company XYZ: all your BioComputes start with XYZ. So we have a mechanism in place which allows institutes, companies, what have you, to register their accession space within our BioCompute Object registry.
So this will allow them to have unique identifiers for their BioCompute Objects as they go along, so that they can refer to them when they’re submitting something, or submitting some research work to a journal, or even for their own in-house lab notes. This is really important. We also have mechanisms for people who do not have the resources to create their own BioCompute Object database and their own interfaces to create a BioCompute. There are links which will take you to some tutorials to create a BioCompute Object that gets stored within the biocomputeobject.org namespace. Those are some of the things that we are working on. There are a few others that are going to come out within the next six months to a year, and we are really looking forward to them. Actually, we are already working on some of the COVID-19 and SARS-related issues that are on everybody’s mind. So BioCompute is also playing a part, or at least we are trying to create BioCompute Objects which might help in that direction.
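The prefix idea described above can be sketched in a few lines of Python. This is only an illustration: the `PREFIX_NNNNNN/VERSION` layout, the `XYZ` prefix, and the helper names `make_object_id` and `parse_object_id` are assumptions made for the example, and the actual registry may format identifiers differently.

```python
import re

# Hypothetical identifier layout: PREFIX_NNNNNN/VERSION (an assumption, not the registry's rule).
ID_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z0-9]{3,5})_(?P<number>\d{6})/(?P<version>\d+\.\d+)$"
)


def make_object_id(prefix, number, version="1.0"):
    """Build an identifier under a registered prefix."""
    return f"{prefix}_{number:06d}/{version}"


def parse_object_id(object_id):
    """Split an identifier back into prefix, number, and version."""
    match = ID_PATTERN.match(object_id)
    if match is None:
        raise ValueError(f"not a recognized identifier: {object_id!r}")
    return match.groupdict()


first = make_object_id("XYZ", 1)
second = make_object_id("XYZ", 2, version="1.1")
print(first, second)
print(parse_object_id(second))
```

Because every object from company XYZ shares the prefix, a journal or reviewer can tell at a glance whose accession space an identifier belongs to.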
You took the words right out of my mouth with regard to COVID-19, because we know this omnipresent pandemic may consume all of us. We know that the race is on to find the vaccine, and hundreds of companies are getting into it. So my question to you is, how can BioCompute really help the researcher right now? And beyond just COVID-19, what would you say is a great use case for the BioCompute standard, whether it’s in vaccines or some other sort of application within the healthcare ecosystem?
You know, I’m going to use COVID-19 as an example, and then I can talk a little bit about a few other things. Right now there are thousands of genomic sequences for the SARS strains that are being generated. Many of them are getting deposited at NCBI or GISAID or other places. And in many of these genomes, people are calling variations. They say the genomes of SARS strains which were isolated from, let’s say, Germany are different from what has been isolated from Australia. There’s a big bioinformatics application that has to happen for you to make those kinds of statements, right? First you do the next-generation sequencing, you assemble all of the reads, and then you identify what the mutations are based on the reference strain that you’re using.
If I use the Wuhan reference strain, for example, and you use a different reference strain, our mutation profiles will be a little bit different. Trying to dig into who is using what is time consuming, and it actually makes it very hard to figure out how to compare and contrast results from different groups around the world if people are not using BioCompute Objects. So when you tell me these are the mutations, and this is the BioCompute Object that defines exactly how you found them, then when I analyze it, I can easily know what exactly you did. And this is going to be important not only in identifying the different mutations circulating right now in the human population. It’s also going to help people who are working on vaccines, or working on antivirals, or working on drugs, to see how the mutations are happening.
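The reference-strain comparison above is easy to picture in code. The Python sketch below diffs the recorded parameters of two analyses and reports only what differs. It is a simplified illustration: the `parametric_domain` entries use the `param`/`value`/`step` shape as we understand the schema, the strain labels and the `min_variant_frequency` parameter are made up, and in a real object the reference genome might be recorded under the input files rather than as a parameter.

```python
def params_by_name(bco):
    """Index a parametric_domain list by (step, param)."""
    return {
        (p["step"], p["param"]): p["value"]
        for p in bco.get("parametric_domain", [])
    }


def diff_parameters(bco_a, bco_b):
    """Return {(step, param): (value_a, value_b)} for every parameter that differs."""
    a, b = params_by_name(bco_a), params_by_name(bco_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Two hypothetical analyses that differ only in the reference strain used.
lab_a = {"parametric_domain": [
    {"step": "1", "param": "reference", "value": "REF_STRAIN_A"},
    {"step": "2", "param": "min_variant_frequency", "value": "0.05"},
]}
lab_b = {"parametric_domain": [
    {"step": "1", "param": "reference", "value": "REF_STRAIN_B"},
    {"step": "2", "param": "min_variant_frequency", "value": "0.05"},
]}

differences = diff_parameters(lab_a, lab_b)
print(differences)
```

With both analyses described in the same structure, the question "who used which reference?" becomes a lookup instead of an email thread.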
Let’s say in the spike protein, which is one of the most important vaccine targets and which is a glycoprotein. So this is important now, it has been important in the past, and it will be important in the future. Talking about the past: several years ago, there was an outbreak of a food pathogen in Germany, and next-generation sequencing data was used to identify the pathogen. Fast forward several years, and there was another outbreak of a similar pathogen over in the U.S. Now, if we had the BioCompute Objects from the Germany study, then we could apply them and see, okay, we are using the exact same methods, are we able to detect the pathogen that was detected in Germany? So it saves a lot of time. It saves a lot of effort. But on top of that, it also helps us see how bioinformatics methods and other technologies are evolving over time.
So, for example, what if the current methods are more sensitive? You use the old BioCompute Object and then improve upon it, to say, hey, now we can detect at a much lower level using this BioCompute, because you’re using a much more sensitive software, and the name of the software and the version of the software are in the BioCompute. So all of these things are not only important to save time and money, but they also help us understand whether we are actually getting better at doing some of these things over the years, or whether we are just at a standstill and the algorithms are not getting better. This is an easy and quick way to evaluate these kinds of things, because BioCompute Objects also have the input files, the output files, the possible errors that one can generate, and the validation that is associated with the BioCompute. All of this together can help a user run the analysis that the original authors of the BioCompute Object had used, and envision what the output should be.
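As a rough sketch of the "are we getting better?" comparison described above, the Python below reads the software name and version and a recorded detection limit out of an older and a newer object. The structures are deliberately simplified and every value is invented: `description_domain`, `pipeline_steps`, and `error_domain` follow our understanding of the schema's naming, while `detection_limit`, the tool name `aligner_x`, and the two helper functions are assumptions for illustration only.

```python
def software_versions(bco):
    """List (name, version) for each recorded pipeline step."""
    steps = bco["description_domain"]["pipeline_steps"]
    return [(step["name"], step["version"]) for step in steps]


def detection_limit(bco):
    """Read a hypothetical empirical detection limit from the error domain."""
    return bco["error_domain"]["empirical_error"]["detection_limit"]


# Hypothetical older and newer analyses of the same kind of sample.
old_bco = {
    "description_domain": {"pipeline_steps": [
        {"step_number": 1, "name": "aligner_x", "version": "1.0"},
    ]},
    "error_domain": {"empirical_error": {"detection_limit": 0.05}},
}
new_bco = {
    "description_domain": {"pipeline_steps": [
        {"step_number": 1, "name": "aligner_x", "version": "2.3"},
    ]},
    "error_domain": {"empirical_error": {"detection_limit": 0.01}},
}

improved = detection_limit(new_bco) < detection_limit(old_bco)
print(software_versions(old_bco), "->", software_versions(new_bco))
print("more sensitive than before:", improved)
```

Because the tool version and the measured error sit in the same document, a change in sensitivity can be traced to a specific change in the pipeline.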
Yeah. That was a really great answer. The one quick thing I would add is that there has actually been a similar use case, which I’ll mention really quickly, called the ReSeqTB pipeline. That’s a pipeline that the World Health Organization adopted for the detection of tuberculosis. One of the researchers, who was funded by the Bill and Melinda Gates Foundation, actually came and presented at one of our workshops on the ways that they’re using BioCompute for that pipeline, in ways similar to what Raja was talking about. I think that’s a great example, because it’s a situation where you have lots and lots of researchers who are geographically distributed around the world, and they all need to be on the same page fast. They don’t have time for big errors in communication, and things need to be very clear.
And so BioCompute is perfect for that. It lets researchers, as we said earlier, keep doing what they’re doing without needing to change anything about their workflows. It just gets everyone on the same page as far as how that communication happens back and forth, and it sets expectations for what data is in the document, where it exists, and so on. And since there are so many similarities with COVID-19 research, I think that’s a really good use case to pattern some of this work after. It also helps demonstrate the utility. It sort of sets the precedent for using BioCompute in that kind of way.
Excellent. I could see that right away once I read the full standards deck about it. So, Jonathon, this is maybe not something you can just easily pick up and go with. I imagine there may be some training, and obviously in today’s situation it might be virtual. Do you guys have anything going on? How can people find out if there’s any kind of training or virtual training?
Yeah, I mean, we tried to make it as easy to understand as possible. That’s sort of the fundamental idea: grouping all of the information into these conceptually meaningful categories. If you want to know the parameters, you go to the parametric domain. If you want to know the I/O files, you go to the input/output domain. So at a very basic, "yeah, I get it" kind of level, hopefully it will be relatively easy to understand. But you’re right, it’s got a lot of depth to it and a lot of advanced things that you can do. And so, as Raja mentioned, we are building training modules for the FDA right now to explain how to read a BCO, what information is in it, what to do in certain circumstances, and so on.
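The "go to the domain that answers your question" idea can be written down as a small lookup table. The Python sketch below is an illustration only: the domain names reflect our understanding of the published schema, while the topic labels, the `look_up` helper, and the example object are hypothetical.

```python
# Top-level domain names as we understand the schema; treat them as assumptions.
WHERE_TO_LOOK = {
    "parameters": "parametric_domain",
    "input and output files": "io_domain",
    "pipeline steps": "description_domain",
    "known errors": "error_domain",
    "custom additions": "extension_domain",
}


def look_up(bco, topic):
    """Return the part of the object that answers a given kind of question."""
    domain = WHERE_TO_LOOK[topic]
    if domain not in bco:
        raise KeyError(f"this object has no {domain}")
    return bco[domain]


# Hypothetical, heavily trimmed object.
example = {
    "parametric_domain": [{"param": "seed", "value": "14", "step": "2"}],
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}

print(look_up(example, "parameters"))
print(look_up(example, "input and output files"))
```

A reader who knows only this table already knows where each kind of information lives, which is the "basic level" of understanding described above.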
And we can certainly build training modules for other groups based on our experiences, too. I think at this point it’s safe to say we’re subject matter experts in this space. The best way to do that is just to reach out and contact Raja and I. We are putting together a lot of different training modules and materials. We have a BCO editor that can help people. It’s a web-based, form-based way to build BCOs, and it walks you through building one. But as I mentioned, there are more advanced things that you can do with it. I talked a little bit about the extension domain, and there are a lot of things that you can do in that kind of case. If your project is very specific and you wanted to build a bibliography domain or a supplemental domain or something like that, there are a lot of really cool things that you can do with it. We can most definitely help out with that, and the best way to do that is just to directly contact Raja and I.
Great. So we’re up on our time. I want to thank Raja and Jonathon for joining me today. I feel like we could have made this interview go for two hours, because there’s so much great stuff in there. We didn’t even start to scratch the surface of the opportunity for 2791. I also want to share with all of you out there that 2791 is actually part of a new pilot we’re doing in the healthcare and life sciences practice called the rapid activator program. It’s exactly how it sounds. The idea is to take a recently published standard that we want to put to work and try to get some feedback on how it’s performing in its environment, whether it’s a pharma or biotech company or a research organization using it. That way we can help educate on how to use the standard, what outcomes to look for, and that kind of thing.
So if you’re a researcher out there, in a pharma or biopharmaceutical company or within a government research organization, and you feel that this would be a great opportunity, please do not hesitate to reach out to me. It’s email@example.com. Also, as Jonathon mentioned, for the training and the whole suite of opportunities, if you’re interested in learning more, you can visit www.biocomputeobject.org. There’s a whole bunch of great information there, even how the whole BioCompute Object came to happen, so I think that would be a great resource. And to learn more about the actual standard and other IEEE standards, you can visit standards.ieee.org. P2791 is also featured there in our contributions and the work we’re doing for COVID-19.