Machine Learning Helps Achieve Greater Depth in Proteome Analysis
Over recent years, we have seen a surge in the application of artificial intelligence (AI)-based methods – such as machine learning – across a variety of biological disciplines.
Proteomics is a field of research that offers unrivaled insights into cellular biology, with potential applications spanning modern medicine, food science, agriculture and systems biology more generally. Over the last decade, the proteomics research field has advanced rapidly.
We can now study more proteins than ever before using increasingly smaller sample sizes, at higher speed and with greater sensitivity. Such sophistication is attributable to innovations in analytical technologies, such as mass spectrometry (MS). How can researchers go deeper still in their proteome analysis?
Earlier this year, Technology Networks spoke with Rohan Thakur, executive vice president of Life Science Mass Spectrometry at Bruker Daltonics, about how Bruker is helping researchers to “raise the bar” and reach new heights in proteomics.
Since that conversation, Bruker has launched its novel CCS-enabled TIMScore™ algorithm, which can be used on the timsTOF Pro 2, timsTOF HT, timsTOF SCP and timsTOF fleX systems, alongside its TIMS DIA-NN 4D-Proteomics™ software.
To understand how machine learning methods and new software capabilities are helping proteomics scientists gain greater depth in their analyses, Technology Networks recently interviewed Tharan Srikumar, product manager in bioinformatics at Bruker Daltonics. In this interview, Srikumar explains how the novel TIMScore algorithm overcomes challenges in analyzing tryptic and phosphorylated peptides, discusses the capabilities of the TIMS DIA-NN 4D-Proteomics software and considers how to improve the efficiency of proteomics workflows.
Molly Campbell (MC): Can you discuss for our readers how the TIMScore algorithm was developed with Bruker’s customers in mind?
TS: What we didn’t have was a complementary software solution that fully leverages TIMS technology for data analysis. We saw this as an opportunity, as we weren’t making use of all the information the instrument was providing.
This line of thought led to our first attempt at making more use of the collisional cross section (CCS) information present in the data. The idea is that if you knew the true value – the true CCS of a given peptide at a given charge state – then you could compare it to what you’re measuring. The machine learning model we developed can predict unphosphorylated as well as phosphorylated peptide CCS values with very high accuracy and reproducibility.
For the more ambiguous identifications – where there is uncertainty, either because the fragmentation pattern is not clear enough or because the mass error between the measured peptide and the candidate identifications is larger – we can use how well the predicted CCS matches the measured CCS. We can use this either to say, “this is a decoy peptide”, a false positive that we shouldn’t count, or to say, “no, this is not a false positive, this is a true match, and we want to use that identification in the dataset”.
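The rescoring idea described above can be illustrated with a short sketch. This is not Bruker’s TIMScore implementation: the function names, data structures and the 2% tolerance are all hypothetical, chosen only to show how agreement between predicted and measured CCS can separate plausible identifications from likely false positives.

```python
# Illustrative sketch only: re-scoring ambiguous peptide candidates by
# how well a predicted CCS agrees with the measured CCS. All names and
# thresholds here are hypothetical, not Bruker's actual algorithm.

def ccs_error(predicted_ccs: float, measured_ccs: float) -> float:
    """Relative deviation between predicted and measured CCS."""
    return abs(predicted_ccs - measured_ccs) / measured_ccs

def rescore(candidates, measured_ccs, max_rel_error=0.02):
    """Filter and re-rank candidate identifications for one spectrum.

    candidates: list of (peptide, search_score, predicted_ccs) tuples.
    Candidates whose predicted CCS deviates beyond the tolerance are
    treated like decoys / false positives and dropped; the rest are
    re-ranked with the CCS agreement folded into the score.
    """
    accepted = []
    for peptide, score, pred_ccs in candidates:
        err = ccs_error(pred_ccs, measured_ccs)
        if err <= max_rel_error:
            accepted.append((peptide, score * (1.0 - err)))
    return sorted(accepted, key=lambda c: -c[1])

# Two candidates for the same spectrum; the second candidate's
# predicted CCS is far from the measured 450.0, so it is rejected.
candidates = [("PEPTIDEK", 0.80, 452.0), ("PEPTLDEK", 0.78, 480.0)]
print(rescore(candidates, 450.0))
```

In a real pipeline the predicted CCS would come from the trained model and the decision would feed into false-discovery-rate estimation rather than a hard cut-off, but the principle – down-weighting identifications whose ion mobility behavior contradicts the prediction – is the same.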
Ash Board (AB): What problems have previously been associated with analyzing tryptic and phosphorylated peptides, and how does the software overcome these obstacles? In addition, why do we need to capture PTM data when studying the proteome, and how can the 4D-Proteomics approach help to generate this information?
TS: Let’s consider an analogy. Say the sample we are looking at is a room, and we’re standing very close to the door. If we had a peephole into the room, we would get a very limited view of the room itself, but we might gain some insight into what’s inside – what furniture is in there and who is in the room, for example.
Compare this to being able to open the door into the room. You have a much wider view and can produce a better description of what’s actually present and happening. Of course, if you can step in, you get a fully immersive experience of being in that room.
Previously, you could see perhaps 1,000–3,000 proteins and on the order of 10,000 peptides. That gave you a description, or an idea, of what was in the room – or the sample.
With TIMScore and PASEF, we’re giving you either a bigger peephole or the ability to open the door fully and step inside, creating a much wider view of what is there. Post-translational modifications (PTMs) play a critical role in biology. Understanding the function of PTMs – or identifying which PTMs are present, in what quantity and where within a cell – is essential for understanding biology. There is now a deeper, broader view, which should translate into our customers gaining a deeper understanding of what’s in the samples they are studying.
MC: What capabilities does the TIMS DIA-NN software have compared to previous software systems?
TS: TIMS DIA-NN is our first software to analyze dia-PASEF data. It is based on the open-source DIA-NN software from the laboratories of Professor Markus Ralser and Dr. Vadim Demichev. We have forked that project and put a greater emphasis on the CCS measurement itself. We’ve also integrated it into the PaSER platform, so you have a workflow that’s automatically triggered at the end of your acquisition. From a user perspective, you set up your experiment and your measurement on the timsTOF acquisition PC, which includes setting up your processing method. At the end of the acquisition, TIMS DIA-NN is triggered, and you have the results waiting for you a few minutes later.
You no longer need to acquire all of your data, copy all of the files to your processing computer, start the analysis and then come back a few hours later to review the data quality, or to see if the column clogged or something along those lines. You now have one workflow, which you set up and leave. If you need to check on the data, you can come back a few minutes after the acquisition and have a result file waiting for you. When you want to compare the data across your whole project, or however many tens or hundreds of samples you’re interested in, you can group them all for analysis to fill in any missing data using a concept called “match between runs”. The effectiveness of this concept is further increased by the use of CCS. With that, you have a full project view of all the proteins and peptides that were identified and quantified across the entire project.
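The “match between runs” idea can be sketched in a few lines. This is a simplified illustration, not the TIMS DIA-NN implementation: here a missing quantity is filled with the mean intensity of the same peptide in the other runs, standing in for the real matching by m/z, retention time and CCS, and all data structures are hypothetical.

```python
# Hedged sketch of "match between runs": a peptide identified in one
# run can be transferred to another run where it was not confidently
# identified. The mean-fill below is a stand-in for real feature
# matching by m/z, retention time and CCS.

def match_between_runs(runs):
    """runs: dict of run_name -> {peptide: intensity or None}.

    Returns a completed table in which each missing (None) value is
    filled with the mean intensity of that peptide across the runs
    where it was observed; peptides absent everywhere stay None.
    """
    peptides = set()
    for table in runs.values():
        peptides.update(table)
    completed = {}
    for name, table in runs.items():
        completed[name] = {}
        for pep in peptides:
            value = table.get(pep)
            if value is None:
                donors = [t[pep] for t in runs.values()
                          if t.get(pep) is not None]
                value = sum(donors) / len(donors) if donors else None
            completed[name][pep] = value
    return completed

runs = {
    "run1": {"PEPA": 1000.0, "PEPB": None},
    "run2": {"PEPA": 1100.0, "PEPB": 400.0},
}
print(match_between_runs(runs)["run1"]["PEPB"])  # 400.0
```

Grouping all runs of a project before this step is what makes the transfer possible, which is why the full-project view matters for reducing missing values.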
AB: For you, which Bruker customer case study best demonstrates the impact the novel software can have in proteomics analysis?
TS: We worked with the team’s existing data set to see if there were any gains to be made with TIMScore. I believe we were seeing gains anywhere in the range of 30–40%, depending on whether you were looking at the protein or the peptide level.
Not only could we identify more phosphorylated peptides in this particular case, we were also able to identify more phosphorylation sites at the same confidence level. That is, we could identify not just the peptide sequence in the protein that was modified by phosphorylation, but also the exact amino acid at which the phosphorylation event was occurring. This meant that we were not seeing ambiguous identifications; we could localize the modification to a very specific residue, and that means a better understanding of the signaling biology.
We built PaSER as a platform, and as part of PaSER we’ve now integrated TIMScore and TIMS DIA-NN. One of the more common scenarios we’re seeing is to carry out a small pilot study to build a spectral library, perhaps from a fractionated data set or from pooled samples. Then, using TIMScore, you build as in-depth a peptide spectral library as possible before studying a much larger cohort of samples using TIMS DIA-NN.
Then you run the study in DIA mode across hundreds to thousands of samples. We have seen pilot projects with several thousand samples, and even bigger projects planning to use over 10,000 samples. What the integrated workflow and PaSER let you do is monitor the entire project as it progresses, and also get feedback on blocks of data as you move forward. In general, we’re seeing our customers migrate towards DIA, and we’re facilitating that with the PaSER platform.
MC: Can you discuss the impact that novel software systems are having on the proteomics field? To what extent are they helping to overcome the data bottleneck?
TS: Our approach has been very different from what has typically been attempted. One of the biggest bottlenecks in proteomics was that you can generate many samples per day – but of course, you then need to process all that data.
One of the simpler paths taken was to move to “the Cloud”, where you can scale your computational requirements. But, again, you are still waiting to acquire your entire project before moving it into a Cloud environment, and then you’re still waiting quite some time, or spending a good chunk of money, to process it quickly in the Cloud.
One of the initial questions we had with PaSER was: why do we wait? We have all this time while we’re acquiring the data that we could be using to process it. This is one of the key differentiators between the PaSER platform and some of the other software options.
I believe we will start to see predictions being used more broadly, not just for CCS values. You reach a stage where you can very confidently predict the attributes of a peptide well before you have made the measurements. You can use that knowledge to modify how you’re going to acquire the data so it better suits your experimental design.
The other aspect is that, as we shift the bottleneck from acquiring the data to analyzing it, we’re going to create a new bottleneck at post-analysis. I believe this will be an interesting area to keep your eyes on.
MC: Can you discuss any future plans for further enhancing the software’s capabilities?
TS: I believe there are a few areas that we haven’t covered. For instance, de novo sequencing is not a capability that’s currently part of our portfolio; we’re really looking forward to incorporating that option. We’ve provided solutions for various workflows but want to continue developing them. As for CCS prediction and all of the other predictions we want to make, that’s an area of strong focus, particularly with regard to PTMs. We currently support phosphorylation, but we want to grow that to cover, ideally, all PTMs – whether the model has seen them before or not – and to predict them accurately so we can use them in TIMScore and other applications. There are also the broader aspects of quality control, statistical analysis and data visualization where we plan to make an impact.