Frequently Asked Questions (FAQ)

General questions:

On the Results page, you will find a session summary stating your unique Session ID alongside other information. Please make a note of this ID, as you will need it to retrieve this run later. If you have downloaded your results data, you will also find the Session ID in all file names and in the config.json file.
Firstly, we’re very sorry that something didn’t work for you. Please report the bug by email to proteinlens@imperial.ac.uk. When submitting a bug report, please include as much information as possible, e.g. the browser you used to access ProteinLens, the file you uploaded, all input parameters and, of course, a description of what went wrong. Ideally, the error should be reproducible; otherwise, it will be much harder for us to find the bug. Once you have emailed us, we will do our best to fix it as soon as possible and will let you know when we have done so.
We are currently preparing a manuscript for this webserver and will update this space once it is available. In the meantime, please cite the figshare publication of the webserver:

Mersmann, S., Strömich, L., Song, F., Vianello, F., Barahona, M. & Yaliraki, S. N. (2020).
ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules. DOI: 10.6084/m9.figshare.12369125.v1

We’re delighted that you’d like to use our webserver commercially and would be happy to issue a commercial licence for you. Please send us an email to proteinlens@imperial.ac.uk and we will get back to you as soon as possible to discuss further details.

Questions on inputs:

This depends on the information you have available. A good first port of call is the official RCSB PDB website, whose powerful search functionality supports a wide range of input information. Other useful resources include UniProt and the National Center for Biotechnology Information (especially for BLAST searches of sequences).
In this version of ProteinLens, we only support structures containing proteins (standard and most non-standard amino acids), DNA, RNA and most small-molecule ligands. If you’re interested in analysing other kinds of structures, please contact us with further details and we would be happy to look into it. It may just be that the default settings of the webserver cannot produce a good graph, but that with a little manual adjustment we are able to help you.
If you are sure that the PDB entry you want exists, it is possible that the entry does not provide a PDB format file for us to access. Thanks to methodological advances, ever larger structures are being deposited in the PDB, and some of these can only be downloaded in the PDBx/mmCIF format. At the moment, the ProteinLens server is only able to process structure files in the PDB file format; for the detailed specifications, see the official PDB file format documentation.
At the moment, the ProteinLens server is only able to process structure files in the PDB file format; for the detailed specifications, see the official PDB file format documentation. If your file is in another format, please convert it to the PDB file format before uploading it to this webserver.
Models, which occur most commonly in structures solved by NMR spectroscopy, are individual structures that form part of a larger ensemble. Since the graphs we construct from biomolecular structures are based on a single coherent structure, we are currently unable to incorporate information from an ensemble of structures. However, this may change in the future, so please keep checking back here.
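If you want to analyse just one model from an ensemble, one option is to extract it into its own PDB file before uploading. The sketch below is not part of ProteinLens; it assumes the standard MODEL/ENDMDL records of the PDB format, and the helper name `extract_model` is hypothetical.

```python
def extract_model(pdb_text: str, model_number: int = 1) -> str:
    """Return the coordinate records of one model from a (multi-model) PDB file.

    Hypothetical helper: assumes models are delimited by MODEL/ENDMDL records.
    If the file has no MODEL records, all coordinate records are kept.
    """
    lines = pdb_text.splitlines()
    has_models = any(line.startswith("MODEL") for line in lines)
    kept = []
    current = None  # number of the model we are currently inside, if any
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line.split()[1])  # model serial number follows the keyword
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM", "TER") and (current == model_number or not has_models):
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

For example, `extract_model(open("ensemble.pdb").read(), 1)` returns the first model as PDB text that you could save and upload (the file name here is only a placeholder).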
A chain is a single, continuous sequence of amino acids. Depending on the research question you’d like to explore, you may have to choose the appropriate chains for analysis, e.g. in proteins that form multimers. It can be helpful to consult the relevant page at the RCSB PDB, or to visualise the structure using a program such as PyMol or Chimera.
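If you prefer to prepare a file containing only the chains you need before uploading, the fixed-column layout of the PDB format makes this straightforward: the chain identifier is the single character in column 22 of each ATOM/HETATM/TER record. The following is a minimal sketch under that assumption; `select_chains` is a hypothetical helper, not a ProteinLens function.

```python
def select_chains(pdb_text: str, chains: set[str]) -> str:
    """Keep only coordinate records whose chain identifier (column 22) is in `chains`.

    Hypothetical helper. Header, CONECT and other non-coordinate records are
    dropped for simplicity, so that no record refers to removed atoms.
    """
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    kept = []
    for line in pdb_text.splitlines():
        if line[:6].strip() in coordinate_records and len(line) > 21 and line[21] in chains:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

For a homodimer with chains A and B, `select_chains(text, {"A"})` would keep a single monomer.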
The biological assembly (sometimes also referred to as the biological unit) is the macromolecular assembly that has either been shown to be, or is believed to be, the functional form of the molecule, for example two chains forming a homodimer. Particularly in structures determined by X-ray crystallography, the PDB file sometimes contains only one crystallographic asymmetric unit. Depending on the particular crystal structure, symmetry operations consisting of rotations, translations or combinations of both may need to be applied to obtain the complete biological assembly. Conversely, when one file describes several copies of the biologically relevant molecule, a subset of the deposited coordinates may need to be selected to represent the biological assembly. For more information on biological assemblies, see the RCSB PDB documentation. Depending on your research interest, you may have to choose the particular assembly that corresponds to the correct state of the protein in question; the easiest way to do this is to consult the relevant page at the RCSB PDB.
As mentioned above, choosing a biological assembly automatically picks out the relevant subset of chains. This can clash with a manual chain selection, e.g. if chains needed for a particular assembly have been deselected. To retain maximal customisability, custom chain selection is offered by default. The option to choose a biological assembly can be found in the advanced settings, and selecting one overrides any chain selection made before.
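In the PDB file format, the symmetry operations that build a biological assembly are listed in REMARK 350 BIOMT records, each as a 3×3 rotation matrix R and a translation vector t, and every copy of the selected chains is generated as x′ = Rx + t. A minimal sketch of applying one such operation to a list of coordinates follows; the helper name and the plain-tuple representation are assumptions for illustration only.

```python
Vec = tuple[float, float, float]
Mat = tuple[Vec, Vec, Vec]


def apply_operation(coords: list[Vec], rotation: Mat, translation: Vec) -> list[Vec]:
    """Apply one symmetry operation x' = R x + t to every coordinate (hypothetical helper)."""
    transformed = []
    for x in coords:
        transformed.append(tuple(
            sum(rotation[i][j] * x[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed


# Example: a two-fold rotation about the z axis with no translation.
two_fold_z = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))
copy = apply_operation([(1.0, 2.0, 3.0)], two_fold_z, (0.0, 0.0, 0.0))
```

Here `copy` holds the coordinates of the symmetry mate, `[(-1.0, -2.0, 3.0)]`. The identity operation (R = I, t = 0), which usually appears first in the BIOMT list, simply reproduces the deposited chains.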
This is most likely due to the size of your submitted structure. If it has more than ~70,000 atoms, the analysis may take a few minutes longer. Please be patient and keep the browser window open. However, if the computations take longer than an hour, or if you think something isn’t right and your structure should take less time, please submit a bug report (see above for how to do so).
To give you an idea of how long a typical ProteinLens run takes, the table below lists CPU time measurements for a range of proteins of different sizes, split into the individual steps of a ProteinLens analysis.

Example CPU running times (in seconds) for each step of a ProteinLens analysis:

PDB ID  No. atoms  No. residues  Graph construction  Bond-to-bond propensity  Markov transients
3K8Y         2619           166                3.88                     0.75               8.34
3ORZ         4544           279                5.70                     1.47              20.78
1LTH        19252          1260               23.28                    17.35             504.64
1HOT        25038          1596               32.81                    34.75            1103.37
7GPB        53642          3311               86.97                   289.73           12819.89
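To get a rough feel for how each step scales with structure size, one can fit a power law, time ≈ c · atoms^k, to the measurements in the table by least squares on a log-log scale. This is only a sketch: five data points give a coarse estimate, and extrapolating beyond the sizes in the table is uncertain.

```python
import math

# CPU times in seconds from the table above:
# (no. atoms, graph construction, bond-to-bond propensity, Markov transients)
TIMINGS = {
    "3K8Y": (2619, 3.88, 0.75, 8.34),
    "3ORZ": (4544, 5.70, 1.47, 20.78),
    "1LTH": (19252, 23.28, 17.35, 504.64),
    "1HOT": (25038, 32.81, 34.75, 1103.37),
    "7GPB": (53642, 86.97, 289.73, 12819.89),
}


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the exponent k in y ~ x^k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


atoms = [row[0] for row in TIMINGS.values()]
steps = ["Graph construction", "Bond-to-bond propensity", "Markov transients"]
for column, step in enumerate(steps, start=1):
    exponent = loglog_slope(atoms, [row[column] for row in TIMINGS.values()])
    print(f"{step}: time ~ atoms^{exponent:.1f}")
```

On these numbers, the fitted exponent is close to 1 for graph construction, close to 2 for Bond-to-bond propensity and somewhat above 2 for Markov transients, which is why the last step dominates the running time for large structures.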
At the moment, the maximum size of protein or biomolecule that we can analyse is limited in practice by the PDB file format, which ProteinLens relies on as its input format. In practice, this limit is 99,999 atoms, because the atom serial number field of a PDB file is only five characters wide and cannot hold larger numbers.
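The 99,999 limit follows directly from the fixed-width layout: the atom serial number occupies columns 7–11 of an ATOM/HETATM record, so a six-digit number no longer fits. A minimal sketch of this check, with hypothetical helper names, is shown below. Note that TER records also consume serial numbers, so a file with many chains reaches the limit with slightly fewer atoms.

```python
SERIAL_WIDTH = 5  # the atom serial number occupies columns 7-11 of an ATOM/HETATM record


def max_serial(width: int = SERIAL_WIDTH) -> int:
    """Largest serial number that fits in a right-justified integer field of `width` columns."""
    return 10 ** width - 1


def fits_pdb_format(n_atoms: int) -> bool:
    """True if every atom can get a unique serial number (ignoring serials used by TER records)."""
    return n_atoms <= max_serial()


def serial_field(serial: int) -> str:
    """Format a serial number for columns 7-11, raising if it would overflow the field."""
    text = f"{serial:>{SERIAL_WIDTH}d}"
    if len(text) > SERIAL_WIDTH:
        raise ValueError(f"serial {serial} needs {len(text)} columns, only {SERIAL_WIDTH} available")
    return text
```

For instance, `fits_pdb_format(53642)` (the size of 7GPB in the table above) is `True`, whereas `serial_field(100000)` raises an error because it needs six columns.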

Questions on settings:

Yes! Since our methodology is computationally very efficient, we are able to process even large structures, although a large structure may take a little longer to process. Please be patient.
In almost all cases, a disconnected graph with multiple components has one component that contains almost all atoms of the structure. This means that we were unable to connect some parts of the structure to the main body of the protein. There are two ways to resolve this: you can go back and modify your input PDB file to include more information, or you can choose the specific component that you’d like to use. Usually, this should be the component with the most atoms, which corresponds to the main body of the protein.
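To illustrate what "the component with the most atoms" means, here is a minimal breadth-first-search sketch on a toy atom graph in which nodes are atom IDs and edges are bonds. The helper names are hypothetical and this is not the graph code used by ProteinLens.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def largest_component(nodes, edges):
    """The component with the most atoms, usually the main body of the protein."""
    return max(connected_components(nodes, edges), key=len)
```

For example, with atoms 1 to 6 and bonds (1, 2), (2, 3), (3, 4) and (5, 6), the graph has two components, and `largest_component` returns `{1, 2, 3, 4}`, leaving the isolated fragment {5, 6} out of the analysis.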
Usually, this will be the active site of your chosen protein or some other particular binding site of interest. The analysis will then calculate the coupling between the source residues and all other parts of the protein. Please have a look at the Background page for more information and details on our methodology.
Whilst Bond-to-bond propensity is highly efficient and completes within seconds for small proteins and within a few minutes even for the largest structure in the table above, Markov transients may take considerably longer: in those example timings, between roughly 10 and 45 times longer, with the gap widening as structures get larger. This is mainly due to the simulation of a random walk on the graph, where the number of iterations needed to fully cover the structure grows with the size of the graph.
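As a toy illustration of why random-walk computations need more iterations on bigger graphs, the sketch below evolves the probability distribution of a lazy random walk on a simple chain of n nodes and counts the steps until it has spread out to within a tolerance of its stationary distribution. It is only a sketch of the general effect, not a reproduction of the Markov transients calculation, and the helper name is hypothetical.

```python
def steps_to_mix(n: int, tol: float = 1e-3) -> int:
    """Steps until a lazy random walk on an n-node path graph, started at one end,
    is within `tol` (L1 distance) of its stationary distribution.

    Toy model only: at each step the walker stays put with probability 1/2,
    otherwise it moves to a uniformly chosen neighbour.
    """
    if n < 2:
        return 0
    degree = [1 if i in (0, n - 1) else 2 for i in range(n)]
    total = sum(degree)
    stationary = [d / total for d in degree]  # proportional to node degree
    p = [1.0] + [0.0] * (n - 1)  # all probability mass starts on the first node
    steps = 0
    while sum(abs(a - b) for a, b in zip(p, stationary)) > tol:
        q = [0.5 * x for x in p]  # lazy part: stay with probability 1/2
        for i in range(n):
            share = 0.5 * p[i] / degree[i]  # mass sent to each neighbour
            if i > 0:
                q[i - 1] += share
            if i < n - 1:
                q[i + 1] += share
        p = q
        steps += 1
    return steps
```

Comparing `steps_to_mix(10)` with `steps_to_mix(20)` shows the step count growing roughly with the square of the chain length, so even this toy walk needs disproportionately more iterations as the graph grows.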

Questions on results:

The Results page of the ProteinLens webserver provides a range of data visualisations, geared towards different perspectives of interest. However, if you’d like to run your own analyses, you can download the resulting data at the end of the Results page. For a detailed list of all the files included, have a look at the sessionXXXXXX.README file in the downloaded results folder.
From the results files, you can visualise your results in PyMol by applying a colour gradient over the relevant column in a file. For example, to colour the residues according to their quantile scores from a Bond-to-bond propensity run, colour them in PyMol according to column qs in the file ending in _propensity_residues.csv. In the near future, we will provide a PyMol script that does this automatically, so please check back here for that.
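Until that script is available, one possible approach is to copy the qs values into the B-factor column (columns 61–66) of your PDB file and then colour by B-factor in PyMol, for example with the `spectrum b` command. The sketch below assumes the _propensity_residues.csv file identifies each residue by columns named `chain` and `resnum`; those two column names are hypothetical (only `qs` is mentioned above), so please check the README in your results folder for the actual ones. Insertion codes are ignored for simplicity.

```python
import csv
import io


def qs_by_residue(csv_text: str, chain_col: str = "chain", resnum_col: str = "resnum") -> dict:
    """Map (chain, residue number) -> qs from a *_propensity_residues.csv file.

    The `chain` and `resnum` column names are assumptions, not documented names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row[chain_col].strip(), int(row[resnum_col])): float(row["qs"]) for row in reader}


def write_qs_as_bfactor(pdb_text: str, scores: dict, default: float = 0.0) -> str:
    """Replace the B-factor field (columns 61-66) of ATOM/HETATM records with qs values."""
    out = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            # chain ID is column 22, residue number columns 23-26 (insertion codes ignored)
            key = (line[21].strip(), int(line[22:26]))
            value = scores.get(key, default)
            line = f"{line[:60].ljust(60)}{value:6.2f}{line[66:]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Residues without a score receive the `default` value, so they appear at the low end of the colour gradient once the modified file is loaded in PyMol.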