ProSHADE
0.7.5.1 (JAN 2021)
Protein Shape Detection
ProSHADE is a C++ library and an associated tool providing functionality for working with molecular structures in structural biology. The library implements functions for computing shape-wise structural distances between pairs of molecules, detecting symmetry about the centre of mass (or centre of map) of a single structure, re-sizing maps, and matching density maps and PDB co-ordinate files to one another. The executable implemented in the bin.cpp file then allows easy access to these functionalities without the need for library linking, while the python module provides easy access to the functionality from the python language. For help on how to use the executable, refer to its -h option. For more details about the functionalities, see below.
The most recent stable version of ProSHADE is available from the master branch of the GitHub repository https://github.com/michaltykac/proshade, from where it can be cloned using the git application or downloaded manually using the interface. More advanced users may be interested in obtaining the development or the experimental branches, which are available from the same link. The experimental branch is where I do all new development and it may or may not be currently compilable and working properly, while the development branch should always compile, but is more likely to contain bugs and issues as it is the code before proper testing.
1) Introduction
3) Index
4) Installation
4.1) Standard System Dependencies
4.2) Other dependencies
5.1) Spacegroups
5.2) Waters
5.3) Models
6.1) Symmetry Detection
6.2) Shape similarity distances
6.3) Re-boxing structures
6.4) Optimal rotation and translation
7.1) Linking against the ProSHADE library
7.2) Examples of ProSHADE library usage
The installation of the ProSHADE software should be done using the CMake system and the supplied CMakeLists.txt file. The minimum required version of CMake is 2.6; however, the python modules and single source file compilation will not be available unless CMake version 3.4 or higher is used. The CMakeLists.txt file assumes the standard system dependencies are installed in the system folders; for a full list of standard system dependencies, please see the section Standard System Dependencies.
Once all of the standard system dependencies are installed, CMake can be run to create the make files as described in the section Installing using CMake. Alternatively, ProSHADE also provides a setup.py script, which wraps the CMake installation - please refer to the Installation using pip section of this documentation for more details. The main difference between these two installation approaches is that using CMake allows building the executable and the dynamic C++ library, but will install the python module only locally, while installing using pip will install only the python module, but will install it globally. Combining these two installations is not a problem.
Moreover, if CMake is used to build ProSHADE directly, then the user may make use of several options that modify the default behaviour of the installation; these typically drive the installation locations and the dependency paths in the case of non-standard dependency locations. Please see the section CMake options below for details as to how to use these options and what they do.
Please note that while the ProSHADE code is C++ 98 standard compatible, some of the dependencies do require at least partial support for the C++ 11 standard and building the python module requires full C++ 11 support.
Generally, the following standard system libraries and utilities are required for successful installation of ProSHADE on Unix systems:
CMake should complain and issue reasonably decipherable error messages if any of these dependencies are missing.
Installing python
While most modern Unix systems come with some version of the python language pre-installed, it seems reasonable to assume that users who are interested in using the ProSHADE python module do have their preferred version of python already installed and set as the default system python (meaning that the python command points to the python executable that the user wants the ProSHADE module to be installed for).
Should the user not have any python version installed or should the user be interested in having multiple versions, the Anaconda environment ( https://www.anaconda.com/products/individual ) can be recommended for installation of python and management of various environments.
Installing standard system dependencies on MacOS
Assuming a clean MacOS, the ProSHADE dependencies can be installed as follows: Firstly, the XCode tools should be installed from Apple - this can be achieved by issuing the command:
Next, CMake will need to be installed manually; that is, starting with downloading the source codes from https://github.com/Kitware/CMake/releases/download/v3.19.2/cmake-3.19.2.tar.gz . After moving the downloaded file to where the codes should live and navigating to the same location in Terminal, please use the following commands to install CMake:
Finally, some MacOS systems do not have the FFTW3 library pre-installed. If this is your case, then please use the following commands to install FFTW3 manually: Firstly, download the source codes from here: http://www.fftw.org/fftw-3.3.9.tar.gz . After moving the downloaded file to where the codes should live and navigating to the same location in Terminal, please use the following commands to install FFTW3:
Now, ProSHADE should be automatically installable using the CMake system.
Installing standard system dependencies using apt-get ( e.g. Ubuntu or Debian )
The APT package manager can be used to install all the system dependencies of ProSHADE using the following command.
After this, ProSHADE should be automatically installable using the CMake system.
Installing standard system dependencies using ZYpp ( e.g. openSuSe )
The ZYpp package manager and the associated zypper command-line tool can be used to install all the system dependencies of ProSHADE as follows:
After this, ProSHADE should be automatically installable using the CMake system.
Installing standard system dependencies using yum ( e.g. CentOS )
Firstly, at least on some systems, the yum package manager may not be using the powertools repository; however, some of the ProSHADE dependencies are kept there. Therefore, the user may need to enable the powertools repository by issuing the following commands:
Then, the yum package manager can be used to install all the system dependencies of ProSHADE as follows:
After this, ProSHADE should be automatically installable using the CMake system.
ProSHADE also depends on the Gemmi and SOFT2.0 libraries. The installation of these libraries is automated in the CMake scripts and therefore does not require any user input (these libraries are supplied with the ProSHADE code and will be installed locally by the ProSHADE CMake installation). Please note that these dependencies do have their own licences (the Mozilla Public License for Gemmi and the GPL licence for SOFT2.0) and therefore this may limit the ProSHADE usage for some users beyond the ProSHADE copyright and licence itself.
CMake is the default ProSHADE installation tool and if the binary or the library is needed, then it is the only installation option. The python module can also be built using CMake, but it will be installed only locally and all python scripts will need to add the installation location to their PATH. Alternatively, if the python module is what the user is after, then it can be installed globally using pip - please see the Installation using pip section for more details.
In order to install ProSHADE, first please check that all the Standard System Dependencies are installed, preferably using a package management system such as apt , yum , zypper , homebrew etc.
Next, please navigate to any folder to which you would like to write the install files; some find it useful to create a build folder in the ProSHADE folder in order to keep the install files in the same location as the source codes. Then, issue the following set of commands, changing \path\to\ProSHADE to the correct path on your system and adding any required CMake options (described below) to the first command. Please note that sudo may be required for the make install command if you are installing into the system folders.
CMake options
-DINSTALL_LOCALLY=ON or OFF
-DINSTALL_BIN_DIR=/path
-DINSTALL_LIB_DIR=/path
-DINSTALL_INC_DIR=/path
-DCUSTOM_FFTW3_LIB_PATH=/path
-DCUSTOM_FFTW3_INC_PATH=/path
-DCUSTOM_LAPACK_LIB_PATH=/path
-DBUILD_PYTHON=TRUE or FALSE
Uninstall using CMake
To remove the installed ProSHADE components, the command make remove needs to be issued to the makefile originally created by the CMake call. Please note that sudo may need to be used if the installation was done into the system folders and your current user does not have admin rights.
The ProSHADE python module is also available (as a source distribution) on the PyPi repository. Therefore, the ProSHADE python module can be installed by simply issuing the following command; however, this assumes that all the system dependencies as discussed in the Standard System Dependencies section are already installed. If any of them is missing, then a cryptic error message will ensue - consider yourself warned.
Alternatively, the module can be built and installed using pip directly from the ProSHADE GitHub repository ( https://github.com/michaltykac/proshade ) using the following command. This approach has the advantage that it takes the most current stable version, rather than being dependent on the authors not forgetting to update the PyPi repository.
Again, please note that the pip installation only wraps around the CMake installation and that CMake is still being run by pip in the background. Therefore, the same system dependencies are required. Moreover, if any of the system dependencies is missing or cannot be found, then a bit more cryptic error message will be printed by pip.
Uninstalling pip installed module
Should the user need to uninstall the python module and all associated data after they were installed globally using pip, the following standard pip command can achieve this irrespective of how exactly the proshade module was installed:
There are several caveats to inputting PDB files; most of these have to do with the fact that PDB files encode much more information than ProSHADE is intended to use. Therefore, ProSHADE is by default set to disregard information it does not need; however, if the user so requires, the information may be used, albeit it may pose some unexpected problems.
By default, ProSHADE will ignore the spacegroup encoded in the PDB file and will instead force the P1 spacegroup onto the input files. The reason for this behaviour is that when computing the theoretical density map, some spacegroups will cause density from other cells to be added as well (e.g. P21 21 21). Since ProSHADE is intended to use the structure shape irrespective of the experimental method (i.e. irrespective of crystal packing), having density from other cells would cause ProSHADE to perceive differences where the structures could be identical except for the spacegroup. To force ProSHADE to make use of the spacegroup, please supply the -u command line option.
By default, ProSHADE will remove all water molecules from any input PDB files. The reason is similar to the one above: ProSHADE is intended to compare protein shapes and waters are in most cases not an integral part of the protein, so this behaviour avoids situations where two identical structures, one with hundreds of waters and one without, would be perceived by ProSHADE as significantly different. Should the user require the water molecules to be used by ProSHADE, please supply the -w command line option.
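To illustrate what removing waters from a PDB file amounts to, the following is a minimal sketch and not the ProSHADE implementation. It assumes that waters are the ATOM/HETATM records whose residue name (columns 18-20 of the PDB format) is HOH; the strip_waters helper is hypothetical.

```python
def strip_waters(pdb_lines):
    """Return the PDB lines with water atom records removed.

    Assumption: waters are ATOM/HETATM records whose residue name
    (columns 18-20 of the fixed-width PDB format) is HOH.
    """
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM  ", "HETATM"))
        if is_atom_record and line[17:20] == "HOH":
            continue  # skip water atoms
        kept.append(line)
    return kept


example = [
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C",
    "HETATM    2  O   HOH A 101       1.000   2.000   3.000  1.00  0.00           O",
    "END",
]
without_waters = strip_waters(example)
print(len(without_waters))  # the ATOM and END lines remain
```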
There are examples of both PDB files containing multiple models of the same structure (with minor differences, e.g. trajectory files) and PDB files which have their chains (or collections of chains) separated into different models. Given this state of affairs, ProSHADE will by default use only the first model of each input PDB file and will print a warning message (which can be suppressed by setting verbosity below 0) for each file it reads which has more than one model. Should the user want to use all available models for the input PDB files, please supply ProSHADE with the -x command line option.
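The default "first model only" behaviour can be pictured with the short sketch below. It is an illustration under the assumption that models are delimited by the standard MODEL and ENDMDL records of the PDB format; the first_model_only helper is hypothetical and is not part of ProSHADE.

```python
def first_model_only(pdb_lines):
    """Keep records outside MODEL blocks plus the contents of the first MODEL.

    Returns the kept lines and the number of models found in the input.
    """
    kept = []
    models_seen = 0
    in_model = False
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            models_seen += 1
            in_model = True
        if not in_model or models_seen == 1:
            kept.append(line)
        if record == "ENDMDL":
            in_model = False
    return kept, models_seen


lines = ["HEADER", "MODEL        1", "ATOM  a", "ENDMDL",
         "MODEL        2", "ATOM  b", "ENDMDL", "END"]
kept, count = first_model_only(lines)
if count > 1:
    print(f"Warning: {count} models found, using the first one only.")
```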
The ProSHADE tool was developed in a modular fashion and the usage changes slightly depending on the functionality that is required. Nonetheless, care has been taken to make sure that identical or closely related features are controlled by the same command line arguments in all cases. Moreover, the GNU command-line options standard has been adhered to (through the getOpts library) and therefore users familiar with other command line tools should find entering command-line arguments simple. The following subsections give examples of using the different functionalities; for a full list of command line options, please use the --help command line option of the ProSHADE binary.
In order to detect symmetry in either a co-ordinate input file or in a map input file, the ProSHADE executable needs to be supplied with the option -S or --symmetry and it will also require a single input file to be supplied using the -f option. These two options are the only mandatory options, although there are many optional values that the user can supply to supersede the default values and therefore modify the operation of the ProSHADE executable to fit their purpose.
One particular option regarding the symmetry detection mode should be noted; the --sym (or -u) option allows the user to state which symmetry they believe to exist in the structure. The allowed values for this command line argument are "Cx", "Dx", "T", "O" and "I", where the x should be an integer number specifying the fold of the requested symmetry. When this option is used, it removes the default behaviour of returning the "best" detected symmetry and instead the symmetry requested by the user is returned, if it can be found in the structure.
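The format of the requested symmetry value can be made concrete with a small validation sketch. This is only an illustration of the "Cx", "Dx", "T", "O" and "I" notation; the parse_requested_symmetry helper is hypothetical, and whether ProSHADE itself accepts lower-case letters or a fold of 1 is not something this sketch claims to know (it rejects lower-case and accepts any positive fold).

```python
import re


def parse_requested_symmetry(value):
    """Split a requested symmetry such as "C12" into (type, fold).

    Cyclic (C) and dihedral (D) requests carry an integer fold;
    tetrahedral (T), octahedral (O) and icosahedral (I) requests do not.
    """
    match = re.fullmatch(r"([CD])([1-9][0-9]*)|([TOI])", value)
    if match is None:
        raise ValueError(f"Unsupported symmetry request: {value!r}")
    if match.group(3):
        return match.group(3), None
    return match.group(1), int(match.group(2))


print(parse_requested_symmetry("C12"))
print(parse_requested_symmetry("T"))
```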
Another noteworthy option is the --center or -c option, which tells ProSHADE NOT to center the internal map representation over the centre of density before running any processing of the map (default is centering and adding this option will turn centering off). This may be important as ProSHADE detects symmetries over the centre of the co-ordinates and therefore a non-centered map (a map which does not have the centre of mass at the centre of the box) will be found to have no symmetries even if these are present, just not over the co-ordinate/box centre.
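To make the notion of a non-centered map concrete, the sketch below computes the density-weighted centre of a tiny map and the shift that would bring it to the box centre. It is an illustration only: the nested-list map indexed as grid[x][y][z], the centre_of_density helper and the choice of box centre are all assumptions, and this is not how ProSHADE performs centering internally.

```python
def centre_of_density(grid):
    """Density-weighted mean index of a nested-list map indexed as grid[x][y][z]."""
    total = 0.0
    sums = [0.0, 0.0, 0.0]
    for x, plane in enumerate(grid):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                total += value
                sums[0] += x * value
                sums[1] += y * value
                sums[2] += z * value
    if total == 0.0:
        raise ValueError("Map has no net density, centre is undefined.")
    return tuple(s / total for s in sums)


# 3x3x3 map with a single blob of density in one corner of the box
grid = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
grid[0][0][0] = 1.0

box_centre = (1.0, 1.0, 1.0)          # middle index of a 3x3x3 box
com = centre_of_density(grid)
shift = tuple(c - m for c, m in zip(box_centre, com))
print(com, shift)
```

A non-zero shift indicates that a symmetry axis passing through the centre of density would not pass through the box centre.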
It is also worth noting that there are several extra functionalities available for the symmetry detection mode when accessed programmatically (i.e. either through the dynamic C++ library or through the Python language module). These extra functionalities include direct access to a vector/list of all detected cyclic symmetries, a list/vector of all other symmetry type detections (meaning a list of all detected dihedral, tetrahedral, ... symmetries and the axes forming them) and also the ability to compute all point group elements for any point group formed by a combination of ProSHADE detected cyclic point groups. For more details on these functionalities, the users are invited to consult the advancedAccess_symmetry.cpp/py example files in the examples folder.
To demonstrate how the tool can be run and the standard output for the symmetry mode of operation, the current version of the ProSHADE executable was used to detect the symmetry of a density map of the bacteriophage T4 portal protein with the PDB accession code 3JA7 (EMDB accession code 6324), which has the C12 symmetry. The visualisation of the structure is shown in the following figure, while the output of the ProSHADE tool follows:
The distances computation mode is signalled to the ProSHADE executable by the command line argument -D or --distances. This mode requires two or more structures to be supplied using the -f command line option; at least two structures are mandatory for the ProSHADE tool to proceed. Moreover, the resolution at which the comparison should be done needs to be supplied using the -r option. This resolution does not need to be the real resolution to which the structure(s) were solved, but rather reflects the amount of detail which should be taken into account when comparing shapes. Therefore, a higher resolution comparison will focus more on the details of the shapes, while a lower resolution comparison will focus more on the overall shape, ignoring the minor details. Please note that the results are calculated only for the first structure against all the remaining structures; a full all-against-all distance matrix is not computed.
There are a number of useful options for the shape distances computation; please consult the --help dialogue for their complete listing.
To demonstrate the output of the ProSHADE software tool for computing distances between structure shapes, the distances were computed between the BALBES protein domains 1BFO_A_dom_1 and 1H8N_A_dom_1 (which have similar shapes) and the 3IGU_A_dom_1 domain, which has a different shape. As can be seen from the following figure, the first two domains are both in cluster a), while the last domain is from cluster b). The output of the ProSHADE software tool is then shown below:
Another useful feature of the ProSHADE tool is re-boxing of macromolecular density maps. This mode is signalled to the ProSHADE tool by the command line option -M or --mapManip followed by the -R option to specify that the required map manipulations include re-boxing. Furthermore, a single map structure file needs to be supplied after the -f flag. In this mode, ProSHADE will attempt to find a suitable map mask by blurring the map (increasing the overall B-factors). Subsequently, it will use the mask boundaries to create a new, hopefully smaller, box to which the appropriate part of the map will be copied.
This ProSHADE functionality can be combined with other map manipulations, which include map inversion (signalled by the --invertMap option and useful for cases where map reconstruction software mistakes the hand of the structure), map normalisation (signalled by the --normalise option, which makes sure the map mean is 0 and the standard deviation is 1), centering of the centre of mass to the centre of co-ordinates (using the --center or -c option) and phase removal (creating Patterson maps using the --noPhase or -p options).
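The normalisation described above (mean 0, standard deviation 1) can be sketched in a few lines. This is a minimal illustration on a flat list of density values, not the ProSHADE code; in particular, the use of the population standard deviation is an assumption, as is the hypothetical normalise helper.

```python
from statistics import fmean, pstdev


def normalise(values):
    """Shift and scale density values so their mean is 0 and std. dev. is 1.

    Assumption: the population standard deviation is used.
    """
    mean = fmean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        raise ValueError("A constant map cannot be normalised.")
    return [(v - mean) / sigma for v in values]


density = [1.0, 2.0, 3.0, 4.0]
normalised = normalise(density)
print(fmean(normalised), pstdev(normalised))
```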
The location and filename of where this new map should be saved can be specified using the --reBoxedFilename (or -g) command line option followed by the filename.
The following snippet shows the output of the ProSHADE tool when used to re-box the TubZ-Bt four-stranded filament structure (EMDB accession code 5762 and PDB accession code 3J4S), where the box volume can be decreased to 46.9% of the original volume and thus any processing which scales linearly with the volume will be more than twice as fast as for the original. The original TubZ-Bt four-stranded filament structure box is shown in the following figure as semi-transparent grey, while the new box is shown in non-transparent yellow.
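The "more than twice" figure follows directly from the quoted volume fraction, assuming the processing cost is proportional to the number of voxels (the variable names below are illustrative):

```python
# Fraction of the original box volume that remains after re-boxing
reboxed_fraction = 0.469

# For an algorithm whose cost is linear in the volume, the expected
# speed-up is the inverse of the remaining fraction.
speed_up = 1.0 / reboxed_fraction
print(f"{speed_up:.2f}x")
```

For algorithms that scale worse than linearly with the map size, the gain would be correspondingly larger.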
In order to find the rotation and translation which optimally overlays (or fits) one structure into another, be they co-ordinate files or maps (or any combination thereof), the ProSHADE tool can be used in the Overlay mode. This is signalled to the ProSHADE tool binary by the command line option --strOverlay or -O, and this mode requires exactly two structure files to be supplied using the -f command line option. The order of the two files does matter, as the second file will always be moved to match the first structure, which will remain static.
Since the second structure needs to be moved and rotated, it is worth noting that the structure may need to be re-sampled and/or moved to the same viewing position as the first structure. This is done so that only the internal representation is modified, but never the input file. However, when the overlay structure is written out (a non-default name can be specified by the --overlayFile command-line option), the header of this output file may differ from the second structure header. Furthermore, if there is no extra space around the structure, movement and rotation may move pieces of the structure through the box boundaries to the other side of the box. To avoid this, please use the --extraSpace option to add some extra space around the structure.
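The wrap-around effect and the remedy of adding extra space can be illustrated in one dimension. The sketch below assumes a periodic box, so that density shifted past one edge re-appears at the opposite edge; the shift_periodic and pad helpers are hypothetical and only mimic the effect, they are not ProSHADE functions.

```python
def shift_periodic(values, offset):
    """Shift a 1D density row by `offset` cells with periodic boundaries."""
    n = len(values)
    return [values[(i - offset) % n] for i in range(n)]


def pad(values, extra):
    """Add `extra` empty cells on both sides of the row."""
    return [0.0] * extra + list(values) + [0.0] * extra


row = [0.0, 0.0, 1.0, 2.0]                 # density touching the right edge
wrapped = shift_periodic(row, 2)           # density re-appears on the left
safe = shift_periodic(pad(row, 2), 2)      # padded row keeps it in place

print(wrapped)
print(safe)
```

In the padded case the density ends up at the right-hand edge of the larger box instead of wrapping around to the left-hand side.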
As an example of the Overlay mode, we will be matching a single PDB structure (1BFO_A_dom_1 from the BALBES database, original structure code 1BFO) shown in part a) of the following figure to another PDB structure, this time the 1H8N_A_dom_1 structure from the BALBES database, shown in part b) of the same figure. Please note that ProSHADE can fit any allowed input (map or co-ordinates) to any allowed input; it is just this example which uses two PDB files. Part c) of the figure then shows the match obtained by the internal map representation of the moving structure (1H8N_A_dom_1) after rotation and translation with the static structure (1BFO_A_dom_1). Finally, part d) shows the original static structure (1BFO_A_dom_1) in brown and the rotated and translated version of the moving structure (1H8N_A_dom_1) in blue. Please note that the optimal rotation matrix and translation vector are written into the output when verbosity (--verbose) is increased to at least 3, but are better accessed programmatically (see the following sections) if you are interested in using these further.
Regarding the output, ProSHADE outputs the following information which should be sufficient to apply the correct rotation and translation operations to obtain the optimal overlay:
Warning: In order to allow visualisation of the results of the ProSHADE overlay task, the translation is computed in the real world (visualisation) co-ordinate space; this, however, has the implication that if the moving structure is a density map, then the density box may need to be moved in such a way that it contains the static structure's position. Therefore, the translation computed as described above should not be applied to the density within the box, but rather to the box itself (or, if such a box translation cannot be done perfectly, the remainder of the imperfect box translation can be done within the box).
ProSHADE allows more programmatic access to its functionality through a C++ dynamic library, which is compiled at the same time as the binary is made. This library can be linked to any C++ project to allow direct access to the ProSHADE objects, functions and results. This section discusses how the ProSHADE library can be linked against and how the basic objects can be accessed.
The ProSHADE library can be linked as any other C++ library, that is by using the -lproshade option when calling the compiler (tested on clang and g++ ) and including the header file (ProSHADE.hpp). However, as the ProSHADE.hpp header file includes header files from the dependencies, any C++ project compiling against the ProSHADE library will need to provide the paths to these dependencies to the compiler. Moreover, if the ProSHADE library was not installed in the system folders (which are by default in the compiler paths), any project linking against the ProSHADE library will also need to provide the path to the libproshade.a/so/dylib library file and the RPATH to the same location. The following list states all the paths that may be required for a successful compilation against the ProSHADE library:
Overall, a compilation of a C++ project linking against the ProSHADE library may look like the following code:
or
There are several examples of C++ code which makes use of the ProSHADE dynamic library to compute the standard ProSHADE functionalities and access the results programmatically (i.e. without the need for parsing any log files).
Simple access
The examples are available in the /path/to/proshade/examples/libproshade folder and are divided into two categories of four examples. The source files with names starting with simpleAccess_... provide a black box experience similar to using the ProSHADE binary. The user firstly creates a ProSHADE_settings object, which provides all the variables that can be set in order to drive which ProSHADE functionality is required and how it should be done. Next, the user needs to create the ProSHADE_run object, whose constructor takes the already created and filled ProSHADE_settings object as its only argument. This constructor will then proceed to compute all required information according to the settings object and return when complete. While the computation is being done, the execution is with the ProSHADE library and any C++ project using this mode will be waiting for the ProSHADE library to finish. Once the computation is complete, the execution will be returned to the calling C++ project and the results will be accessible through public functions of the ProSHADE_run object. The following code shows a very simple example of how ProSHADE can be run in this mode, but for more specific examples the users should review the simpleAccess_... example files.
Advanced access
The second set of examples of usage of the ProSHADE library are the source files with names starting with advancedAccess_... . These files provide examples of how individual ProSHADE functions can be arranged to provide the results of the main ProSHADE functionalities. Using the ProSHADE tool in the manner shown in these example codes gives the user more control over the execution and it also allows the user to modify the behaviour directly. On the other hand, using ProSHADE in this way requires a bit more understanding than the simple black box approach and this documentation should be helpful for all who wish to use ProSHADE this way. Interested users are advised to review all the advancedAccess_... source files as well as the following basic example code.
ProSHADE also provides a python module which allows the same programmatic access to the ProSHADE tool as the dynamic C++ library. This module is produced using the PyBind11 tool ( https://github.com/pybind/pybind11 ) and supports the numpy array data types as both input and output of the C++ function calls.
Similarly to the ProSHADE dynamic library, the python code examples are available in the /path/to/proshade/examples/python folder. They are, again similarly to the dynamic C++ library examples, divided into different categories.
Simple access
Similarly to the dynamic library case, there are three types of examples available for the python modules. The first set of examples (files named simpleAccess_... ) show the black box experience, which is similar to using ProSHADE binary. The user needs to create the ProSHADE_settings object and can then supply it with all the settings values which will then drive the ProSHADE computations. The same settings are available in the python modules as in the ProSHADE library; the example code below shows only a small selection of these (for full selection, please see the example files). Next, the user creates the ProSHADE_run object, constructor of which takes the settings object as its only argument and then proceeds to do all computations required by the settings in the settings object. The computations are done on this one line and if they take time, the execution of the script will be halted until ProSHADE is done computing. Once completed, the results can be retrieved from the ProSHADE_run object using the public accessor functions; the example code below shows how the symmetry functionality can be run and results retrieved - for examples of other functionalities and for more details, please see the simpleAccess_... example files.
Advanced access
If the user needs more control over the ProSHADE execution, or simply wants any behaviour not available through the variables in the settings object, then there are the advancedAccess_... examples. These showcase the ability to call internal ProSHADE functions and, by ordering them correctly, to achieve the requested functionality. This usage of the python modules does require a better understanding of the ProSHADE tool and the functions it implements. This documentation is a good starting point as to which functions are available and the ProSHADE_tasks.cpp source file shows how the internal functions can be arranged to achieve the standard ProSHADE tasks. Please be aware that most of the functions do require that a pre-requisite function is run before them, but not all of these pre-requisites have their own warning or error messages. Therefore, if any code causes a segmentation fault (which usually kills the python interpreter), it is likely that you forgot to call some pre-requisite function.
The following code is an example of how this approach to the ProSHADE python module can be used to compute the shape-wise distances between two structures. After importing the required modules, the code creates the settings object and sets the basic settings (for a full list of settings, please see the example files). It then proceeds to create the ProSHADE_data objects for each structure and reads in the structures from files on the hard-drive (PDB and MAP formats are supported, mmCIF should work as well). Next, the code processes the data - this is where map centering, masking, normalisation, axis inversion, etc. happens - onto an internal ProSHADE data representation. This representation can then be mapped onto a set of concentric spheres, which can in turn have their spherical harmonics decomposition computed. Once this is done, the shape distances can be computed using the three functions shown.
One of the advantages of this mode of operating the ProSHADE python modules is that the execution only takes the time required to compute the specific computation the function provides and therefore if the user only needs some preliminary results, or can prepare the data for execution later, this is all allowed by this mode.
Direct access
This is the most advanced mode of using the ProSHADE tool, as it allows direct access into the internal ProSHADE working. This in turn allows supplying non-standard values as well as retrieving any partial results for alternative processing by a different code; however, it also requires the deepest understanding of how ProSHADE works, what data are available at which point in the execution and it may require some data format manipulations on the side of the executing code. The following tutorial as well as this documentation should be a good starting point, as well as the directAccess.py file.
In order to showcase this approach, we will import the required modules:
Reading a structure from file
The first step of most ProSHADE workflows will be reading a structure (be it co-ordinates or map) from a file on the hard-drive. This can be done in the same manner as shown in the advanced access section of this tutorial, that is: Firstly we create the ProSHADE_settings object, which needs to be filled with the initial data. It does not really matter which task you select at this stage, but it may affect some of the default values and therefore it is recommended to use the correct task. Next, the ProSHADE_data object is created and finally the structure is read in. Please note that on some systems using relative paths may not work and may result in a ProSHADE error stating that the file type cannot be recognised. If this is the case, please use the full path.
Creating ProSHADE_data object from map
Alteratively, ProSHADE_data object can be created from an already existing map and some of the basic map information data. As an example, we will create a 1D numpy.array, which will hold the density values of a map that we would like to supply to ProSHADE. Of course this array can be the result of any other python module, the only requirement is that the data type is 1D numpy.ndarray with dtype of float64 with the XYZ axis order.
With an example map created as a 1D numpy.ndarray, it can now be supplied to a ProSHADE_data object, which will then be in the same state as if the data were read in from a file. This can be done with the following call:
There are several assumptions that the ProSHADE_data constructor shown above makes and not all of these are currently checked with a warning or error message. Some of these are described in the directAccess.py file, but the most common things to consider are the following:
If some of these assumptions do not hold, the ProSHADE_data object is likely to still be created, but it is the user's responsibility to change the internal variables of the pStruct (ProSHADE_data) object to reflect reality, or face the consequences.
3D arrays
It is possible that, instead of 1D arrays as shown above, some other python module works with maps as 3D arrays. This poses no problem for ProSHADE, as the very same constructor accepts 3D numpy.ndarrays instead of 1D numpy.ndarrays and all functionality remains the same, as do the caveats. An example of using a 3D instead of a 1D map follows:
Writing out maps
Here, we will demonstrate how the user can access the ProSHADE internal representation map from the ProSHADE_data object. Please note that this is not limited to the initial map; it will work for any ProSHADE_data object which has map data at any stage of the ProSHADE computations. The map will be returned as a 3D numpy.ndarray of dtype float64 with the XYZ axis ordering.
Getting back the ProSHADE internal representation map
Here, we will demonstrate how the user can access the ProSHADE internal representation map from the ProSHADE_data object. Please note that this is not limited to the initial map; it will work for any ProSHADE_data object which has map data at any stage of the ProSHADE computations. The user has a choice between a 1D and a 3D numpy array being returned by ProSHADE; the indexing of the 1D map is the same as above, that is [ z + pStruct.zDimIndices * ( y + pStruct.yDimIndices * x ) ]. Please note that retrieving a 3D map may be slower (approximately 0.5 seconds per average sized map) than retrieving a 1D map. The following code shows how the maps can be retrieved back to python:
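The ProSHADE retrieval calls themselves are not reproduced here. As a numpy-only illustration of the indexing formula quoted above, the following sketch converts between the 1D position and the (x, y, z) indices; the array sizes and the helper name flat_index are hypothetical.

```python
import numpy as np

x_dim, y_dim, z_dim = 4, 5, 6  # illustrative sizes only

# A stand-in for a 3D map in XYZ axis order.
map_3d = np.arange(x_dim * y_dim * z_dim, dtype=np.float64).reshape(
    x_dim, y_dim, z_dim
)

# C-order flattening makes Z the fastest-changing index.
map_1d = map_3d.ravel()


def flat_index(x, y, z):
    """Position of voxel (x, y, z) in the 1D map."""
    return z + z_dim * (y + y_dim * x)


# Forward direction: 3D indices to 1D position.
value_1d = map_1d[flat_index(2, 3, 4)]
value_3d = map_3d[2, 3, 4]

# Backward direction: 1D position to 3D indices.
xyz = np.unravel_index(flat_index(2, 3, 4), (x_dim, y_dim, z_dim))

# Reshaping the 1D map gives a 3D view onto the same data.
map_back = map_1d.reshape(x_dim, y_dim, z_dim)
```

Because the reshape only creates a view, switching between the two layouts on the python side costs essentially nothing once the data are in a contiguous array.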
Initial map processing
Once the map is read into the ProSHADE_data object, it needs to be processed to make sure ProSHADE will be able to use it in any further computations. During processing, ProSHADE offers the following map modifications through the ProSHADE_settings object variables: map inversion (this will invert the map indices along each dimension), map normalisation (making the map density values have mean 0 and standard deviation 1), map masking (computing a map mask by blurring and then setting the mask as all values above a threshold), map centering (moving the map into the centre of mass), adding extra space (in case the map density is close to the map edge, which can lead to map artefacts and lower accuracy of further processing) and map phase removal (removing the phase of the map density, effectively producing Patterson maps). The user can choose any, all or none of these, but the processing function needs to be called before any further processing is possible. The following example code showcases how some of the processing functionalities can be chosen and how the map can be processed.
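To make two of these modifications concrete without relying on the ProSHADE API, the sketch below shows what normalisation to mean 0 and standard deviation 1 amounts to, and how a centre of mass can be computed in index units. It is a plain numpy illustration on a random toy map, not the ProSHADE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
density = rng.random((6, 6, 6))  # toy non-negative map

# Normalisation: shift to mean 0 and scale to standard deviation 1.
normalised = (density - density.mean()) / density.std()

# Centre of mass in index units, using the densities as weights.
grid = np.indices(density.shape)  # shape (3, 6, 6, 6): x, y and z indices
centre_of_mass = (grid * density).sum(axis=(1, 2, 3)) / density.sum()
```

The centre of mass is computed on the original, non-negative densities here; after normalisation the values sum to zero and can no longer serve as weights.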
Computing standard ProSHADE tasks
If the user now wants to use ProSHADE to compute some of the standard ProSHADE tasks, i.e. Distances computation, Symmetry detection, Re-boxing or Map overlay, it is recommended that the user proceeds in the same fashion as shown in the advancedAccess_... example files. Moreover, these are also demonstrated in the directAccess.py file available in the examples folder. Therefore, none of these tasks will be shown here in a step-wise manner; instead, the rest of this tutorial will focus on how partial information and results can be obtained from ProSHADE.
Computing the spherical harmonics decomposition
ProSHADE can compute the spherical harmonics decomposition of the internal map. However, instead of using the spherical-Bessel functions, it firstly creates a set of concentric spheres centered on the centre of indices (xDimIndices/2, yDimIndices/2, zDimIndices/2) point and spaced 2 indices apart, then it maps the density map data onto these spheres and finally it computes the spherical harmonics decomposition on each of these spheres independently. There are quite a few settings that relate to the spherical harmonics decomposition computation, such as the bandwidth of the computation, the sphere placement and spacing, the resolution on the spheres, etc.; these are mostly inter-related and ProSHADE will set them up automatically, unless the user specifies otherwise. Since these are quite technical, the interested users are referred to the second chapter of my Ph.D. thesis, which specifies all the technical details: https://www.repository.cam.ac.uk/handle/1810/284410 . To issue this computation, please use the functions shown in the following example code:
If the user is interested in the spherical harmonics values (and possibly does not need any further computations from ProSHADE), these can be accessed using the function showcased below. The spherical harmonics are organised as follows: the output of the getSphericalHarmonics() function is a 2D numpy.ndarray of dtype complex128. The first dimension of this array is the sphere index, while the second dimension contains all the spherical harmonics values for the given sphere packed into a 1D array. This saves memory, as this packaging does not result in a sparse matrix (there is a different number of orders for each band), but on the other hand the user needs to use another function ( findSHIndex() ) to obtain any particular value from this 1D array. The following code shows how all spherical harmonics values can be obtained from ProSHADE and how any particular value can be identified in the ProSHADE output.
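The ProSHADE calls are not reproduced here; the following numpy sketch only illustrates the sphere geometry described above. The spacing of 2 indices and the centre of indices come from the text, whereas the radius of the first sphere, the largest radius and the equiangular sampling on each sphere are assumptions made purely for illustration.

```python
import numpy as np

x_dim, y_dim, z_dim = 40, 48, 56  # illustrative map size in indices
spacing = 2.0                     # sphere spacing quoted in the text

# Centre of indices, about which the spheres are concentric.
centre = np.array([x_dim / 2, y_dim / 2, z_dim / 2])

# Assumption: the first sphere sits one spacing from the centre and the
# largest sphere still fits inside the shortest box dimension.
max_radius = min(x_dim, y_dim, z_dim) / 2
radii = np.arange(spacing, max_radius + 1e-9, spacing)


def sphere_points(radius, n_theta=8, n_phi=16):
    """Positions (in index units) of an equiangular grid on one sphere."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # colatitude
    phi = np.arange(n_phi) * 2.0 * np.pi / n_phi          # longitude
    t, p = np.meshgrid(theta, phi, indexing="ij")
    unit = np.stack(
        [np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1
    )
    return centre + radius * unit


points = sphere_points(radii[0])
```

Mapping the density onto a sphere would then interpolate the map at each of these positions; the interpolation scheme is not specified in this section and is therefore left out.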
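To show why a packed 1D layout avoids padding, the sketch below uses one common band-major packing, in which band l contributes 2l + 1 consecutive values. This is an assumption chosen for illustration only; the layout actually used by ProSHADE should always be obtained through findSHIndex(), and the helper names here are hypothetical.

```python
def packed_index(band, order):
    """Position of coefficient (band, order) in a band-major packed array."""
    if abs(order) > band:
        raise ValueError("order must lie in [-band, +band]")
    # Bands 0 .. band - 1 occupy band * band slots in total.
    return band * band + (order + band)


def packed_length(n_bands):
    """Number of coefficients for bands 0 .. n_bands - 1."""
    return n_bands * n_bands


# Every (band, order) pair for 4 bands maps onto a distinct slot.
all_indices = [
    packed_index(band, order)
    for band in range(4)
    for order in range(-band, band + 1)
]
```

For 4 bands the packed array holds 16 values, whereas a dense band by order table would need 4 * 7 = 28 slots, 12 of which would be padding.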
Computing the self-rotation function
ProSHADE also allows computing the (self) rotation function. More specifically, it firstly computes the so-called E matrices, which are matrices of the integral over all the concentric spheres of the spherical harmonics coefficients of order1 and order2, or in mathematical (LaTeX) form: Integral _0 ^rMAX ( c^lm * c'^lm ). It then proceeds to normalise these E matrices, resulting in the SO(3) decomposition (Wigner D based decomposition) coefficients. Finally, by computing the inverse SO(3) Fourier transform (SOFT) on these coefficients, ProSHADE obtains the (self) rotation function. In order to issue this computation, the following code can be used:
Once the self-rotation function is computed, ProSHADE allows the user to access all of its interim results as well as the rotation function map. Specifically, the E matrices, which are ordered by the band, order1 and order2 (in this order), can be obtained as shown in the following code. The E matrices are 3D numpy.ndarrays of dtype complex128, which are affected by the fact that different bands of spherical harmonics have different numbers of orders. Therefore, the order dimensions of the arrays are zero padded; furthermore, as the order indexing goes from -band to +band, but the array indexing starts from zero, a correction to the array indices is necessary. The SO(3) coefficients have the same technical structure as the E matrices and are subject to the same caveats. Finally, the self-rotation function map can be accessed as a 3D numpy.ndarray of dtype complex128 using the function shown in the example code below. For the user's convenience, ProSHADE also provides a function for converting any rotation function position (as given by three indices) into the corresponding rotation matrix - this is shown in the last part of the following example code.
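The index correction mentioned above can be illustrated on a toy array with the same band, order1, order2 layout. The shift used here (array index = order + band, with the zero padding at the end of each order axis) is an assumption; the placement of the padding in the arrays returned by ProSHADE should be checked against the directAccess.py file.

```python
import numpy as np

max_band = 3
n_orders = 2 * max_band + 1  # orders available for the highest band

# Toy stand-in for an E matrix array: [band, order1 index, order2 index].
e_like = np.zeros((max_band + 1, n_orders, n_orders), dtype=np.complex128)


def order_to_index(order, band):
    """Assumed correction: order -band is stored at array index 0."""
    if abs(order) > band:
        raise ValueError("order must lie in [-band, +band]")
    return order + band


# Fill the valid part of each band with recognisable values.
for band in range(max_band + 1):
    for m1 in range(-band, band + 1):
        for m2 in range(-band, band + 1):
            idx1 = order_to_index(m1, band)
            idx2 = order_to_index(m2, band)
            e_like[band, idx1, idx2] = complex(m1, m2)

# Value for band 2, order1 = -1, order2 = 2.
value = e_like[2, order_to_index(-1, 2), order_to_index(2, 2)]
```

Entries beyond index 2 * band along either order axis are the zero padding and carry no information.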
Computing the optimal rotation function
A related ProSHADE functionality is the computation of an optimal rotation function for two input structures. In the standard ProSHADE tasks, this is done for two phase-less structure maps (the phase is removed to achieve identical centering of the maps) in order to find the optimal rotation which overlays the two maps, but the user is free to call this function for any two ProSHADE_data objects which both have their spherical harmonics values computed. To do this, we will create two new ProSHADE_data objects, read in some structures, process them, map them onto spheres, compute their spherical harmonics values and then call the getOverlayRotationFunction() function. This function works similarly to the computeRotationFunction() used above, but it uses spherical harmonics coefficients from two different structures as opposed to a single structure.
The very same functions with the same return values can now be used to obtain the E matrices, the SO(3) coefficients and the rotation function as were used above for the self-rotation function. The following example code recapitulates these functions for the rotation function computed just now.
Finding the optimal rotation
Once the rotation map is computed, the user may be interested in the highest value in the map and the corresponding rotation matrix (or Euler angles), as these will represent the rotation which overlays most of the two structures (within the error of the map sampling). To facilitate this task, ProSHADE contains functions returning the Euler angles or the rotation matrix associated with the highest peak in the rotation function. The following example code shows how they can be used:
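The ProSHADE functions are not reproduced here. As a numpy-only illustration of the two ideas involved, the sketch below locates the largest-magnitude value of a toy 3D map and builds a rotation matrix from three Euler angles. The ZYZ convention and the multiplication order are assumptions, and the conversion from peak indices to angles is deliberately left to ProSHADE.

```python
import numpy as np


def highest_peak(rotation_map):
    """Indices of the largest-magnitude value of a 3D (possibly complex) map."""
    flat_position = np.argmax(np.abs(rotation_map))
    return np.unravel_index(flat_position, rotation_map.shape)


def rot_z(angle):
    """Rotation by angle (radians) about the Z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(angle):
    """Rotation by angle (radians) about the Y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def euler_zyz_to_matrix(alpha, beta, gamma):
    """Assumed convention: R = Rz(alpha) @ Ry(beta) @ Rz(gamma)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)


# Toy rotation map with a single dominant value.
toy_map = np.zeros((4, 4, 4), dtype=np.complex128)
toy_map[1, 2, 3] = 5.0 + 1.0j
peak = highest_peak(toy_map)

rotation = euler_zyz_to_matrix(0.3, 1.1, -0.7)
```

Whatever the convention, the result should be a proper rotation: orthogonal with determinant +1, which is a cheap sanity check on any matrix obtained this way.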
Rotating the internal map representation
Once the optimal rotation angles are obtained, the next logical step is to rotate the structure by these angles to get the two structures into an identical orientation. This can also be done with the ProSHADE function rotateMap(), which works with the Euler angles as reported by ProSHADE. The rotation is done using the spherical harmonics coefficients, which are multiplied by the Wigner D matrices for the required rotation; the resulting rotated coefficients are then inverted back and interpolated to a new map. This process has two side effects. Firstly, the resulting maps tend to suffer from minor artefacts resulting from the series termination errors and the interpolation between the spheres and the cartesian co-ordinates. Secondly, the input maps need to have their spherical harmonics coefficients computed. Therefore, this approach is not recommended for any maps that are to be deposited or fitted into, but the rotated maps are sufficient for the computation of most ProSHADE standard tasks, as the shape is still almost identical.
In terms of this tutorial, since we have already computed the optimal rotation between two structures, we will continue by showing how this result can be used to rotate a new structure. This will allow us to demonstrate the next functionality of ProSHADE in the later sections of this tutorial in a more streamlined fashion. To rotate the map of a ProSHADE_data object, the function in the example code can be used.
Computing the translation function
Similarly to the rotation function, the user may be interested in the optimal translation required to overlay two structures. ProSHADE can compute such an optimal translation using the translation function; however, in order to compute it, it requires that the two internal map representations have the same dimensions in terms of map indices and map sampling. Identical map sampling is achieved by setting the changeMapResolution setting to true. Still, as the identical number of indices will not generally be the case, ProSHADE provides a padding function, which can add zeroes around the internal representation map to make sure that it has the given dimensions. Therefore, in order to compute the translation function, it is required that the two structures are modified by the zeroPaddToDims() function to both have the same dimensions; the larger of the two dimensions is chosen along each axis in order to avoid loss of information.
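The idea of padding two maps to common dimensions can be sketched with numpy as follows. This is not the zeroPaddToDims() implementation: the helper name pad_to_dims is hypothetical, and splitting the extra zeroes evenly between both ends of each axis is an assumption.

```python
import numpy as np


def pad_to_dims(volume, target_shape):
    """Zero-pad a 3D array to target_shape, splitting the padding evenly."""
    pads = []
    for size, target in zip(volume.shape, target_shape):
        extra = target - size
        if extra < 0:
            raise ValueError("target dimensions must not be smaller than the map")
        pads.append((extra // 2, extra - extra // 2))
    return np.pad(volume, pads, mode="constant", constant_values=0.0)


map_a = np.ones((4, 6, 5))
map_b = np.ones((7, 3, 5))

# Take the larger dimension along each axis so that no data are cut away.
common = tuple(max(a, b) for a, b in zip(map_a.shape, map_b.shape))

padded_a = pad_to_dims(map_a, common)
padded_b = pad_to_dims(map_b, common)
```

Padding only adds zeroes, so the total density of each map is unchanged.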
Once the structures have the same dimensions, it is possible to compute the translation function. This function will compute the Fourier transforms of both maps, combine the Fourier coefficients and compute the inverse Fourier transform on the resulting combined coefficients map, thus obtaining the translation map. Once computed, this map can be accessed from the ProSHADE python module as shown in the following example code:
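The ProSHADE calls are not reproduced here. The Fourier approach described above can be illustrated with numpy alone, under the assumption that the coefficients are combined by multiplying the conjugate of one transform with the other (a standard cross-correlation); which map ProSHADE conjugates and how it scales the result is not stated in this section.

```python
import numpy as np

rng = np.random.default_rng(1)
static = rng.random((8, 8, 8))  # toy reference map

# Moving map: the reference shifted periodically by a known amount.
shift = (2, 5, 1)
moving = np.roll(static, shift, axis=(0, 1, 2))

# Fourier transforms of both maps.
f_static = np.fft.fftn(static)
f_moving = np.fft.fftn(moving)

# Combine the coefficients and invert to obtain the translation map.
translation_map = np.fft.ifftn(np.conj(f_static) * f_moving).real

# The highest peak sits at the shift relating the two maps.
peak = np.unravel_index(np.argmax(translation_map), translation_map.shape)
```

Because the Fourier transform treats the map as periodic, shifts wrap around the box, which is one reason why zero padding around the density is useful before this step.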
Also, similarly to the rotation function, ProSHADE provides a useful function for detecting the highest peak in the translation map and computing the corresponding translation in Angstroms. However, this translation is computed from the current position and not from the original starting position of the structure as it was read in. To allow the user to properly apply the rotation and translation to the structure, ProSHADE provides a function that finds the highest peak of the translation function as well as the rotation centre and then outputs two vectors. The first vector specifies the rotation centre, about which any rotations should be done and at which the structure should be centered; the second vector is the translation to the optimal overlay position, which is applied afterwards. The following code shows how both vectors can be obtained from ProSHADE.
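The ProSHADE functions are not reproduced here. How a peak position in indices relates to a translation in Angstroms can be sketched as follows, assuming that peak indices beyond half of the box correspond to negative shifts and that the sampling is the cell length divided by the number of indices. The helper name peak_to_angstroms and the example values are hypothetical.

```python
import numpy as np


def peak_to_angstroms(peak, dims, cell_lengths):
    """Convert translation map peak indices to a signed shift in Angstroms."""
    peak = np.asarray(peak, dtype=float)
    dims = np.asarray(dims, dtype=float)
    cell_lengths = np.asarray(cell_lengths, dtype=float)

    # Indices past the middle of the box wrap around to negative shifts.
    signed = np.where(peak > dims / 2, peak - dims, peak)

    # Sampling in Angstroms per index along each axis.
    sampling = cell_lengths / dims
    return signed * sampling


shift = peak_to_angstroms(
    peak=(2, 61, 0), dims=(64, 64, 64), cell_lengths=(128.0, 128.0, 128.0)
)
```

With a sampling of 2 Angstroms per index, the peak at index 61 of 64 is read as a shift of -3 indices, that is -6 Angstroms.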
Writing out resulting structures
Finally, it is worth noting that while the MAP formatted data can be written out of the ProSHADE_data object at any time (albeit their quality may be decreased if a rotation was applied, as discussed in the rotating internal representation map section), ProSHADE can also write out the co-ordinate data for input structures which were read in from a co-ordinate file. Please note that ProSHADE cannot generate co-ordinate data from maps; the co-ordinate data must exist before the ProSHADE run. Nonetheless, in the case of, for example, finding the optimal rotation and translation of one structure to overlay it with another structure, the user may be interested in writing out the modified co-ordinates. To do this, ProSHADE contains the writePdb() function, which needs to be supplied with the file name and the required rotation and translation, and which will write out the PDB file with these modifications applied.