GETAREA 1.0 beta

Solvent Accessible Surface Areas, Atomic Solvation Energies, and Their Gradients for Macromolecules

Surendra Negi, Hongyao Zhu, Robert Fraczkiewicz, Werner Braun

Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, TX 77555

Introduction

GETAREA is a web service provided by the Sealy Center for Structural Biology at the University of Texas Medical Branch. Originally, GETAREA was a subroutine for efficient analytical calculation of the solvent accessible surface area and its gradient for proteins [1,2], implemented in our program FANTOM. The service lets a user submit the Cartesian coordinates of the atoms in a molecule, stored in PDB format on the user's local disk, and get back the solvent accessible surface area (SASA) or the solvation energy (depending on the parameter setup) in a variety of formats. By default, the submission form is set up to calculate the SASA of non-hydrogen atoms in proteins, but an appropriate change of input parameters makes it possible to compute any quantity proportional to SASA for any kind of molecule.

Input of atomic coordinates

Specify a path to your data file containing Cartesian coordinates in Protein Data Bank (PDB) format. The only recognized keywords are ATOM, HETATM, TER and END; lines starting with other keywords are ignored. Atom names must coincide with those defined in the atom library; otherwise an error will occur. This applies especially to ambiguous atom positions marked with capital letters A, B, etc. The user must resolve such ambiguities by manually editing the PDB file; otherwise the calculated SASA will be meaningless. Input is terminated by the first TER or END keyword encountered. Other input options are self-explanatory.
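
The input rules above can be illustrated with a short Python sketch that can be used to pre-check a file before submission. It is not GETAREA's own reader: the function name read_atoms and the Atom record are hypothetical, and the column positions are those of the standard fixed-column PDB ATOM/HETATM record. Atoms carrying an alternate-location letter are collected separately so that they can be resolved by hand.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    name: str      # atom name, PDB columns 13-16
    alt_loc: str   # alternate-location letter, column 17 ("" if none)
    res_name: str  # residue name, columns 18-20
    res_seq: int   # residue number, columns 23-26
    x: float       # coordinates, columns 31-54
    y: float
    z: float


def read_atoms(lines: Iterable[str]) -> Tuple[List[Atom], List[Atom]]:
    """Return (all atoms read, atoms with an alternate-location letter)."""
    atoms: List[Atom] = []
    ambiguous: List[Atom] = []
    for line in lines:
        keyword = line[:6].strip()
        if keyword in ("TER", "END"):
            break  # input ends at the first TER or END
        if keyword not in ("ATOM", "HETATM"):
            continue  # lines with other keywords are ignored
        atom = Atom(
            name=line[12:16].strip(),
            alt_loc=line[16:17].strip(),
            res_name=line[17:20].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        )
        atoms.append(atom)
        if atom.alt_loc:
            ambiguous.append(atom)  # must be resolved before submission
    return atoms, ambiguous
```

If the second list is non-empty, the file still contains ambiguous positions and should be edited before it is submitted.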

Macintosh users: if you use a word processor to edit a PDB file, save it as "MSDOS text". Submitting a file saved as a regular "text file" will most likely produce an error message.

Examples:


The following output is produced for serine protease inhibitor (PDB code 1COA) with standard settings.


Job identifier: get_a_16884
Probe radius : 1.400

---------------------------------------------
POLAR area/energy = 1477.33
APOLAR area/energy = 2948.62
UNKNOW area/energy = 0.00
---------------------------------------------
Total area/energy = 4425.95
---------------------------------------------
Number of surface atoms = 316
Number of buried atoms = 196
Number of atoms with ASP=0 = 0

The same job with surface area output requested per residue:

Job identifier: get_a_16930
Probe radius : 1.400

Residue Total Apolar Backbone Sidechain Ratio(%) In/Out
MET 20 159.24 100.77 75.37 83.87 53.0 o
LYS 21 79.28 64.98 8.99 70.28 42.7
THR 22 67.67 45.64 0.97 66.70 62.8 o
GLU 23 73.21 20.44 1.93 71.27 50.5 o
TRP 24 0.00 0.00 0.00 0.00 0.0 i
PRO 25 74.82 63.67 11.15 63.67 60.5 o
GLU 26 90.68 74.36 18.43 72.24 51.2 o
LEU 27 3.33 3.33 0.00 3.33 2.3 i
VAL 28 69.90 56.54 13.37 56.54 46.2
GLY 29 50.40 32.33 50.40 0.00 57.8 o
LYS 30 101.48 77.54 4.37 97.11 59.0 o
SER 31 46.95 41.54 2.40 44.56 57.6 o
VAL 32 24.62 24.52 0.10 24.52 20.0
GLU 33 119.04 36.56 1.73 117.31 83.1 o
GLU 34 101.40 25.03 0.37 101.03 71.6 o
ALA 35 0.00 0.00 0.00 0.00 0.0 i
LYS 36 87.49 48.29 0.00 87.49 53.2 o
LYS 37 103.86 101.73 6.88 96.99 59.0 o
VAL 38 47.71 47.71 1.17 46.54 38.1
ILE 39 0.00 0.00 0.00 0.00 0.0 i
LEU 40 78.12 71.02 7.12 71.01 48.6
GLN 41 139.70 41.40 26.90 112.80 78.5 o
ASP 42 57.28 13.31 25.09 32.19 28.5
LYS 43 3.84 3.66 0.98 2.86 1.7 i
PRO 44 110.92 92.14 20.91 90.01 85.6 o
GLU 45 90.02 68.70 15.40 74.63 52.9 o
ALA 46 9.59 0.00 9.59 0.00 0.0 i
GLN 47 97.29 29.11 3.77 93.52 65.1 o
ILE 48 29.96 8.79 21.21 8.75 5.9 i
ILE 49 76.36 76.36 3.71 72.66 49.3
VAL 50 67.07 47.45 19.62 47.45 38.8
LEU 51 39.47 39.47 0.03 39.45 27.0
PRO 52 89.79 89.79 0.01 89.78 85.3 o
VAL 53 75.34 57.95 17.40 57.95 47.4
GLY 54 68.00 44.56 68.00 0.00 78.0 o
THR 55 60.48 47.95 20.96 39.52 37.2
ILE 56 172.42 155.82 21.45 150.98 100.0 o
VAL 57 59.65 46.84 17.25 42.40 34.7
THR 58 106.02 99.19 10.72 95.30 89.7 o
MET 59 203.93 182.91 32.34 171.58 100.0 o
GLU 60 102.74 42.72 12.14 90.60 64.2 o
TYR 61 160.35 115.56 29.16 131.19 67.9 o
ARG 62 128.12 46.60 4.14 123.98 63.4 o
ILE 63 108.89 108.89 0.00 108.89 73.9 o
ASP 64 32.13 15.45 0.19 31.94 28.3
ARG 65 30.99 10.92 0.00 30.99 15.9 i
VAL 66 0.00 0.00 0.00 0.00 0.0 i
ARG 67 41.75 6.28 1.58 40.17 20.5
LEU 68 0.36 0.36 0.36 0.00 0.0 i
PHE 69 33.09 32.25 0.84 32.25 17.9 i
VAL 70 8.44 6.86 1.58 6.86 5.6 i
ASP 71 60.40 32.46 13.01 47.39 41.9
LYS 72 199.41 133.00 37.66 161.75 98.3 o
LEU 73 144.83 128.90 20.94 123.89 84.7 o
ASP 74 76.93 19.22 0.16 76.78 67.9 o
ASN 75 39.89 4.92 0.00 39.89 34.9
VAL 76 0.79 0.03 0.76 0.03 0.0 i
ALA 77 42.58 24.05 25.49 17.09 26.3
GLU 78 80.35 31.51 4.60 75.74 53.6 o
VAL 79 42.59 35.04 7.55 35.04 28.7
PRO 80 3.51 3.51 3.46 0.05 0.0 i
ARG 81 127.86 59.40 2.70 125.16 64.0 o
VAL 82 15.67 1.74 13.93 1.74 1.4 i
GLY 83 7.92 7.54 7.54 0.38 9.1 i
---------------------------------------------
POLAR area/energy = 1477.33
APOLAR area/energy = 2948.62
UNKNOW area/energy = 0.00
---------------------------------------------
Total area/energy = 4425.95
---------------------------------------------
Number of surface atoms = 316
Number of buried atoms = 196
Number of atoms with ASP=0 = 0

The contributions from backbone and side-chain atoms are listed in the 5th and 6th columns, respectively. The next column lists, for each residue, the ratio of the side-chain surface area to the "random coil" value. The "random coil" value of a residue X is the average solvent-accessible surface area of X in the tripeptide Gly-X-Gly in an ensemble of 30 random conformations. Residues are considered solvent exposed if the ratio exceeds 50% and buried if the ratio is less than 20%; they are marked "o" and "i", respectively, in the last column. The "random coil" values for 20 amino acids are:
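
The in/out rule stated in this paragraph can be sketched in Python for post-processing the per-residue output. This is an illustration, not GETAREA's code: the names ResidueRow, parse_row and classify are hypothetical, and the parser assumes whitespace-separated fields in the order of the header line shown in the sample output, with the marker column left empty for intermediate residues.

```python
from typing import NamedTuple


class ResidueRow(NamedTuple):
    name: str
    number: int
    total: float
    apolar: float
    backbone: float
    sidechain: float
    ratio: float
    flag: str  # "o", "i", or "" when no marker is printed


def classify(ratio: float) -> str:
    """Apply the rule: above 50% is exposed, below 20% is buried."""
    if ratio > 50.0:
        return "o"
    if ratio < 20.0:
        return "i"
    return ""


def parse_row(line: str) -> ResidueRow:
    """Parse one line of the per-residue table."""
    fields = line.split()
    total, apolar, backbone, sidechain, ratio = map(float, fields[2:7])
    flag = fields[7] if len(fields) > 7 else ""
    return ResidueRow(fields[0], int(fields[1]), total, apolar,
                      backbone, sidechain, ratio, flag)


# Rows copied from the sample output above.
for text in ("MET 20 159.24 100.77 75.37 83.87 53.0 o",
             "LYS 21 79.28 64.98 8.99 70.28 42.7",
             "LEU 27 3.33 3.33 0.00 3.33 2.3 i"):
    row = parse_row(text)
    # The printed marker should agree with the rule applied to the ratio.
    assert classify(row.ratio) == row.flag
```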


Database of atomic types

Our basic goal was to make the service as flexible as possible. Therefore, the "Advanced Options" section makes all the files that GETAREA depends on available to the user. This section describes the editable database of atomic types. The default database contains protein atom types and atomic solvation parameters (ASP) corresponding to the solvent accessible surface area of non-hydrogen atoms in square angstroms.

Comments start with a "#". Each data line begins with a keyword describing a group of atom types. Keywords define global properties of atoms; in the default setting all atoms are divided into "polar", "apolar" and "unknown" groups. This affects the output, since GETAREA lists the SASA for each keyword. The keyword is followed by an identifier: the character strings "BB" and "SC" mark atom types as backbone and side chain, respectively. The identifier is followed by the atomic radius, whose units determine the units of the surface area. All radii, including the radius of the water probe, must be in the same units. The next entry, the atomic solvation parameter, is either unitless (for SASA calculations) or has units of energy/area. Changing the values of the ASPs lets the user explore a wide variety of solvation models for calculating the solvation energy term and its gradient. The ASP entry is followed by an atom type, which is used to assign a radius and an ASP to every atom name specified in the library.

Type PSEUD corresponds to pseudo atoms used as reference points in NMR structures. The "unknown" type UNKN_ must always be present. It allows GETAREA to continue calculations even when some atom names cannot be matched against their types; a warning message is issued in this case. Both the radius and the ASP for UNKN_ must be equal to zero. Users can modify or define their own database observing the following rules:

Examples:

An alternative ASP database classifies atoms by element and uses solvation model of Wesson and Eisenberg [5]:

# Key   ID Radius  ASP   Type            Comment
HYDROG BB 0.00 0.000 H_B_B # amid/ne H (backbone)
CARBON BB 2.00 0.012 C_A_B # aliphatic C (backbone)
CARBON BB 1.50 0.012 C_B_B # carbon C (backbone)
NITROG BB 1.50 -0.116 N_B_B # amide -NH- (backbone)
OXYGEN BB 1.40 -0.116 O_B_B # carboxyl O (backbone)
HYDROG SC 0.00 0.000 H_ALI # aliphatic H
HYDROG SC 0.00 0.000 H_ARO # aromatic H
HYDROG SC 0.00 0.000 H_AMI # amid/ne H
HYDROG SC 0.00 0.000 H_SUL # sulfur H
HYDROG SC 0.00 0.000 H_OXY # hydroxyl H
CARBON SC 2.00 0.012 C_ALI # aliphatic -CH2, -CH3
CARBON SC 1.50 0.012 C_BYL # carbon/xyl C
CARBON SC 1.85 0.012 C_ARO # aromatic =CH-, =CC-, =CN- or =CO-
NITROG SC 1.50 -0.116 N_AMI # amide -NH- -NH2
NITROG SC 1.50 -0.186 N_AMO # amine -NH3
OXYGEN SC 1.40 -0.116 O_BYL # carbonyl O
OXYGEN SC 1.40 -0.175 O_BYX # carboxyl O
OXYGEN SC 1.40 -0.116 O_HYD # hydroxyl O
SULFUR SC 1.85 -0.018 S_OXY # sulfur and thiol S of S-S bridge
SULFUR SC 1.85 -0.018 S_RED # sulfur and thiol SH or S-CH3
UNKNOW SC 0.00 0.000 PSEUD # pseudo atoms in NMR structures
UNKNOW SC 0.00 0.000 UNKN_ # unknown atom type; do not change!

Atomic solvation parameters are in kcal/mol/angstrom^2. The same PDB input as in the previous section produces the solvation energy in kcal/mol, split into contributions by chemical element:

Job identifier: get_a_17094
Probe radius : 1.400

---------------------------------------------
HYDROG area/energy = 0.00
CARBON area/energy = 34.64
NITROG area/energy = -56.58
OXYGEN area/energy = -114.81
SULFUR area/energy = -1.11
UNKNOW area/energy = 0.00
---------------------------------------------
Total area/energy = -137.86
---------------------------------------------
Number of surface atoms = 316
Number of buried atoms = 196
Number of atoms with ASP=0 = 0
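
To make the link between the ASP database and the per-keyword totals concrete, the following Python sketch parses database lines in the layout shown above (keyword, identifier, radius, ASP, type, optional "#" comment) and sums ASP times area for each keyword. It is an illustration only: the function names are hypothetical, the three database lines are copied from the example above, and the per-atom areas are made-up numbers, not GETAREA results.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class AtomType(NamedTuple):
    keyword: str     # group reported in the output, e.g. CARBON
    identifier: str  # BB or SC in the default database
    radius: float
    asp: float       # atomic solvation parameter


def parse_asp_database(text: str) -> Dict[str, AtomType]:
    """Map each atom type name to its database entry."""
    atom_types: Dict[str, AtomType] = {}
    for raw in text.splitlines():
        data = raw.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        keyword, identifier, radius, asp, type_name = data.split()
        atom_types[type_name] = AtomType(keyword, identifier,
                                         float(radius), float(asp))
    return atom_types


def sum_by_keyword(atom_types: Dict[str, AtomType],
                   atoms: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum ASP * area per keyword; unmatched types fall back to UNKN_."""
    totals: Dict[str, float] = defaultdict(float)
    for type_name, area in atoms:
        entry = atom_types.get(type_name)
        if entry is None:
            entry = atom_types["UNKN_"]  # must always be present
        totals[entry.keyword] += entry.asp * area
    return dict(totals)


EXCERPT = """\
# Key   ID Radius  ASP   Type            Comment
CARBON SC 2.00 0.012 C_ALI # aliphatic -CH2, -CH3
OXYGEN SC 1.40 -0.116 O_HYD # hydroxyl O
UNKNOW SC 0.00 0.000 UNKN_ # unknown atom type; do not change!
"""

db = parse_asp_database(EXCERPT)
# Hypothetical per-atom areas in square angstroms.
energies = sum_by_keyword(db, [("C_ALI", 10.0), ("O_HYD", 5.0), ("X_XXX", 3.0)])
```

With unitless ASPs equal to 1 the same sum yields plain surface areas, which is how one database format can serve both SASA and solvation energy calculations.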

Library of atom names

The atom library matches atom names with their respective atom types; the latter must be consistent with the ASP database entries described above. The default library contains protein atoms named according to the latest IUPAC recommendations [4], plus all PDB violations of these rules. GETAREA scans the library to assign an appropriate keyword, radius and atomic solvation parameter from the ASP database to every atom entry read from the supplied PDB file. The library is built of residue blocks. To save space, the first residue block contains atoms common to all residues, and the remaining blocks contain only residue-specific atoms. The first block is scanned for an atom name match regardless of the residue name; if no match is found, GETAREA scans the block beginning with the given residue name. Atom names that cannot be found in the library are automatically associated with the type UNKN_ (see the previous section), and a warning message is issued. Users can modify or define their own library observing the following rules:

Examples:

In this example the library and the ASP database are redefined to calculate the solvent accessible surface area of a molecule of trimethylamine oxide (TMAO). First, the X-ray structure of TMAO is expressed in PDB format:

COMPND    Trimethylamine oxide
ATOM 1 O TMAO 1 1.123 1.621 5.991
ATOM 2 N TMAO 1 1.970 0.633 5.482
ATOM 3 C1 TMAO 1 1.316 -0.014 4.322
ATOM 4 C2 TMAO 1 2.230 -0.377 6.539
ATOM 5 C3 TMAO 1 3.245 1.258 5.052
ATOM 6 H11 TMAO 1 1.889 -0.706 3.984
ATOM 7 H12 TMAO 1 1.149 0.630 3.630
ATOM 8 H13 TMAO 1 0.484 -0.399 4.609
ATOM 9 H21 TMAO 1 2.790 -1.053 6.149
ATOM 10 H22 TMAO 1 1.398 -0.770 6.812
ATOM 11 H23 TMAO 1 2.671 0.002 7.303
ATOM 12 H31 TMAO 1 3.850 0.596 4.715
ATOM 13 H32 TMAO 1 3.656 1.725 5.785
ATOM 14 H33 TMAO 1 3.031 1.883 4.355
END

Second, the new library of atom names is created:

# Atom names for TMAO
RESIDUE TMAO 14
O OXYGE
N NITRO
C1 CARBO
C2 CARBO
C3 CARBO
H11 HYDRO
H12 HYDRO
H13 HYDRO
H21 HYDRO
H22 HYDRO
H23 HYDRO
H31 HYDRO
H32 HYDRO
H33 HYDRO

Third, the corresponding ASP database is defined (the radii are used here for illustrative purposes only):

# Key   ID Radius  ASP   Type    #         Comment
HYDROG NO 0.50 1.000 HYDRO # hydrogen
CARBON NO 1.50 1.000 CARBO # carbon
NITROG NO 1.50 1.000 NITRO # nitrogen
OXYGEN NO 1.40 1.000 OXYGE # oxygen
UNKNOW NO 0.00 0.000 UNKN_ # unknown atom type; do not change!

With the most detailed output option, GETAREA generates the following:

Job identifier: get_a_17047
Probe radius : 1.400

ATOM NAME RESIDUE AREA/ENERGY
1 O TMA 1 42.54
2 N TMA 1 0.00
3 C1 TMA 1 50.10
4 C2 TMA 1 50.26
5 C3 TMA 1 50.23
6 H11 TMA 1 12.27
7 H12 TMA 1 12.73
8 H13 TMA 1 12.47
9 H21 TMA 1 11.91
10 H22 TMA 1 12.55
11 H23 TMA 1 13.13
12 H31 TMA 1 12.59
13 H32 TMA 1 12.86
14 H33 TMA 1 12.12
---------------------------------------------
HYDROG area/energy = 112.62
CARBON area/energy = 150.58
NITROG area/energy = 0.00
OXYGEN area/energy = 42.54
UNKNOW area/energy = 0.00
---------------------------------------------
Total area/energy = 305.74
---------------------------------------------
Number of surface atoms = 13
Number of buried atoms = 1
Number of atoms with ASP=0 = 0

The nitrogen atom in TMAO is inaccessible to the 1.4-angstrom water probe.
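
The lookup order described in this section (first block, then the block of the matching residue, then UNKN_) can be sketched in Python as follows. This is not GETAREA's implementation: the function names are hypothetical, the parser assumes the "RESIDUE name count" block header seen in the TMAO library above, and the count field, which appears to be the number of atoms in the block, is read past and not checked.

```python
from typing import Dict, List, Tuple

Block = Tuple[str, Dict[str, str]]  # (residue name, atom name -> atom type)


def parse_library(text: str) -> List[Block]:
    """Split a library into residue blocks, keeping their order."""
    blocks: List[Block] = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments
        if not fields:
            continue
        if fields[0] == "RESIDUE":
            blocks.append((fields[1], {}))  # third field (count) is ignored
        elif not blocks:
            raise ValueError("atom line before the first RESIDUE line")
        else:
            blocks[-1][1][fields[0]] = fields[1]
    return blocks


def lookup_type(blocks: List[Block], residue: str, atom: str) -> str:
    """Return the atom type, or UNKN_ when no block matches."""
    if blocks:
        common = blocks[0][1]  # scanned regardless of the residue name
        if atom in common:
            return common[atom]
        for name, atoms in blocks[1:]:
            if name == residue and atom in atoms:
                return atoms[atom]
    return "UNKN_"


# Abbreviated copy of the TMAO library shown above.
TMAO_LIBRARY = """\
# Atom names for TMAO
RESIDUE TMAO 14
O OXYGE
N NITRO
C1 CARBO
H11 HYDRO
"""

tmao = parse_library(TMAO_LIBRARY)
oxygen_type = lookup_type(tmao, "TMA", "O")
missing_type = lookup_type(tmao, "TMA", "XX")
```

Because the TMAO library has a single block, that block plays the role of the first block, so in this sketch a match does not depend on the residue name.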

References

  1. Fraczkiewicz, R; Braun, W. (1998) J. Comp. Chem., 19, 319.
  2. Fraczkiewicz, R; Braun, W. A New Efficient Algorithm for Calculating Solvent Accessible Surface Areas of Macromolecules, ECCC3; Nov. 1996; Northern Illinois Univ.
  3. Eisenberg, D; McLachlan, AD. (1986) Nature, 319, 199.
  4. Markley, JL; Bax, A; Arata, Y; Hilbers, CW; Kaptein, R; Sykes, BD; Wright, PE; Wuthrich, K. (1998) Pure & Appl. Chem., 70, 117.
  5. Wesson, L; Eisenberg, D. (1992) Protein Sci., 1, 227.

Last modified on Wed 17th April, 3:00 PM, 2015  by  Surendra S Negi