Drupal 7 security

Warning!  I am not a Drupal security expert.  I am not even a Drupal expert.  I have never even developed a website in Drupal.

I found myself in the position of needing to understand how Drupal 7 attempts to mitigate various web application security threats.  So I downloaded the Drupal Quickstart VM and armed with a copy of Pro Drupal 7 Development, a copy of the source code, and an internet connection, I started doing some research like a boss.  These are my notes on the topic.

How does Drupal 7 do ... ?

This answer on stackoverflow was the best one that I found:


  • Drupal's index.php file functions as a front controller. All pages are piped through it, and the "actual" url/path the user requested is passed to index.php as a parameter.
  • Drupal's path router system (MenuAPI) is used to match the requested path to a given plugin module. That plugin module is responsible for building the "primary content" of the page.
  • Once the primary page content is built, index.php calls theme('page', $content), which hands off the content to Drupal's theming/skinning system. There, it's wrapped in sidebars/headers/widgets/etc..
  • The rendered page is then handed back to apache and it gets sent back to the user's browser. 
During that entire process, Drupal and third-party plugin modules are firing off events, and listening for them to respond. Drupal calls this the 'hook' system, and it's implemented using function naming conventions. The 'blog' module, for example, can intercept 'user' related events by implementing a function named blog_user(). In Drupal parlance, that's called hook_user().
It's a bit clunky, but due to a PHP quirk (it keeps an internal hashtable of all loaded functions), it allows Drupal to quickly check for listeners just by iterating over a list of installed plugins. For each plugin it can call function_exists() on the appropriately named pattern, and call the function if it exists. ("I'm firing the 'login' event. Does 'mymodule_login' function exist? I'll call it. Does 'yourmodule_login' exist? No? How about 'nextmodule_login'?" etc.) Again, a touch clunky but it works pretty well. 
Everything that happens in Drupal happens because of one of those events being fired. The MenuAPI only knows about what urls/paths are handled by different plugin modules because it fires the 'menu' event (hook_menu) and gathers up all the metadata plugin modules respond with. ("I'll take care of the url 'news/recent', and here's the function to call when that page needs to be built...") Content only gets saved because Drupal's FormAPI is responsible for building a page, and fires the 'a form was submitted' event for a module to respond to. Hourly maintenance happens because hook_cron() is triggered, and any module with mymodulename_cron() as a function name will have its function called.
Everything else is ultimately just details -- important details, but variations on that theme. index.php is the controller, the menu system determines what the "current page" is, and lots of events get fired in the process of building that page. Plugin modules can hook into those events and change the workflow/supply additional information/etc. That's also part of the reason so many Drupal resources focus on making modules. Without modules, Drupal doesn't actually DO anything other than say, 'Someone asked for a page! Does it exist? No? OK, I'll serve up a 404.'
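The function_exists() lookup described in that answer can be sketched in a few lines of plain PHP. This is only an illustration of the naming convention, not Drupal's own dispatcher (module_invoke_all does more than this); the module list, the 'login' hook and sketch_invoke_all are made up for the example.

```php
<?php
// Hypothetical list of enabled modules; Drupal keeps its own list internally.
$enabled_modules = array('mymodule', 'yourmodule', 'nextmodule');

// Two of the three example modules implement the made-up 'login' hook.
function mymodule_login($name) {
  return "mymodule saw $name log in";
}

function nextmodule_login($name) {
  return "nextmodule saw $name log in";
}

// Fire a hook: call <module>_<hook>() for every module that defines it.
function sketch_invoke_all(array $modules, $hook, array $args = array()) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results[$module] = call_user_func_array($function, $args);
    }
  }
  return $results;
}

$results = sketch_invoke_all($enabled_modules, 'login', array('alice'));
print_r($results);
```

Here 'yourmodule' is silently skipped because no yourmodule_login() function exists, which is the "Does it exist? No?" step from the quote.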
TIP!
If you want to know whether a website is running Drupal (and the owners are not trying to hide it), take a URL like:
http://my.domain.com/path1/path2/path3?name1=value1&name2=value2
and change it to:
http://my.domain.com/index.php?q=path1/path2/path3&name1=value1&name2=value2
If you get the same page, the site may well be running Drupal.

Sessions
Every request to Drupal goes to index.php, the entire contents of which is:
define('DRUPAL_ROOT', getcwd());

require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
menu_execute_active_handler();
Now basically the first thing drupal_bootstrap (in /includes/bootstrap.inc) does is call _drupal_bootstrap_configuration, which calls drupal_settings_initialize. A later bootstrap phase calls drupal_session_initialize (in /includes/Session.inc), and it's there that the session is created. To initialize the session though, there are a couple of global variables that need to be populated first, and this happens in drupal_settings_initialize.

The first value of interest is $cookie_domain which can be set in settings.php (a configuration file). Otherwise it is generated:
/includes/Bootstrap.inc:783
function drupal_settings_initialize() {
...
    // HTTP_HOST can be modified by a visitor, but we already sanitized it
    // in drupal_settings_initialize().
    if (!empty($_SERVER['HTTP_HOST'])) {
        $cookie_domain = $_SERVER['HTTP_HOST'];
        // Strip leading periods, www., and port numbers from cookie domain.
        $cookie_domain = ltrim($cookie_domain, '.');
        if (strpos($cookie_domain, 'www.') === 0) {
            $cookie_domain = substr($cookie_domain, 4);
        }
        $cookie_domain = explode(':', $cookie_domain);
        $cookie_domain = '.' . $cookie_domain[0];
    }
...
The code will convert the HTTP_HOST value into a domain, and strip any leading ‘www.’. Interestingly the ‘www.’ stripping is case sensitive.
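To see the case sensitivity in isolation, the same string handling can be lifted into a standalone function. sketch_cookie_domain is a made-up name, and this only exercises the steps from the excerpt; whether a mixed-case host can actually reach this code depends on earlier sanitisation of HTTP_HOST, which is not shown here.

```php
<?php
// Mirrors the string handling in the excerpt above (hypothetical helper name).
function sketch_cookie_domain($http_host) {
  // Strip leading periods, a leading 'www.', and any port number.
  $cookie_domain = ltrim($http_host, '.');
  if (strpos($cookie_domain, 'www.') === 0) {
    $cookie_domain = substr($cookie_domain, 4);
  }
  $cookie_domain = explode(':', $cookie_domain);
  return '.' . $cookie_domain[0];
}

echo sketch_cookie_domain('www.example.dev:8080'), "\n"; // .example.dev
echo sketch_cookie_domain('WWW.example.dev'), "\n";      // .WWW.example.dev
```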

Next there is the value of session_name() (the PHP function to get/set the name PHP uses for the session cookie). The variable $session_name is set to $cookie_domain if $cookie_domain is set in settings.php; otherwise it is derived from the host name of the request. $session_name is then used to set the session_name() used by PHP, i.e. the name of the session/login token cookie:
/includes/Bootstrap.inc:810
function drupal_settings_initialize() {

     $prefix = ini_get('session.cookie_secure') ? 'SSESS' : 'SESS';
     session_name($prefix . substr(hash('sha256', $session_name), 0, 32));
}
The value of session.cookie_secure depends directly on the value of $is_https which is:
/includes/Bootstrap.inc:741
function drupal_settings_initialize() {
...
     $is_https = isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on';

}
So for the domain "example.dev", we end up with a session or login cookie like (for HTTP):
SESSeffb83f963264c367f7e224fe805e0ae=JF-Qqk0zQZDj4Wu_Pceu901ko8SkWVxxAVVPVp1LgJQ
For HTTPS it would be:
SSESSeffb83f963264c367f7e224fe805e0ae=JF-Qqk0zQZDj4Wu_Pceu901ko8SkWVxxAVVPVp1LgJQ
Where effb83f963264c367f7e224fe805e0ae = the first 32 hex characters of SHA256("example.dev").
It’s relevant to explicitly point out that the name (not the value!) of the session cookie is not a secret (despite it looking random) as it's based on the domain name of the web site.
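Anyone can reproduce the cookie name from the site's domain, which is why it isn't a secret. A minimal sketch, assuming the domain string is hashed exactly as in the excerpt above (sketch_session_cookie_name is a hypothetical helper, not a Drupal function):

```php
<?php
// Rebuild the session cookie name from a (public) domain name.
function sketch_session_cookie_name($session_name, $is_https) {
  $prefix = $is_https ? 'SSESS' : 'SESS';
  // First 32 hex characters of the SHA-256 digest of the name.
  return $prefix . substr(hash('sha256', $session_name), 0, 32);
}

$http_name = sketch_session_cookie_name('example.dev', FALSE);
$https_name = sketch_session_cookie_name('example.dev', TRUE);
echo $http_name, "\n", $https_name, "\n";
```

The two names differ only in the extra leading 'S' for HTTPS.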

Now that Drupal knows the name of the cookie holding the session identifier, it uses that value to look for an existing session in the DB and populate the global $user variable. This happens in _drupal_session_read (in /includes/Session.inc), which PHP invokes when session_start() is called from drupal_session_start.  Getting back to drupal_session_initialize: it checks for a cookie named session_name() and, if one is present, starts the session, which populates $user by looking up that session identifier in the 'sessions' DB table (joined to the 'users' table).  If the request is over HTTPS and no existing session was found using the secure session cookie value, then the insecure session cookie value is used to look up the session, but only for anonymous users.

If no session cookie exists then a new session cookie will be created via:
/includes/Session.inc:266
function drupal_session_initialize() {

     session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE)));
     if ($is_https && variable_get('https', FALSE)) {
         $insecure_session_name = substr(session_name(), 1);
         $session_id = drupal_hash_base64(uniqid(mt_rand(), TRUE));
         $_COOKIE[$insecure_session_name] = $session_id;
     }

}
Note, when the anonymous user authenticates they will receive a new session cookie value that is generated using data from drupal_random_bytes.
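The session identifiers in the cookie examples above are 43 characters long, which matches a SHA-256 digest in URL-safe base64 with the padding removed. A sketch of that encoding, written from my reading of drupal_hash_base64 rather than copied from it; random_bytes() (PHP 7+) is a stand-in for drupal_random_bytes():

```php
<?php
// URL-safe base64 of a SHA-256 digest, in the style of drupal_hash_base64.
function sketch_hash_base64($data) {
  $hash = base64_encode(hash('sha256', $data, TRUE));
  // Make it cookie/URL safe: '+' -> '-', '/' -> '_', drop the '=' padding.
  return strtr($hash, array('+' => '-', '/' => '_', '=' => ''));
}

// Anonymous session id: seeded from uniqid() and mt_rand() only.
$anonymous_sid = sketch_hash_base64(uniqid(mt_rand(), TRUE));

// After login: extra random bytes are mixed in (random_bytes is a stand-in).
$authenticated_sid = sketch_hash_base64(uniqid(mt_rand(), TRUE) . random_bytes(55));

echo $anonymous_sid, "\n", $authenticated_sid, "\n";
```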

User Login
Drupal comes with a User module (/modules/user/User.module), and in that a block called user_block that allows the login block to be added to a page. The login block contains the user_login_block form.
/modules/user/User.module:1288
function user_login_block($form) {
     $form['#action'] = url(current_path(), array('query' => drupal_get_destination(), 'external' => FALSE));
     $form['#id'] = 'user-login-form';
     $form['#validate'] = user_login_default_validators();
     $form['#submit'][] = 'user_login_submit';
...
}
Where the validation functions are:
function user_login_default_validators() {
     return array('user_login_name_validate', 'user_login_authenticate_validate', 'user_login_final_validate');
}
user_login_name_validate ensures the username is not empty and the username does not correspond to a blocked user (status = 0 in the users DB table).

user_login_authenticate_validate checks the following:
  • Neither the username nor the password is empty
  • The number of failed attempts to login from the user's IP has not exceeded a certain limit (by default 50 in 1 hour)
  • The number of failed attempts to login as that user has not exceeded a certain limit (by default 5 in 6 hours). The user can be identified by either UID (from username match in DB users table) or UID/IP combination.
  • The hash of the user’s password against the hash in the DB
    • Hashes are checked using the _password_crypt function (/includes/Password.inc:152). If the DB hash starts with ‘$S$’ then the algorithm will be SHA-512; if it starts with ‘$H$’ or ‘$P$’ it will use MD5.
    • _password_crypt uses the first 12 characters of the DB hash as settings information. The first and third characters should be ‘$’, the fourth is the log2 iteration count (encoded as a character) and characters 5-12 are an 8 character salt.
    • _password_crypt then hashes the salt and user password and runs the resulting hash through the hash algorithm for the required number of iterations, finally prepending the settings string and returning.
  • If the password validates, it may also be upgraded to use the latest password hashing mechanism if it currently uses an old one.
user_login_final_validate ensures that if authentication was not successful then various counters protecting against too many login attempts are updated and the appropriate error message is displayed to the user.
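The settings-string handling described for _password_crypt above can be sketched as follows. This is a simplified illustration, not Drupal's code: the function names are made up, the stored hash is fabricated, the 64-character alphabet is the phpass-style one I believe Drupal uses, and the final custom base64 encoding and truncation are left out (the sketch returns hex instead).

```php
<?php
// Split the first 12 characters of a stored hash into its parts.
function sketch_parse_hash_settings($stored_hash) {
  // Assumed alphabet for decoding the iteration-count character.
  $itoa64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
  $setting = substr($stored_hash, 0, 12);
  if (strlen($setting) !== 12 || $setting[0] !== '$' || $setting[2] !== '$') {
    return FALSE;
  }
  $count_log2 = strpos($itoa64, $setting[3]);
  if ($count_log2 === FALSE) {
    return FALSE;
  }
  return array(
    'type' => substr($setting, 0, 3),   // e.g. '$S$'
    'count_log2' => $count_log2,        // log2 of the iteration count
    'iterations' => 1 << $count_log2,
    'salt' => substr($setting, 4, 8),   // characters 5-12
  );
}

// Hash salt + password once, then re-hash with the password 'iterations' times.
function sketch_stretch($algo, $password, array $settings) {
  $count = $settings['iterations'];
  $hash = hash($algo, $settings['salt'] . $password, TRUE);
  do {
    $hash = hash($algo, $hash . $password, TRUE);
  } while (--$count);
  return bin2hex($hash);
}

// Fabricated stored hash: '$S$', count character 'D', salt 'abcdefgh', filler.
$settings = sketch_parse_hash_settings('$S$Dabcdefgh0000000000000000');
// The algorithm is chosen by the caller; 'sha512' is this sketch's choice.
$stretched = sketch_stretch('sha512', 'correct horse', $settings);
print_r($settings);
```

With the count character 'D' the sketch decodes a log2 count of 15, i.e. 32768 iterations.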

If validation is successful then the form’s #submit function is invoked:
/modules/user/User.module:2237
function user_login_submit($form, &$form_state) {
     global $user;
     $user = user_load($form_state['uid']);
     $form_state['redirect'] = 'user/' . $user->uid;

     user_login_finalize($form_state);
}
The user_login_submit function loads the user information, sets the redirect and in user_login_finalize calls drupal_session_regenerate and then invokes any hook_user_login that have been defined in other modules.

Tip
The redirect to user/uid is interesting: it implies you could determine the current number of registered users on a site (by querying /user/X where X is a number set using a binary search, as you get an access denied page for existing users and a page not found for users who don’t exist). This also tells you the uid of the next user who joins. It also reveals the user’s own uid (as the correct X for /user/X will return their account details).

Mixed Mode HTTP/HTTPS
By default, Drupal treats requests for HTTP and HTTPS pages as if they come from totally separate users. It is comparable to running 2 completely different websites at the same time, one over HTTP and one over HTTPS. It is possible to log a user in over HTTP and log a completely different user in over HTTPS, and switch between them seamlessly.

Often though, we want to run 1 website over a mix of HTTP and HTTPS, and have the site understand there is 1 user requesting pages in different modes. There are 2 main things we care about:
  1. How does Drupal handle user sessions (so requests to HTTP pages don’t expose session tokens that can do sensitive actions)?
  2. How do we force certain pages to only be served over HTTPS?
With regards to sessions, Drupal will set the session cookie name based on whether the user logs in over HTTP or HTTPS:
/includes/Bootstrap.inc
function drupal_settings_initialize() {
...
     $is_https = isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on';
...
     if ($is_https) {
         ini_set('session.cookie_secure', TRUE);
     }
...
     $prefix = ini_get('session.cookie_secure') ? 'SSESS' : 'SESS';
     session_name($prefix . substr(hash('sha256', $session_name), 0, 32));
}
[Setting session.cookie_secure tells PHP to set the secure flag on the session cookie.]

So if a user logs in over HTTP then the cookie has a SESS prefix and visiting a HTTPS page will cause Drupal to treat the user as anonymous (as Drupal will look for a session cookie prefixed with SSESS for HTTPS requests). Similarly, if the user logs in over HTTPS the cookie has a prefix of SSESS and if the user visits any HTTP links then Drupal will treat them as an anonymous user (as Drupal will look for a session cookie with a prefix of SESS for HTTP requests).

To support mixed-mode HTTPS and HTTP sessions open up /sites/default/settings.php and add $conf['https'] = TRUE;. This enables you to use the same session over HTTP and HTTPS both -- but with two cookies (prefixed with SESS and SSESS) where the HTTPS cookie (prefixed with SSESS) is sent over HTTPS only (secure flag is set).

This gets set up when the user logs in. The user login form will invoke the validate hook first, which will check the credentials, after this the submit hook is called which calls drupal_session_regenerate. 
/includes/Session.inc:350
global $user, $is_https;
    if ($is_https && variable_get('https', FALSE)) {
        $insecure_session_name = substr(session_name(), 1);
        $session_id = drupal_hash_base64(uniqid(mt_rand(), TRUE) . drupal_random_bytes(55));
        $expire = $params['lifetime'] ? REQUEST_TIME + $params['lifetime'] : 0;
        setcookie($insecure_session_name, $session_id, $expire, $params['path'], $params['domain'], FALSE, $params['httponly']);
        $_COOKIE[$insecure_session_name] = $session_id;
    }
...
    session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE) . drupal_random_bytes(55)));
...
    if (isset($old_session_id)) {
        $params = session_get_cookie_params();
        $expire = $params['lifetime'] ? REQUEST_TIME + $params['lifetime'] : 0;
        setcookie(session_name(), session_id(), $expire, $params['path'], $params['domain'], $params['secure'], $params['httponly']);
... 
The first section of code checks whether we are on HTTPS and the ‘https’ variable is set (a config change typically made in settings.php); if so, the insecure session cookie is set. The middle section sets a new session identifier (regardless of HTTP or HTTPS). The final section runs only if there was an existing session: it sends the session cookie with the new session identifier as its value, using the cookie parameters PHP already holds (which may have been configured earlier, such as the value of session.cookie_secure).
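The substr(session_name(), 1) call is easy to miss: it simply drops the leading 'S' from the secure cookie name to get the name of its insecure twin. A tiny standalone sketch, with the domain 'example.dev' and the variable names chosen only for illustration:

```php
<?php
// Secure cookie name as built earlier: 'SSESS' + 32 hex characters.
$secure_name = 'SSESS' . substr(hash('sha256', 'example.dev'), 0, 32);

// Dropping the first character turns 'SSESS...' into 'SESS...'.
$insecure_name = substr($secure_name, 1);

echo $secure_name, "\n", $insecure_name, "\n";
```

So in mixed mode the browser ends up holding two cookies whose names differ by one character, each carrying its own session identifier.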

With regards to forcing certain pages to only be served over HTTPS, Drupal currently doesn’t offer such functionality, but there is a module called “Secure Pages” (http://drupal.org/project/securepages) which does. This module contains a list of paths to secure and any form that is built is checked against this path and has its #https property set to true, which alters the form so it will be submitted over HTTPS.

Reference: http://drupal.org/https-information
 
CSRF
Drupal automatically applies CSRF protection to any form built using its Form API, unless the requesting user is anonymous. The resulting form sent to the browser looks like:
<form action="/node" method="post" id="test-form" accept-charset="UTF-8">
     <div>
     ...
         <input type="submit" id="edit-submit" name="op" value="Submit" class="form-submit" />
         <input type="hidden" name="form_build_id" value="form-gWlA9KHqT2PvbxVfEH6ddVd9JgfphV7XCkHyYV0WwuU" />
         <input type="hidden" name="form_token" value="w7PPGmYkgxrrwMRFOoRODgJCbfm4JVd6kzvprASskWI" />
         <input type="hidden" name="form_id" value="test_form" />
     </div>
</form>
There are 3 hidden fields: form_build_id, form_token and form_id. The field form_token is used to protect against CSRF. Let’s see how it is created.

/includes/Form.inc:990
function drupal_prepare_form($form_id, &$form, &$form_state) {
...
    if (!empty($user->uid) && !$form_state['programmed']) {
        // Form constructors may explicitly set #token to FALSE when cross site
        // request forgery is irrelevant to the form, such as search forms.
        if (isset($form['#token']) && $form['#token'] === FALSE) {
            unset($form['#token']);
        }
        // Otherwise, generate a public token based on the form id.
        else {
            $form['#token'] = $form_id;
            $form['form_token'] = array(
                '#id' => drupal_html_id('edit-' . $form_id . '-form-token'),
                '#type' => 'token',
                '#default_value' => drupal_get_token($form['#token']),
            );
        }
    }
...
}

So if the server is responding with a form to an authenticated user (the UID cannot be 0, as only anonymous users have UID 0), and the form is not being processed in response to a server-side drupal_form_submit call (so $form_state['programmed'] is FALSE), and the creator of the form did not explicitly disable CSRF protection (by setting $form['#token'] = FALSE), then the CSRF token will be created via drupal_get_token.
/includes/Common.inc:4968
function drupal_get_token($value = '') {
     return drupal_hmac_base64($value, session_id() . drupal_get_private_key() . drupal_get_hash_salt());
}
Note, drupal_hmac_base64 uses PHP’s hash_hmac function with the hash algorithm set to SHA256.
So the token is created by computing an HMAC of the form_id (which is chosen by the developer and is public since it's a hidden field of the form) using a key consisting of the session identifier, the Drupal installation private key and the Drupal hash salt. Let’s look at each of these:
/includes/Common.inc:4954
function drupal_get_private_key() {
     if (!($key = variable_get('drupal_private_key', 0))) {
         $key = drupal_hash_base64(drupal_random_bytes(55));
         variable_set('drupal_private_key', $key);
     }
     return $key;
}
So the drupal_private_key variable is stored in the DB, unless it doesn’t exist in which case it is generated. drupal_random_bytes (in Bootstrap.inc:1926) will use /dev/urandom (if it’s available) or openssl_random_pseudo_bytes (if the PHP version >= 5.3.4).
/includes/Common.inc:4941
function drupal_get_hash_salt() {
     global $drupal_hash_salt, $databases;
     // If the $drupal_hash_salt variable is empty, a hash of the serialized
     // database credentials is used as a fallback salt.
     return empty($drupal_hash_salt) ? hash('sha256', serialize($databases)) : $drupal_hash_salt;
}
So the value of $drupal_hash_salt can be explicitly set in settings.php; if it isn't set, the salt is created by hashing the serialized $databases array (which details the database connection properties) that is also defined in settings.php.

At this stage it’s worth noting that the Drupal private key and hash salt are the same for all users. This means that, for a given form, the variation in the form_token value between users comes only from the session identifier.
/includes/Session.inc:370
function drupal_session_regenerate() {

    session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE) . drupal_random_bytes(55)));

}
drupal_session_regenerate is called whenever a user switches between being unauthenticated and authenticated (and vice-versa).

So the security of this CSRF approach comes down to how effective the random numbers generated are and whether or not the session identifier, Drupal private key and hash salt for the Drupal installation are exposed in any way.
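Putting the pieces together, the token scheme can be sketched in standalone PHP. The sketch_* names and all three key values are made up; the encoding follows my understanding of drupal_hmac_base64 (HMAC-SHA256, URL-safe base64, padding removed), and the constant-time comparison with hash_equals() (PHP 5.6+) is my own choice rather than something taken from the Drupal source.

```php
<?php
// HMAC-SHA256 in URL-safe base64 without padding.
function sketch_hmac_base64($data, $key) {
  $hmac = base64_encode(hash_hmac('sha256', $data, $key, TRUE));
  return strtr($hmac, array('+' => '-', '/' => '_', '=' => ''));
}

// Token = HMAC(form_id, session id . private key . hash salt).
function sketch_get_token($form_id, $session_id, $private_key, $hash_salt) {
  return sketch_hmac_base64($form_id, $session_id . $private_key . $hash_salt);
}

// Recompute the token server-side and compare it with the submitted one.
function sketch_valid_token($submitted, $form_id, $session_id, $private_key, $hash_salt) {
  $expected = sketch_get_token($form_id, $session_id, $private_key, $hash_salt);
  return hash_equals($expected, $submitted);
}

// Made-up stand-ins for the private key, hash salt and two session ids.
$private_key = 'site-private-key';
$hash_salt = 'site-hash-salt';
$alice_token = sketch_get_token('test_form', 'alice-session-id', $private_key, $hash_salt);
$bob_token = sketch_get_token('test_form', 'bob-session-id', $private_key, $hash_salt);

var_dump(sketch_valid_token($alice_token, 'test_form', 'alice-session-id', $private_key, $hash_salt));
var_dump(sketch_valid_token($alice_token, 'test_form', 'bob-session-id', $private_key, $hash_salt));
```

A token minted for one session fails validation under another, which is the property that stops an attacker's page from submitting the form on the victim's behalf.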

Access Control
Drupal uses Permissions and Roles. The admin screens allow Roles to be created and allow Users to be assigned to the Roles (a User can be assigned to more than one Role).  Every Module in the system can define Permissions which are available to assign to a Role in the admin screens.

From the Admin screens (sorry no screenshots):
    People->Permissions(tab)->Roles(tab) - This lets an admin create a new Role.
    People->List - This lets an admin add or remove a Role from a user. Users may have multiple Roles.
    People->Permissions(tab)->Permissions(tab) - This lets an admin set which permissions a Role has.

A module defines a new permission by implementing hook_permission.

When writing a module, in order to enforce that the user has a permission to access a certain menu item, certain properties need to be set e.g.
$items['path/to/module'] = array(
     'title' => 'MyModule',
     'description' => 'An accessed controlled module.',
     'access callback' => 'user_access',
     'access arguments' => array('my module permission'),
);
The ‘access callback’ property is the name of the function that will be called to check the user has permission to access the page at 'path/to/module'.  That function should return a BOOL indicating if the user should be granted access.  The ‘access callback’ function will be passed the value of 'access arguments' (which should contain a single permission name). If ‘access callback’ is not specified then the function user_access (/modules/user/User.module:786) is used by default.

For consistency, even if a custom ‘access callback’ function is used, that function should still be calling user_access with a named permission as the basis of making an access control decision.
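As a rough sketch of how these pieces fit together, here is a hypothetical module 'mymodule' that defines a permission, registers a menu item with a custom access callback, and still bases its decision on user_access. So that it runs outside Drupal, user_access is replaced by a stand-in that grants a fixed list of permissions, and the last few lines imitate (very loosely) the menu system calling the callback with its arguments. The second access argument is invented for the example.

```php
<?php
// Stand-in for Drupal's user_access() so the sketch runs on its own.
if (!function_exists('user_access')) {
  function user_access($permission) {
    $granted = array('my module permission');
    return in_array($permission, $granted, TRUE);
  }
}

// hook_permission(): declare the permission (real modules wrap text in t()).
function mymodule_permission() {
  return array(
    'my module permission' => array(
      'title' => 'Access MyModule',
      'description' => 'View the MyModule page.',
    ),
  );
}

// Custom access callback: the named permission plus one extra condition.
function mymodule_page_access($permission, $is_enabled) {
  return user_access($permission) && $is_enabled;
}

// hook_menu(): point the path at the custom access callback.
function mymodule_menu() {
  $items = array();
  $items['path/to/module'] = array(
    'title' => 'MyModule',
    'access callback' => 'mymodule_page_access',
    'access arguments' => array('my module permission', TRUE),
  );
  return $items;
}

// Loosely what happens on a request: call the callback with its arguments.
$items = mymodule_menu();
$item = $items['path/to/module'];
$allowed = call_user_func_array($item['access callback'], $item['access arguments']);
var_dump($allowed);
```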

Dynamic Code Execution
Drupal has a function called php_eval (which calls PHP eval).

I haven't tested it but apparently the code passed to php_eval must be surrounded by <?php ?> tags and this ensures the evaluated code cannot access any variables in the calling code (although I would assume globals are still available).

Output Encoding
Drupal differentiates between a couple of different scenarios.
  • Output encoding for non-HTML text (to protect against XSS).
Drupal makes available the check_plain function:
includes/Bootstrap.inc:1571
function check_plain($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
check_plain simply calls the PHP function htmlspecialchars.

Developers can call this function (check_plain) directly as in:
drupal_set_message("Your favourite colour is " . check_plain($colour));
However, since for most text we want to support translations, most text will make use of the t() function. The t() function will call check_plain automatically if it is invoked with text placeholders, but the prefix of the placeholder dictates exactly what happens:

@ prefix tells t() to run all text replacements through check_plain.
drupal_set_message(t("Your favourite colour is @colour", array("@colour" => $colour)));
is equivalent to
drupal_set_message(t("Your favourite colour is " . check_plain($colour)));
% prefix tells t() to run all text replacements through drupal_placeholder, but drupal_placeholder will first run the text through check_plain. (drupal_placeholder returns '<em class="placeholder">' . check_plain($text) . '</em>')
drupal_set_message(t("Your favourite colour is %colour", array("%colour" => $colour)));
is equivalent to
drupal_set_message(t("Your favourite colour is " . '<em class="placeholder">' . check_plain($colour) . '</em>'));
! prefix tells t() NOT to run the text through check_plain.
drupal_set_message(t("Your favourite colour is !colour", array("!colour" => $colour)));
is equivalent to
drupal_set_message(t("Your favourite colour is $colour"));
Note, in PHP, in-string variable substitution is only done if the string is quoted in double-quotes.  No variable substitution occurs if the string is quoted in single-quotes.
  • Output encoding for HTML text (to protect against XSS).
Drupal has the filter_xss function to support output encoding text that contains some HTML mark up. The function takes an array of allowed tag names, but the default ones are 'a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'.
  • Output encoding for HTML text provided by an administrator (to protect against XSS).
Drupal has the filter_xss_admin function which just calls filter_xss with the following list of tags - 'a', 'abbr', 'acronym', 'address', 'article', 'aside', 'b', 'bdi', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'command', 'dd', 'del', 'details', 'dfn', 'div', 'dl', 'dt', 'em', 'figcaption', 'figure', 'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'i', 'img', 'ins', 'kbd', 'li', 'mark', 'menu', 'meter', 'nav', 'ol', 'output', 'p', 'pre', 'progress', 'q', 'rp', 'rt', 'ruby', 's', 'samp', 'section', 'small', 'span', 'strong', 'sub', 'summary', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'time', 'tr', 'tt', 'u', 'ul', 'var', 'wbr'. This is basically everything but ‘script’, ‘object’, ‘style’ and ‘iframe’. Still, it’s good that it’s a whitelist.
  • Generating anchor tags using l() (to protect against XSS).
l() calls check_plain on the $path and $text arguments.
  • Generating URLs that are well formatted.
Drupal has the drupal_encode_path function. This calls PHP’s rawurlencode and then converts the encoded forward slashes (%2F) back to literal forward slashes, so path separators are left intact.
  • Generating JSON that is well formatted.
Drupal has the drupal_json_encode function that converts a PHP variable into its JavaScript equivalent.
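To make the @, % and ! placeholder prefixes described earlier in this section concrete, here is a simplified standalone imitation of the substitution step. It is not Drupal's t(): there is no translation lookup, and sketch_check_plain and sketch_format_string are made-up names.

```php
<?php
// Stand-in for check_plain(): HTML-escape the text.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Substitute placeholders, escaping according to the prefix of each key.
function sketch_format_string($string, array $args) {
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped text.
        $args[$key] = sketch_check_plain($value);
        break;

      case '%':
        // Escaped text wrapped in emphasis markup.
        $args[$key] = '<em class="placeholder">' . sketch_check_plain($value) . '</em>';
        break;

      case '!':
        // Inserted as-is: the caller is responsible for escaping.
        break;
    }
  }
  return strtr($string, $args);
}

$colour = '<b>red</b>';
echo sketch_format_string('Your favourite colour is @colour', array('@colour' => $colour)), "\n";
echo sketch_format_string('Your favourite colour is %colour', array('%colour' => $colour)), "\n";
echo sketch_format_string('Your favourite colour is !colour', array('!colour' => $colour)), "\n";
```

Only the last line lets the markup through unescaped, which is why the ! prefix should only be used with values that are already safe.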

Cookies
Cookies can be set using the PHP function setcookie.

Cookies can be read from the PHP superglobal $_COOKIE associative array.

Headers
HTTP response headers can be set using drupal_add_http_header (drupal_add_html_head, by contrast, adds elements to the HTML <head> of the page).

Headers can be read from the PHP superglobal $_SERVER associative array.


Redirects
Drupal provides the drupal_goto function. This function preferentially uses the ‘destination’ request parameter (but this is limited to on-site redirects), otherwise redirects to the path passed in.

2 comments:

  1. drupal_set_message(t(“You favourite colour is $colour”))); is equivalent if you only consider html escaping.

    The t() placeholder option would give you one translatable string "Your favourite colour is !colour" whereas the string interpolation would require you to translate all potential colour choices from your users.

    Your favourite colour is blue
    Your favourite colour is green
    Your favourite colour is mauve
    ... ad nauseam

    Note the html flag in the l() options array; when true, text is not escaped.

  2. Wow- this was extremely helpful for a system builder like me :).
