|Last modified: 14-11-2012|
Quick & Dirty Guide to Apache
There doesn't seem to be a really good book on Apache, ie. what "DNS & BIND" is to DNS or the Red book to Unix administration. The least bad I found is O'Reilly "Apache: The Definitive Guide, 2nd Edition" by Ben Laurie & Peter Laurie, but it already dates back to February 1999. Other introductory books include "Linux Apache Web Server Administration" by Charles Aulds (Craig Hunt Linux Library), "Apache Server Bible" by Mohammed J. Kabir, and "Professional Apache" by Peter Wainwright.
ServerName tells Apache which hostname (and optionally port) the server uses to identify itself, eg. when building redirection URLs sent back to browsers
DocumentRoot: By default, Apache looks for htdocs/ under ServerRoot, but this setting allows you to set up the document root directory elsewhere
To split the configuration across several files, take advantage of AccessConfig and Include, especially for virtual hosting.
Per-directory configuration files are set with AccessFileName
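Conditional configuration can be done by passing a parameter on the command line, eg. httpd -D UseRewrite, and wrapping the directives that depend on it in an <IfDefine UseRewrite> ... </IfDefine> section.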
Note: If you want to use other command-line switches, you must stop httpd and start it again with the new switches; "apachectl restart" or sending httpd a HUP signal won't work.
Three levels of configuration:
When using the latter, AllowOverride is highly recommended.
The different available containers: <Limit> and <LimitExcept>, <Directory> (physical path in filesystem), <Files> (same, but dealing with specific files), <Location> (URL), <VirtualHost>.
Two kinds of directives: those that can only be used at the server level, and those that can be used either globally or locally, ie. they can be overridden locally. For instance:
Options are inherited, so use + or - to add or remove them, eg. Options -FollowSymLinks
To tell Apache which file to return when a user requests a directory and not a specific file, use the DirectoryIndex directive. For instance:
To set icons and descriptions, use this:
Different types of variables are available:
Use mod_env to set variables that can be read by CGI scripts:
Use mod_setenvif to set variables conditionally through BrowserMatch/BrowserMatchNoCase (eg. BrowserMatch Mozilla netscape=true) and SetEnvIf (eg. SetEnvIf User-Agent Mozilla netscape=true).
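To make the matching rule concrete, here is a small Python sketch (not Apache code) of what BrowserMatch and BrowserMatchNoCase do: each rule is a regular expression searched for in the User-Agent header, and a match sets the variable. The browser_match() function and the rule format are hypothetical.

```python
import re


def browser_match(user_agent, rules, nocase=False):
    """Emulate BrowserMatch (nocase=False) or BrowserMatchNoCase (nocase=True).

    rules: list of (regex, variable, value) tuples, applied in order.
    Returns the environment variables that would be set for this request.
    """
    flags = re.IGNORECASE if nocase else 0
    env = {}
    for pattern, name, value in rules:
        # A regex *search*: the pattern may match anywhere in the header
        if re.search(pattern, user_agent, flags):
            env[name] = value
    return env


rules = [("Mozilla", "netscape", "true")]
print(browser_match("Mozilla/4.7 [en] (X11; Linux)", rules))  # {'netscape': 'true'}
print(browser_match("mozilla/4.7", rules))                    # {}
print(browser_match("mozilla/4.7", rules, nocase=True))       # {'netscape': 'true'}
```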
In response to clients, Apache sends some headers along with the document: an HTTP status line with the response code, a Content-Type header, and (optionally) other HTTP response headers such as Cache-Control or Expires.
To return additional headers, use mod_headers, eg. Header set item value, or Header unset item.
To send information on when a document should expire in a caching server, use mod_expires:
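The headers involved are easy to compute by hand. This Python sketch models an "access plus N seconds" rule, assuming the expiry is counted from the time of the request. It builds the HTTP/1.0 Expires header and the equivalent HTTP/1.1 Cache-Control max-age. The expiry_headers() function is hypothetical and is not mod_expires' own code.

```python
from email.utils import formatdate


def expiry_headers(access_time, seconds):
    """Return caching headers for a document that expires `seconds` after access_time.

    access_time is a Unix timestamp; dates in HTTP headers are always expressed in GMT.
    """
    return {
        "Expires": formatdate(access_time + seconds, usegmt=True),  # HTTP/1.0
        "Cache-Control": f"max-age={seconds}",                      # HTTP/1.1
    }


print(expiry_headers(0, 3600))
# {'Expires': 'Thu, 01 Jan 1970 01:00:00 GMT', 'Cache-Control': 'max-age=3600'}
```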
If you want to create HTML documents that carry their own headers, ie. tell Apache not to send any header information itself, use mod_asis:
Apache comes with a bunch of modules. Just like the third-party modules discussed below, standard modules can be built statically, so they are compiled inside the httpd binary, or dynamically, in which case they exist as independent .so files.
AddModule: Used to load static modules in an order different from the one in which they were compiled in httpd.
ClearModuleList: Unload all static modules. Those needed must be reloaded with AddModule.
LoadModule: Used to load modules, static or dynamic. Must be located before ClearModuleList.
To enable modules statically, use "--enable-module=mymodule". To enable most or all standard modules, use (you guessed it) "--enable-module=most" or "--enable-module=all".
To enable a module dynamically, use "--enable-module=mymodule --enable-shared=mymodule". To enable all standard modules dynamically, use "--enable-shared=max". You can always tell Apache not to load some of them by commenting them out in httpd.conf.
Note that when building any module dynamically, Apache will include the mod_so module automatically. If mod_so is not compiled, however, neither is the apxs script that is needed to add dynamic third-party modules (which makes sense: without mod_so, no dynamic modules can be loaded).
To disable those that you don't need, use "--disable-module=mymodule". If you know that you will never need to use dynamic modules, you can remove the mod_so module with "--disable-module=so".
The order in which modules are loaded is significant, since a URL can be handled successively by different modules.
Static modules are loaded in the order in which they were compiled (use "httpd -l" to check), but this can be changed using the ClearModuleList and AddModule settings in httpd.conf. Note that ClearModuleList really does unload all modules, so make sure you add those you need with AddModule.
Dynamic modules are loaded with the LoadModule directive, and are processed in reverse order (ie. the module loaded by the last LoadModule directive is called first).
There are two ways to build third-party modules: APACI or APXS. APACI is run through ./configure, is OK when the module consists of a single file, and is required when the module needs to patch the Apache source code (eg. mod_ssl). APXS is a Perl script that is generated when Apache is built with mod_so, and lives under the bin/ sub-directory. APXS is used when the module consists of multiple source files, and requires the Apache header files. Note that since it is generated along with Apache, the apxs script contains host-specific information such as LIBEXECDIR, etc.
The source code of the module must be copied into the src/modules/extra/ sub-directory. You can either copy the source file yourself and run ./configure --activate-module=src/modules/extra/mymodule.c, or run ./configure --add-module=/home/mymodule.c once: --add-module will copy the file to the right location, and tell Apache to include it when compiling.
Using either --activate-module or --add-module compiles a third-party module statically. If you want this module to be built as a dynamic module, add the familiar --enable-shared switch, eg. ./configure --activate-module=src/modules/extra/mymodule.c --enable-shared=mymodule .
APXS is used to build dynamic modules. To compile and install a module, and add the required LoadModule/AddModule settings in httpd.conf (in Apache 1.3, AddModule is needed for DSO modules as well: LoadModule loads the module, and AddModule activates it), run: apxs -c -i -a mod_mymodule.c . Additional options are -n (to indicate the name of the module if it can't be inferred from the name of its source file), and -A (add the ad hoc LoadModule/AddModule settings, but comment them out).
The following compiles PHP as a static module (as shown by the --with-apache switch). I chose PHP, but the procedure is identical for other modules, and involves copying compiled binaries into Apache's source tree so they are included when compiling Apache itself:
Here's how to build PHP and mod_perl as dynamic modules (DSO). Once again, I chose PHP and mod_perl as examples, but you will have to proceed the same way to compile and launch any DSO module. You can tell whether your version of Apache supports DSO modules by running httpd -l, which should list mod_so.c:
Here's how to build Apache with OpenSSL, mod_ssl, mod_perl, and PHP as static modules. I'll assume that OpenSSL is already installed through RPM, since it is needed to connect to the server through OpenSSH.
To relate handlers to URLs, Apache provides two directives: SetHandler and AddHandler. SetHandler is more basic, as it causes all files in or below its location to be interpreted with the specified handler. AddHandler is more flexible: it maps one or more file extensions to a handler, which can in turn be tied to a CGI script with the Action directive.
Apache can act as a proxy server through mod_proxy. To enable proxying, use ProxyRequests On, eg.
- Listen 8080
- <virtualhost 192.168.1.1:8080>
- ServerName .net
- ProxyRequests On
Mirroring: ProxyPass /Linuxdocs http://www.linuxdoc.org
Reverse proxy: outside -> proxy -> inside www
To expire cached pages: mod_expires + HTTP headers (Expires in HTTP/1.0, Cache-Control in HTTP/1.1). To block access: ProxyBlock gambling sex
The CONNECT method: lets the proxy open a connection to a remote server (eg. to tunnel SSL connections)
kHTTPd : HTTPd in kernel
CGI: an interface to run any program. A script must either end in .cgi or be located in the cgi-bin directory. To define a handler for CGI scripts: ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/". Watch out for permissions, eg. nobody.www 760
CGI.pm: big Perl module to perform web-related tasks and manage the CGI interface to the web server
FastCGI: scripts loaded during HTTPd start
mod_cgi: provides the cgi-script handler to execute a URL as a CGI script, and requires the Options ExecCGI directive.
mod_imap: provides the imap-file handler to interpret URLs as imagemaps.
mod_asis: provides the send-as-is handler to send files without additional headers. The file is responsible for carrying its own headers to be interpreted correctly.
mod_info: provides the server-info handler, which generates an HTML page of the server configuration. For security reasons, you should restrict access to it.
mod_include: provides the server-parsed handler to parse files for server-side includes. Requires the Options Includes directive.
mod_status: provides the server-status handler to generate an HTML page of server status. For security reasons, you should restrict access to it.
mod_negotiation: provides the type-map handler to interpret URLs as a type map for content negotiation.
mod_perl: better performance. With the Apache::Registry module, a script that hasn't changed is not recompiled.
mod_perl requires a lot of Perl modules: run "cpan" as root, then "install mod_perl"
For name-based virtual hosts, if no ServerName matches the Host header sent by the client, the request is served by the primary virtual host, ie. the first one defined for that IP address. Note that the IP address used for name-based virtual hosting is not available to the main server.
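The selection rule can be sketched in a few lines of Python. This is a simplified model, not Apache's code. It assumes all the virtual hosts share the same IP address, compares the Host header to each ServerName in configuration order (ignoring ServerAlias and wildcards), and falls back to the first one. The host names and paths are hypothetical.

```python
def select_vhost(host_header, vhosts):
    """Pick the DocumentRoot that answers a request on a name-based IP address.

    vhosts: list of (ServerName, DocumentRoot) in the order they appear in httpd.conf.
    host_header: value of the Host header, or None for old HTTP/1.0 clients.
    """
    # The Host header may carry a port ("www.example.com:80"); names are case-insensitive
    host = (host_header or "").split(":")[0].lower()
    for server_name, document_root in vhosts:
        if server_name.lower() == host:
            return document_root
    # No match (or no Host header at all): the first, primary, virtual host answers
    return vhosts[0][1]


vhosts = [("www.example.com", "/www/example"), ("www.example.org", "/www/org")]
print(select_vhost("WWW.example.org:80", vhosts))   # /www/org
print(select_vhost("unknown.example.net", vhosts))  # /www/example
print(select_vhost(None, vhosts))                   # /www/example
```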
To avoid depending on DNS resolution, always specify the IP address and the ServerName.
httpd -S parses the configuration file and dumps the virtual host settings it found.
It's possible to define a default virtual server with <VirtualHost _default_:*> (* = all ports)
You might want to forbid search engines from indexing all or part of your site. This can be achieved by using either a robots.txt file at the top level of the DocumentRoot directory, robots META tags in each HTML document, or directives in httpd.conf. Note that robots.txt implies that you trust search engines to follow its instructions.
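Python's standard urllib.robotparser module reads robots.txt the way a well-behaved crawler would, which makes it a convenient way to check a file before publishing it. The rules, robot name and URLs below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Contents of a (hypothetical) robots.txt at the top of DocumentRoot
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.example.com/private/report.html",
            "http://www.example.com/index.html"):
    print(url, parser.can_fetch("AnyBot", url))
# http://www.example.com/private/report.html False
# http://www.example.com/index.html True
```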
Apache can be set up to run a CGI script when a client requests a certain type of resource, a file extension, or uses a given HTTP method. All of these use the Action or Script directive supplied by the mod_actions module. Here's how to set up those three handlers:
The value returned in the Content-Type header when a user retrieves a document can be defined in different ways. By default, Apache reads this information from a two-column file usually called mime.types (this can be changed with the TypesConfig directive), eg. "text/html html htm". This type-extension association can also be set in the client browser, so it knows how to handle a file if the server didn't return a Content-Type that it understands.
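The two-column format is simple enough to parse by hand. This Python sketch builds an extension-to-type map from mime.types-style text and looks a file name up in it. Two simplifications are assumptions of this example: only the last extension of the file name is used, and the fallback to text/plain stands in for Apache's DefaultType directive.

```python
def parse_mime_types(text):
    """Map each extension to its MIME type from mime.types-style text."""
    ext_map = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:  # a type may be listed without any extension
            ext_map[ext.lower()] = mime_type
    return ext_map


def content_type(filename, ext_map, default="text/plain"):
    """Return the Content-Type for filename, based on its last extension."""
    _, dot, ext = filename.rpartition(".")
    return ext_map.get(ext.lower(), default) if dot else default


sample = """
# MIME type                extensions
text/html                  html htm
image/gif                  gif
application/octet-stream
"""
types = parse_mime_types(sample)
print(content_type("index.HTM", types))  # text/html
print(content_type("README", types))     # text/plain
```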
The information contained in mime.types can be supplemented by editing httpd.conf and adding the directives AddType or Action. For instance, "AddType text/mylanguage .myl .mylanguage" or "Action image/gif /cgi-bin/process-gif-image.cgi". The latter lets a script handle the request instead of sending a file directly.
If a file is encoded (eg. ZIP file, BinHex file, etc.), use the AddEncoding directive so that Apache will send a Content-Encoding header:
AddEncoding zip .zip
... will result in the following header:
Conversely, four HTTP headers can be sent by the client to tell Apache which files it can handle: Accept, Accept-Charset, Accept-Encoding, and Accept-Language. Not all browsers use those correctly, so Apache needs to do a bit of guessing. More sophisticated content negotiation can be achieved with MultiViews and type maps.
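As a rough illustration of the client side of negotiation, this Python sketch picks the best type from an Accept header using its q-values, letting the most specific matching range (exact type, then type/*, then */*) decide each candidate's quality. Apache's real algorithm also weighs server-side quality values, languages and encodings. The functions here are hypothetical.

```python
def parse_accept(header):
    """Turn 'text/html;q=0.8, */*;q=0.1' into [('text/html', 0.8), ('*/*', 0.1)]."""
    prefs = []
    for item in header.split(","):
        media, *params = [part.strip() for part in item.split(";")]
        if not media:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media.lower(), q))
    return prefs


def best_match(available, accept_header):
    """Return the available type with the highest q, or None if none is acceptable."""
    prefs = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for mtype in available:
        major = mtype.split("/")[0]
        q, specificity = 0.0, -1
        for media, media_q in prefs:
            # The most specific matching range decides the quality of this type
            if media == mtype:
                rank = 2
            elif media == major + "/*":
                rank = 1
            elif media == "*/*":
                rank = 0
            else:
                continue
            if rank > specificity:
                q, specificity = media_q, rank
        if q > best_q:  # ties keep the server's order of preference
            best_type, best_q = mtype, q
    return best_type


print(best_match(["text/html", "text/plain"], "text/plain;q=0.5, text/*;q=0.8"))  # text/html
print(best_match(["image/gif"], "text/html"))                                      # None
```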
To avoid supplementing mime.types, you can use the mod_mime_magic module which tries to guess the type of a file by looking for patterns inside it, just like the "file" command works in Unix. Obviously, this is more CPU- and disk-intensive, so mod_mime_magic is usually loaded after the lighter mod_mime module.
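HTTP status codes are organized into 5 categories: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).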
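Since the category is simply the first digit of the code, it is easy to check programmatically. Here is a Python sketch using the standard http.HTTPStatus enumeration for the reason phrases. The describe() helper is hypothetical.

```python
from http import HTTPStatus

CATEGORIES = {1: "informational", 2: "success", 3: "redirection",
              4: "client error", 5: "server error"}


def describe(code):
    """Return eg. '404 Not Found (client error)': the first digit gives the category."""
    return f"{code} {HTTPStatus(code).phrase} ({CATEGORIES[code // 100]})"


print(describe(301))  # 301 Moved Permanently (redirection)
print(describe(404))  # 404 Not Found (client error)
```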
Apache provides the ErrorDocument directive so you can customize how Apache responds to errors. Some examples (note the absence of quotes in the first line):
If the client requests a document that doesn't exist, you can let Apache either rewrite the URI or redirect the client to another URI. Both can be done in httpd.conf, either through basic directives or through the more advanced directives offered by two modules: mod_alias and mod_rewrite; the latter is more sophisticated but much larger.
The main directive offered by mod_rewrite is RewriteRule, which is a more powerful alternative to AliasMatch. The real beauty of mod_rewrite is that it supports flow control:
More information available at http://httpd.apache.org/docs/mod/mod_rewrite.html
Here's how to redirect the user to a new domain with the same docs/ tree:
R=301 tells the browser that it's being permanently redirected, which is a convenient way to tell search engines to update their URLs to point to the new domain.
An easier alternative in case the old and new servers have the exact same repository tree:
ie. replace the root of the URL with the new domain, and combine this with the rest of the URL, eg. http://www.old.com/mydoc.html is turned into http://www.new.com/mydoc.html .
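To see how the pattern and the $1 back-reference interact, here is a Python sketch that applies the substitution of a rule of the form RewriteRule ^/(.*)$ http://www.new.com/$1 [R=301]. That rule is an assumption chosen to match the example above, not a directive taken from these notes.

```python
import re

# Python equivalent of the (assumed) rule:
#   RewriteRule ^/(.*)$ http://www.new.com/$1 [R=301]
PATTERN = re.compile(r"^/(.*)$")
TARGET = "http://www.new.com/"


def rewrite(path):
    """Return (status, location) for a redirect, or None if the rule doesn't match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # $1 in mod_rewrite corresponds to group(1): everything after the leading slash
    return 301, TARGET + match.group(1)


print(rewrite("/mydoc.html"))  # (301, 'http://www.new.com/mydoc.html')
```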
mod_imap: nothing to do with the IMAP mail protocol. This is used to create clickable images, and redirect the client to a different URL depending on where the user clicked on the image.
mod_speling: this only performs basic rewriting (it ignores case and corrects a single misspelled character), and is enabled by using the "CheckSpelling on" directive.
Technically speaking, CGI is not a programming language but a protocol for scripts to gather information from a user request and respond accordingly. Scripts can be shell scripts, Perl or Python scripts, or binaries. Scripts retrieve information sent by the browser through either environment variables (GET method) or standard input (POST method). The GET method has the advantage that users can bookmark the URL, but it also makes it easier for hackers to play tricks; the POST method is safer in that respect.
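The two input channels can be seen in a minimal Python CGI script. This is a sketch: the read_params() helper is hypothetical, and it assumes URL-encoded form data (the default for HTML forms). With GET, the data comes from the QUERY_STRING variable. With POST, CONTENT_LENGTH bytes are read from standard input.

```python
import os
import sys
from urllib.parse import parse_qs


def read_params(environ=None, stdin=None):
    """Return form parameters as a dict of lists, for both GET and POST requests."""
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the browser sends the data on standard input; read exactly CONTENT_LENGTH bytes
        stream = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stream.read(length).decode("latin-1")
    else:
        # GET: the data is in the URL, after the "?"
        data = environ.get("QUERY_STRING", "")
    return parse_qs(data)


if __name__ == "__main__":
    params = read_params()
    print("Content-Type: text/plain")
    print()  # a blank line ends the CGI headers
    for name, values in params.items():
        print(f"{name} = {', '.join(values)}")
```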
When using the GET method, the main variables that scripts can read are: REQUEST_METHOD, PATH_INFO, PATH_TRANSLATED, QUERY_STRING, and SCRIPT_NAME. Here's a simple way to print out all environment variables:
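A Python version of such a script might look like the following sketch. To run it under Apache, you would save it in a ScriptAlias'ed directory, make it executable, and add a #! line pointing to your Python interpreter (that setup is assumed, not shown). The env_report() helper is hypothetical.

```python
import os


def env_report(environ):
    """Build a plain-text CGI response listing every environment variable, sorted by name."""
    lines = ["Content-Type: text/plain", ""]  # header, then the blank line that ends headers
    lines += [f"{name}={value}" for name, value in sorted(environ.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(env_report(os.environ), end="")
```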
Since CGI scripts are run through Apache, they can be a pain to debug. A quick and dirty way is to set environment variables from the shell, and call the script:
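The same trick can be scripted. This Python sketch sets a few CGI variables, runs the script outside Apache, and returns whatever it printed, headers included. The run_cgi() helper is hypothetical, and it only fills in the handful of variables shown.

```python
import os
import subprocess
import sys


def run_cgi(script_path, query_string="", method="GET", interpreter=None):
    """Run a CGI script outside Apache with a minimal CGI environment; return its stdout."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "SCRIPT_NAME": "/cgi-bin/" + os.path.basename(script_path),
    })
    # Without an interpreter, the script must be executable and start with a #! line
    command = [interpreter, script_path] if interpreter else [script_path]
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    if result.stderr:
        # Under Apache, this output would end up in the error_log
        print(result.stderr, file=sys.stderr)
    return result.stdout
```

For instance, run_cgi("./cgi-bin/test.cgi", "name=jdoe") shows exactly what the script would send back to Apache, so header mistakes (such as a missing blank line) are easy to spot.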
Using GET has the advantage that users can bookmark the URL, but the length of URLs is limited (the exact limit depends on the browser and server). POST is required when sending more data, but makes it impossible for the user to save the URL with all its parameters.
You need to configure Apache so it knows which directories contain CGI scripts, and which file extensions they use. CGI scripts are handled by the mod_cgi module, and the ScriptAlias and ExecCGI directives. ScriptAlias is useful to restrict CGI scripts to a single directory, outside DocumentRoot, so as to forbid users from uploading CGI scripts. You can have as many ScriptAlias directives as you wish.
Security can be enhanced by forbidding the use of .htaccess files in a CGI-only directory:
To restrict the use of CGI scripts yet more, you can restrict this to a single file using the Files directive:
As an alternative to using a SetHandler directive as above, use either "AddHandler cgi-script .cgi .exe" and an ExecCGI option, or MIME types using mod_cgi's "AddType application/x-httpd-cgi .cgi"
SSI is short for Server-Side Include, and is provided by the mod_include module. To tell Apache to handle SSI:
An alternative to Includes is IncludesNOEXEC, which disables any command that causes script execution.
With the mod_fastcgi module, which provides the fastcgi-script handler, scripts are run persistently, ie. Apache doesn't need to set up the script environment, start up the script, etc. for every request. Under FastCGI, scripts can have three roles (Responder: like regular CGI scripts; Filter: convert between input and output media types; and Authorizer: authenticate HTTP requests and users, and can be used with mod_auth and mod_auth_dbm) and three types (Dynamic: started when the URL is first accessed; Static: started up with Apache; and External: located on a different host). Apache talks to FCGI scripts through a socket.
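The socket traffic follows the FastCGI protocol, in which every message is a record with an 8-byte header. Here is how such records can be built and decoded in Python. It illustrates the wire format only and is not a complete FastCGI application. The helper names are hypothetical. The constants and field layout are those of the FastCGI specification.

```python
import struct

FCGI_VERSION_1 = 1
# Record types (subset)
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_STDOUT = 6
# The three roles described above
FCGI_ROLES = {"responder": 1, "authorizer": 2, "filter": 3}
FCGI_KEEP_CONN = 1

# version, type, requestId, contentLength, paddingLength, reserved byte
HEADER = struct.Struct(">BBHHBx")


def make_record(rec_type, request_id, content=b""):
    """Build one FastCGI record, padding the content to a multiple of 8 bytes."""
    padding = -len(content) % 8
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding)
    return header + content + b"\x00" * padding


def begin_request(request_id, role="responder", keep_conn=False):
    """FCGI_BEGIN_REQUEST body: 2-byte role, 1-byte flags, 5 reserved bytes."""
    body = struct.pack(">HB5x", FCGI_ROLES[role], FCGI_KEEP_CONN if keep_conn else 0)
    return make_record(FCGI_BEGIN_REQUEST, request_id, body)


def parse_header(data):
    """Return (version, type, request_id, content_length, padding_length)."""
    return HEADER.unpack(data[:HEADER.size])
```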
You can change the behavior of FCGI scripts through directives in httpd.conf. For instance, here's how to tell Apache to restart dynamic FCGI scripts that exit, waiting at least 10 seconds between restarts, and to limit dynamic scripts to 5 processes at any one time: FastCgiConfig -restart -restart-delay 10 -maxprocesses 5 . Here's how to start a static FCGI script: FastCgiServer /cgi-bin/askname.cgi -init-start-delay 5 . And here's how to tell Apache to run an external FCGI script on a remote host: FastCgiExternalServer /cgi-bin/external.fcgi -host fcgi.alpha-prime.com:2001 .
Here's a basic FCGI script:
And here's how to use a FCGI script to authenticate users:
Finally, FastCGI scripts can be run through a wrapper, either SuExec (default: FastCgiSuexec on), or another binary (FastCgiSuexec /path/to/different/wrapper).
If Apache is compiled with suExec enabled, CGI scripts run under a different user account from the one used by the main server (ie. not root). To compile suExec in, run "./configure --enable-suexec --suexec-caller=nobody" (--suexec-caller is the user account that is allowed to call suExec, which should be the account set by the User directive in httpd.conf. It is "www" by default.)
If you've compiled Apache several times with different options and can no longer compile successfully, just rm -Rf the source tree and start from a clean install.
To check the configuration file for syntax errors, run httpd -t
"apachectl restart" closes all active connections. "apachectl graceful" waits for existing connections to close
Since GET has been available since version 0.9 of the HTTP protocol (it was the only method back then), you can connect with telnet to the TCP port on which Apache is listening, and type "GET /" without the quotes.
Is enable-module used to activate the use of third-party modules whose source has been included in the Apache source tree (eg. SSL), while other third party modules which do not need to patch the Apache source first need to be set with activate-module?
Are activate-module (copy source into Apache's directory, and include module) and add-module (assume source was copied beforehand, and just include it) used for standard Apache modules, while enable-module is used for third-party modules?
Telnet to the server's port 80, and type HEAD / HTTP/1.0 followed by an empty line. Chances are this will return at least the type of server it is running and its version:
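The same check can be scripted with a raw socket in Python, which is handy when telnet isn't available. This is a sketch: server_banner() and parse_server_header() are hypothetical helpers. The Server header may be missing or trimmed if the administrator changed ServerTokens.

```python
import socket


def parse_server_header(raw_response):
    """Return the value of the Server header from a raw HTTP response, or None."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return None


def server_banner(host, port=80, timeout=5.0):
    """Send 'HEAD / HTTP/1.0' followed by a blank line and return the Server header."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(chunk)
    return parse_server_header(b"".join(chunks).decode("iso-8859-1"))
```

Usage: server_banner("www.example.com") returns the Server header as a string, or None if the server doesn't send one.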
XSSI = extended SSI
Note: SHTML is often short for SSI HTML pages
AddType Application/x-httpd-php-source .phps
(.phps shows the source)
Aliasing = mapping a client's URL to a non-standard location and automatically retrieving the resource from this location, eg. Alias /icons "/usr/local/apache/icons"
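Here is the translation step sketched in Python. Alias entries are tried first, matching on whole path segments so that /icons does not catch /iconsfoo, and DocumentRoot is used otherwise. The map_url() helper is hypothetical, and the sketch does not guard against ".." in URLs.

```python
import posixpath


def map_url(url_path, aliases, document_root):
    """Translate a URL path into a file path, trying Alias entries before DocumentRoot.

    aliases: list of (url_prefix, directory) pairs, eg. [("/icons", "/usr/local/apache/icons")].
    """
    for url_prefix, directory in aliases:
        prefix = url_prefix.rstrip("/")
        # Only match whole path segments: /icons/a.gif matches, /iconsfoo does not
        if url_path == prefix or url_path.startswith(prefix + "/"):
            rest = url_path[len(prefix):].lstrip("/")
            return posixpath.join(directory, rest)
    return posixpath.join(document_root, url_path.lstrip("/"))


aliases = [("/icons", "/usr/local/apache/icons")]
print(map_url("/icons/back.gif", aliases, "/usr/local/apache/htdocs"))
# /usr/local/apache/icons/back.gif
print(map_url("/index.html", aliases, "/usr/local/apache/htdocs"))
# /usr/local/apache/htdocs/index.html
```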
Redirection: mod_alias (for aliasing) and mod_rewrite (for redirection).
mod_status -> www.mysrv.com/server-status
To track users: mod_usertrack + mod_session
mod_cookies was renamed mod_usertrack (a standard module)
By default, Apache looks for a file called .htaccess. Here's a sample:
You can improve performance by telling Apache to not look for .htaccess files outside the DocumentRoot directory:
You can tell Apache which directives can be overridden in an .htaccess file using the AllowOverride directive, eg. "AllowOverride FileInfo Indexes". The access-control directives (allow, deny, and order) can be overridden in .htaccess if the Limit category is allowed, which is the case by default (AllowOverride All). For higher security, leave Limit out of the AllowOverride list.
If a user aims at a directory for which no default document is available, as set by the DirectoryIndex directive, you can tell Apache to list the content of this directory by using the mod_autoindex module. For security reasons, it is recommended to disable this feature:
Note: IndexOptions ScanHTMLTitles tells Apache to open any file ending in .html or .htm, read its <title> section (if any), and display its content in the Description column. It's CPU- and disk-intensive, so you might want to only use this on small intranet servers.
Authentication can be host-based, user-based (those two using the "Satisfy any" option), or both (Satisfy all, which is the default option):
For information, the SSL package is available through either Apache_SSL or mod_ssl (better documented, and easier to install.) Two open-source packages implement SSL: SSLeay (discontinued), and OpenSSL.
Note that you should use IP addresses instead of host or domain names to improve performance and lower the consequences of losing access to the DNS.
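Host-based rules with IP addresses are easy to model. This Python sketch mimics the order/allow/deny logic for numeric addresses only. It handles "all", partial addresses such as 192.168.0, and network/mask notation such as 10.0.0.0/8 or 10.1.0.0/255.255.0.0, but not host names. The helper names are hypothetical.

```python
import ipaddress


def matches(client_ip, spec):
    """True if client_ip matches an allow/deny argument: 'all', a partial IP, or a network."""
    if spec == "all":
        return True
    if "/" in spec:  # network/netmask or network/prefix-length
        return ipaddress.ip_address(client_ip) in ipaddress.ip_network(spec, strict=False)
    # Full or partial address: compare whole octets ("192.168.0" matches 192.168.0.x)
    return client_ip == spec or client_ip.startswith(spec.rstrip(".") + ".")


def access_allowed(client_ip, order, allow=(), deny=()):
    allowed = any(matches(client_ip, s) for s in allow)
    denied = any(matches(client_ip, s) for s in deny)
    if order == "deny,allow":
        # Deny is evaluated first and Allow can override it; default is to allow
        return allowed or not denied
    if order == "allow,deny":
        # Allow is evaluated first and Deny can override it; default is to deny
        return allowed and not denied
    raise ValueError(f"unsupported order: {order}")


# order deny,allow / deny from all / allow from 192.168.0
print(access_allowed("192.168.0.12", "deny,allow", allow=["192.168.0"], deny=["all"]))  # True
print(access_allowed("10.1.2.3", "deny,allow", allow=["192.168.0"], deny=["all"]))      # False
```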
Multiple authentication modules can be specified in httpd.conf. Watch out for the fact that authentication modules are consulted in the reverse of the order in which the AddModule directives are specified, ie. the last authentication module has the highest priority. If a module is said to be authoritative, authentication will not be passed to lower-priority modules (to disable authoritativeness,
This module lets you allow or deny access based on information returned by the client browser. It provides two directives: BrowserMatch and SetEnvIf. Some examples:
To create a password file: htpasswd -c /usr/local/apache/auth/userfile jdoe (omit "-c" when adding users to an existing file). By default, hashing is done through crypt(), but the -m, -s, and -p switches can be used to tell htpasswd to use MD5, SHA, and plain text, respectively.
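The -s format is simple to reproduce. Each password-file line is the user name, a colon, and {SHA} followed by the Base64-encoded SHA-1 digest of the password. Here is a Python sketch that produces such a line. The htpasswd_sha() helper is hypothetical. Note that this unsalted format is weaker than the MD5 (-m) format.

```python
import base64
import hashlib


def htpasswd_sha(user, password):
    """Return a password-file line in the format produced by 'htpasswd -s'."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}" + base64.b64encode(digest).decode("ascii")


print(htpasswd_sha("jdoe", "secret"))
```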
Access is controlled through an .htaccess text file saved in the directory to be protected. For example,
AuthName "mod_auth Test Realm"
require user jdoe
require group admins
... where groupfile is:
admins: jdoe janedoe
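Here is a Python sketch of how such require lines are evaluated once the user has been authenticated. Each require line is an alternative, so any one of them is enough, and group membership comes from the group file. The helper names are hypothetical, and the "require" keyword is left out of the strings passed in.

```python
def parse_group_file(text):
    """Parse lines such as 'admins: jdoe janedoe' into {'admins': {'jdoe', 'janedoe'}}."""
    groups = {}
    for line in text.splitlines():
        name, sep, members = line.partition(":")
        if sep:
            groups.setdefault(name.strip(), set()).update(members.split())
    return groups


def is_authorized(user, require_lines, groups):
    """Return True if an (already authenticated) user satisfies any require line."""
    for line in require_lines:
        kind, *names = line.split()
        kind = kind.lower()
        if kind == "valid-user":
            return True
        if kind == "user" and user in names:
            return True
        if kind == "group" and any(user in groups.get(g, set()) for g in names):
            return True
    return False


groups = parse_group_file("admins: jdoe janedoe")
requires = ["user jdoe", "group admins"]
print(is_authorized("janedoe", requires, groups))  # True
print(is_authorized("bob", requires, groups))      # False
```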
To create a user database for MD5 authentication, run eg.
htdigest -c /usr/local/apache/auth/password.MD5 "Just testing" jdoe.
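Each line of the file that htdigest creates has the form user:realm:hash, where hash is the MD5 hex digest of "user:realm:password". Here is a Python sketch producing the same kind of line. The htdigest_line() helper is hypothetical.

```python
import hashlib


def htdigest_line(user, realm, password):
    """Return a line in the format written by htdigest: user:realm:MD5(user:realm:password)."""
    ha1 = hashlib.md5(f"{user}:{realm}:{password}".encode("utf-8")).hexdigest()
    return f"{user}:{realm}:{ha1}"


print(htdigest_line("jdoe", "Just testing", "secret"))
```

Because the realm is part of the hash, the AuthName used in the configuration must match the realm given to htdigest exactly.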
Note: AuthDigestDomain contains URLs; Digest authentication is always authoritative.
AuthName "Digest Authentication"
AuthDigestDomain /MD5Protected/ /private/
Require group WebAdmins
To activate the use of mod_auth_db, compile it either as a static or dynamic module, and add the following directives to httpd.conf:
LoadModule db_auth_module libexec/mod_auth_db.so (only needed for dynamic modules, ie. DSO)
AddModule mod_auth_db.c
A Berkeley DB file is created through the Perl script /usr/bin/dbmmanage. (Where is it? Output of "rpm -ql db3-utils": /usr/bin/berkeley_db_svc /usr/bin/db_archive /usr/bin/db_checkpoint /usr/bin/db_deadlock /usr/bin/db_dump /usr/bin/db_dump185 /usr/bin/db_load /usr/bin/db_printlog /usr/bin/db_recover /usr/bin/db_stat /usr/bin/db_upgrade /usr/bin/db_verify)
Build an .htaccess file in the directory you want to protect:
AuthName "DB Authentication Realm"
AuthType basic
AuthDBUserFile /usr/local/apache/auth/dbpasswds
AuthDBGroupFile /usr/local/apache/auth/groups.dbm
require group WebAdmins
AuthDBAuthoritative On
The session key of every request is recorded by mod_session as an internal apache data structure called a note, which is referenced like a system environment variable.
Log analyzers: Analog, Getstats, Webalizer
Just like with any program, you must set environment variables such as PATH, and check all user input to avoid buffer overflows and other major security breaches.
To improve security, you can use CGI wrappers to run scripts under a different UID: suEXEC and CgiWrap. suEXEC must be included when compiling the Apache binary, and is activated by adding, in a <VirtualHost> section, User and Group directives that are different from those used in the main section of httpd.conf. CgiWrap is more flexible as it runs scripts using the UID/GID of the owner of the script file. Once built, CgiWrap must be enabled by adding handlers in httpd.conf:
Two ways to add SSL to Apache: Apache-SSL, and mod_ssl
OpenSSL is an open-source implementation of the SSL protocol designed by Netscape, and is derived from SSLeay. It consists of two libraries: libcrypto.a and libssl.a
To test OpenSSL, run /usr/local/ssl/bin/openssl version
The private key is server.key (it must be backed up, and protected with chmod 0400 server.key). The certificate signing request is server.csr, and the x509 certificate is server.crt.
/usr/local/apache/conf/ssl.csr/ = certificate signing request files
/usr/local/apache/conf/ssl.crt/ = x509 certificates
/usr/local/apache/conf/ssl.key/ = private keys
Note: The private key is also left in the Apache source directory!
To start Apache in SSL mode, run either httpd -DSSL, or apachectl startssl.
The x509 certificate is stored in PEM (Privacy Enhanced Mail) format. To view it: openssl x509 -noout -text -in server.crt
To have the server's public key certified, send server.csr to a certificate authority (CA)
To self-certify: openssl req -x509 -key ./ssl.key/server.key -in ./ssl.csr/server.csr -out ./ssl.crt/server.crt, and copy into the SSL certificate file in eg. /usr/local/apache/conf/ssl.crt/server.crt
Client authentication is also possible by copying the relevant CA certificates into /usr/local/apache/conf/ssl.crt/, and running make update
All certificates can be concatenated into ca-bundle.crt.
Certificate revocation list (crl)
Commercial SSL server : StrongHold, RH SecureWeb Server (SWS), Covalent Technology's Raven
To upload files: POST (sent as stream -> parsing in CGI script), and PUT (mod_put)
- HTTP 0.9
- HTTP 1.0: content type sent to browser
- HTTP 1.1: hostname identification (-> which virtual host should answer the request); content negotiation to match capabilities; uploading files
GET returns the message body; HEAD returns only the header info.
Very small and fast httpd's: THTTPd (acme.com), Mathopd, Boa.
iPlanet = ex-Netscape Enterprise Server
Apache spawns processes instead of threads.
APACI = Apache AutoConf-style Interface, modeled on GNU autoconf:
- --enable-module=all/most
- --activate-module: to compile 3rd-party modules statically into Apache from source placed into the src location
- --add-module: to copy the module source file into the source tree before compiling & linking it statically into Apache
- --enable-shared=max: builds all modules as DSOs; must be the last directive
./config.status: saved configure output
config.layout: ./configure --with-layout=RedHat
strip: to remove debug info from a binary
httpd -t: to test the config
httpd -D: to define a parameter (eg. httpd -D SSL)
APXS is a Perl script to compile and install 3rd-party DSO modules. Unlike APACI, it can handle modules consisting of more than one source file.
In httpd.conf: core directives (global) and those that are context-dependent. Some directives can only be used in a given context:
- global environment: directives for the Apache server process as a whole
- default server section
- virtual hosts
- .htaccess, eg. allow from 192.168.0
An .htaccess file can be located anywhere in the file system, not just in the DocumentRoot directory. Use AllowOverride to specify which directives can be overridden in an .htaccess file.
More directories can be made available through mod_userdir and its UserDir directive. The default directory is ~/public_html/, but it can be changed, eg. UserDir /home/*/www. To disable it for some accounts: UserDir disabled root webmaster
CGI scripts can be run in a CGI wrapper to change UID/GID before being run: suEXEC (through User and Group directives in virtual hosts)
Aliasing to allow access to docs in directories outside DocumentRoot:
1. chown -R nobody:nobody /usr/doc/MySQL
2. ln -s manual_toc.html index.html
3. Alias /MySQL/ "/usr/doc/MySQL/"
mod_dir: to add the trailing "/", and return index.html if not specified through DirectoryIndex
Fancy directory listing through mod_autoindex + IndexOptions FancyIndexing
Modules: core_module and mod_so (always static); standard Apache modules; 3rd-party modules. Modules register callback functions called by Apache. mod_perl allows for development of modules in Perl instead of C.
Modules can be built in two ways: in Apache's source directory (use APACI to add), or as DSO modules (better to keep outside the Apache source directory and use the Perl script apxs to build the .so). Apxs can compile and install most 3rd-party modules, which often have more than one source file. Some modules like mod_ssl make extensive changes to the Apache source directory, so cannot be installed through apxs. Standard use: apxs -c -i -a MyModule.c
http://modules.apache.org
With the exception of core_module and mod_so, all Apache standard modules can be built as DSO modules.
LoadModule: to load a DSO module (not used by static modules)
AddModule: to enable a module (either static or DSO)
A DSO module can be known by Apache under a name different from its actual filename:
LoadModule firewall_module libexec/mod_firewall.so
AddModule mod_firewall.c
Modules loaded last are processed first!
Some modules like mod_ssl make extensive changes to the Apache source code, so cannot be installed by APXS. The standard use of APXS is:
apxs -c -i -a mymodule.c