Web applications for dummies

Introduction

This document deals with common issues when moving from dedicated applications to web applications.

General Tips

To check for syntax errors in a PHP script: php -l myscript.php

Performance Tips

Things to watch out for when writing web applications

Here's a list of things to pay attention to when moving from the world of dedicated applications to web applications:

To read

Investigating slow performance

  1. Network: ping www.acme.com and see how fast the server replies (ping measures latency rather than bandwidth)
  2. Server: vmstat, iostat
  3. Web server: Upload a basic, "Hello, world!" HTML page
  4. PHP: Write a simple PHP script with start/end time to print "Hello, world!"
  5. MySQL: Connect to the server with the mysql client and profile the queries (eg. EXPLAIN, SHOW STATUS, etc.)
  6. Needed? PHP + MySQL: Run the guilty queries identified previously

 

  1. uptime to check load average: If it stays well above the number of CPUs (eg. > 5 on a small server), add more CPU
  2. ps aux: Check CPU time + RAM/virtual memory used by programs, and their state (running, runnable, sleeping)
  3. pstree
  4. top: if lots of processes are sleeping but the CPU is mostly idle, they are probably blocked on I/O

General/hardware/OS

/proc/stat

(From /proc/stat explained):

(From Red Hat Enterprise Linux 5.1 Deployment Guide)

/proc/interrupts records the number of interrupts per IRQ on the x86 architecture

/proc/ide/ contains information about IDE devices on the system. Each IDE channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1. Many chipsets also provide a file in this directory with additional data concerning the drives connected through the channels (eg. is DMA enabled, etc.) Within each IDE channel directory is a device directory. The name of the device directory corresponds to the drive letter in the /dev/ directory. For instance, the first IDE drive on ide0 would be hda. Each device directory contains a collection of information and statistics.

top/uptime

Load averages: The same information can be had by running "uptime". The load average is the average number of processes waiting in the run queue plus the number currently executing, over 1, 5, and 15 minute periods (on Linux, processes in uninterruptible sleep, typically waiting on disk I/O, are counted too). If the number of running processes is significantly and steadily higher than the number of CPUs (eg. if you have one CPU but a load average of 20 running processes), you're in trouble. OTOH, if most of the processes are waiting rather than running, it might mean that they're waiting for the DBMS to reply, the hard disk to complete a task, or the network to send/receive data. Ideally, the load average should be consistently lower than the number of CPUs you have.
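The rule of thumb above (load average compared with the number of CPUs) can be sketched in a few lines of Python. This is only an illustration: the function names and the threshold of 1.0 runnable process per CPU are assumptions, not something the tools themselves define.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def assess_load(load_avgs, cpu_count, threshold=1.0):
    """Label each of the 1, 5 and 15 minute averages as 'ok' or 'high'.

    threshold is a hypothetical cut-off: 1.0 means one runnable
    process per CPU, which matches the rule of thumb in the text.
    """
    labels = []
    for avg in load_avgs:
        ratio = load_per_cpu(avg, cpu_count)
        labels.append("high" if ratio > threshold else "ok")
    return labels


if __name__ == "__main__":
    try:
        averages = os.getloadavg()        # same three numbers as "uptime"
    except (OSError, AttributeError):
        averages = (0.0, 0.0, 0.0)        # load average not available here
    cpus = os.cpu_count() or 1
    print(averages, cpus, assess_load(averages, cpus))
```

A single high 1-minute figure is usually just a spike; it is the 5 and 15 minute figures staying high that suggests a real shortage.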

Too much idle means nothing is being done; too much system time indicates a need for faster I/O or additional devices to spread the load. Watch these numbers over time to determine what's normal for that system, and watch for changes.
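As a rough sketch of where top gets its idle/system/iowait percentages: on Linux the aggregate "cpu" line of /proc/stat holds cumulative tick counters, and the share of each state is the difference between two snapshots. The function names and the sample figures below are made up for illustration.

```python
# Two made-up snapshots of the first lines of /proc/stat.
SAMPLE_BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
SAMPLE_AFTER = "cpu  150 0 100 1500 250 0 0 0 0 0\ncpu0 150 0 100 1500 250 0 0 0 0 0\n"

# Order of the first eight counters on the "cpu" line.
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate "cpu" line as a dict."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:1 + len(FIELD_NAMES)]]
            return dict(zip(FIELD_NAMES, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of each state, in percent, between two snapshots."""
    deltas = {name: after[name] - before[name] for name in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


if __name__ == "__main__":
    shares = cpu_percentages(parse_cpu_line(SAMPLE_BEFORE),
                             parse_cpu_line(SAMPLE_AFTER))
    for name, percent in shares.items():
        print(f"{name:8s} {percent:5.1f}%")
```

On a real system the two snapshots would come from reading /proc/stat twice, a few seconds apart.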

Watch for interrupts coming from peripherals to get a rough idea of how much load the associated device is handling.
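One way to watch interrupts is to total the per-CPU counters in /proc/interrupts and compare two readings. The sketch below assumes the usual Linux layout (a header naming the CPUs, then one line per IRQ); the sample text and function names are invented for illustration.

```python
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC   2-edge      timer
  1:          9          0   IO-APIC   1-edge      i8042
 19:     123456     654321   PCI-MSI 524288-edge   eth0
NMI:          0          0   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())      # header: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:cpu_count]:
            if not field.isdigit():
                break                      # fewer counters than CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest(before, after, n=3):
    """Return the n IRQs whose counters grew most between two readings."""
    growth = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, ""))[0]
        growth.append((count - previous, label, description))
    growth.sort(reverse=True)
    return growth[:n]


if __name__ == "__main__":
    for label, (count, description) in parse_interrupts(SAMPLE).items():
        print(label, count, description)
```

A network card or disk controller whose counter grows much faster than the others is a hint about where the load is coming from.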

vmstat -n 5

Shows I/O, swap, memory and processor activity, refreshed every 5 seconds (-n prints the header only once)

hdparm -Tt /dev/sda

To check hard disks: -T times cached reads, -t times buffered reads from the device

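iostat -dx 5 5

Information on disk I/O (extended per-device statistics, 5 reports at 5-second intervals)

mpstat -P ALL 5 5

Per-processor CPU usage (5 reports at 5-second intervals)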

free

"Free reports on memory, both real and swap. You get a snapshot of the amount of real memory split across programs sharing the same memory space (shared), buffers used by the kernel (buffers) and what has been cached to disk. The "-/+" line reflects the total vs. used memory as reflected by the combination of the disk buffer cache and memory actually written to disk."

ps

netstat

netstat -nap --tcp (or netstat -napc --tcp for a continuous refresh)
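A quick way to summarise what netstat shows is to count connections per TCP state. The sketch below reads the Linux /proc/net/tcp layout (IPv4 only), where the fourth column is the state as a hexadecimal code; the sample lines and function name are invented for illustration.

```python
from collections import Counter

# Kernel state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode\n"
    "   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001\n"
    "   1: 0100007F:0050 0100007F:C350 01 00000000:00000000 00:00000000 00000000    33        0 1002\n"
    "   2: 0100007F:0050 0100007F:C351 06 00000000:00000000 00:00000000 00000000     0        0 0\n"
)


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:    # skip the header
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    for state, number in count_tcp_states(SAMPLE).most_common():
        print(f"{state:12s} {number}")
```

A large and growing number of TIME_WAIT or CLOSE_WAIT sockets on a web server is usually worth a closer look.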

time <application>

Resources

Apache

ps and top: to check the number of processes, and how busy they are

logs: how many active connections it's maintaining

HTTPD-Test's Flood

PHP

Just put a few calls to microtime() to check how much time is spent in the different parts of a PHP page:

// microtime(true) returns the current time as a float, in seconds
$starttimer = microtime(true);

// The mysql_* functions were removed in PHP 7; mysqli replaces them
$db = mysqli_connect("localhost", "root", "test", "mysql");
$result = mysqli_query($db, "SELECT Host, User FROM user");
$row = mysqli_fetch_assoc($result);
printf("Host: %s<br>\n", $row["Host"]);
printf("User: %s<br>\n", $row["User"]);
mysqli_free_result($result);
mysqli_close($db);

$stoptimer = microtime(true);
$timer = round($stoptimer - $starttimer, 4);
echo "Page created in $timer seconds.";

Benchmark, xdebug, DBG, Advanced PHP Debugger

Profiling PHP

MySQL

  1. Find the queries that impact the server most (general log, slow queries log)
  2. Check their execution plans with EXPLAIN
  3. Tune if necessary (SHOW STATUS/mysqladmin, SHOW PROCESSLIST, innotop, FLUSH STATUS/SHOW SESSION STATUS, etc.)
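Step 1 can be partly automated by ranking the entries of the slow query log. The sketch below assumes the common text layout of that log (a "# Query_time: ..." comment line followed by the statement); the layout varies between MySQL versions, and the sample log and function names are invented for illustration.

```python
import re

QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")

SAMPLE_LOG = """\
# Time: 2024-01-01T10:00:00.000000Z
# User@Host: app[app] @ localhost []  Id:    12
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 500000
SET timestamp=1704103200;
SELECT * FROM orders WHERE customer_id = 42;
# Time: 2024-01-01T10:00:05.000000Z
# User@Host: app[app] @ localhost []  Id:    13
# Query_time: 0.800000  Lock_time: 0.000050 Rows_sent: 10  Rows_examined: 90000
SET timestamp=1704103205;
SELECT name FROM items ORDER BY price;
"""


def parse_slow_log(text):
    """Return a list of (query_time_seconds, statement) tuples."""
    entries = []
    pending = None                        # [query_time, statement lines]

    def flush():
        if pending and pending[1]:
            entries.append((pending[0], " ".join(pending[1])))

    for line in text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            flush()
            pending = [float(match.group(1)), []]
        elif line.startswith("#"):
            flush()                       # header of the next entry
            pending = None
        elif pending is not None:
            stripped = line.strip()
            lowered = stripped.lower()
            # Skip the bookkeeping statements the server adds itself.
            if stripped and not lowered.startswith(("set timestamp=", "use ")):
                pending[1].append(stripped)
    flush()
    return entries


def slowest(entries, n=5):
    """Return the n entries with the highest query time."""
    return sorted(entries, key=lambda entry: entry[0], reverse=True)[:n]


if __name__ == "__main__":
    for seconds, statement in slowest(parse_slow_log(SAMPLE_LOG)):
        print(f"{seconds:8.3f}s  {statement}")
```

The statements at the top of the ranking are the ones to run through EXPLAIN first.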

 

Communication/latency? Database design? Write access?  Read Access? Caching? Bad query design?  What does EXPLAIN tell you?  Do your databases have enough memory in the different caches?

Resources

Improving performance

ToCheck

Lighttpd is fine.  It is recommended that you run both Apache and Lighttpd on port 80, so you need to get a separate IP address for each server. You may want to read this article for more on this, specifically point 5.

http://www.websiteoptimization.com/speed/tweak/parallel/

Provided the bottleneck is definitely the server, and it's due to the process overhead of spawning forked Apache copies, a lightweight server that has less in it may actually spawn better (try Zeus).

A first list of things to try

  1. Investigate: iostat, vmstat, top, ps, netstat, xdebug, MySQL Explain, etc.
  2. Profile PHP + MySQL; Check that MySQL doesn't have runaway processes
  3. Check system/application logs for possible hints
  4. Disk controller (either SCSI or RAID) should favor read over write, as web servers perform more outputs to clients than inputs; Enterprise-grade disks
  5. Check whether a bigger swap partition is needed
  6. Software RAID instead of HW RAID (why faster?)
  7. RAID1 (why not RAID0?)
  8. Optimize filesystem (disable atime, etc. "The relatime option just updates times if access time is newer, it is similar to noatime and does not break some applications which need know last read time of a particular file")
  9. Get dual/quad CPU board (eg. Intel Harpertowns) + lotta RAM to avoid swapping/thrashing
  10. Uninstall any unneeded program (CUPS, etc.)
  11. Good NIC, set up right (no autosensing); Organize switch/LAN to avoid collisions; Each www server connected to shared DB server through cross-over Ethernet?
  12. Check DNS/mail settings to avoid resolve and other configuration problems (relaying, etc.)
  13. If sending lots of mail, consider a dedicated server
  14. Check total bandwidth at colo (switch, router, etc.)
  15. In PHP, don't include unnecessarily heavy add-ons
  16. Resize uploaded pictures when writing them to disk, so this doesn't have to be done on every hit
  17. Consider load balancing only after checking the above (HAProxy, UltraMonkey, Linux LVS, etc.)

General/hardware/OS

Apache

MySQL

PHP

Resources

Notes from "Apache Cookbook"

Notes from "Apache Server Bible"

Notes from "Essential System Administration"

The most important system resources from a performance perspective are CPU, memory, and disk and network I/O, although sometimes other device I/O can also be relevant.

Put simply, when you don't have enough of something, there are only a few options: get more, use less, eliminate inefficiency and waste to make the most of what you have, or ration what you have.

Tuning tools in Linux: files under /proc/sys

uptime, top: The load average is a rough measure of CPU use. These three figures report the average number of processes active during the last minute, the last five minutes, and the last 15 minutes. High load averages usually mean that the system is being used heavily and the response time is correspondingly slow.

ps aux: produces a report summarizing execution statistics for current processes.

pstree: displays system processes in a tree-like structure, and it is accordingly useful for illuminating the relationships between processes and for a quick, pictorial snapshot of what is running on the system.

vmstat interval count: information on processes, CPU, memory, disk

iostat interval: show current disk usage as the number of transfers/sec (tps) and MB/sec.
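The tps and MB/sec figures that iostat prints can be derived from two readings of /proc/diskstats on Linux. The sketch below is a simplified approximation, not iostat's actual code; the sample figures and function names are invented, and it reports MiB (1024 * 1024 bytes).

```python
SECTOR_BYTES = 512          # /proc/diskstats counts sectors of 512 bytes
MIB = 1024 * 1024

SAMPLE_BEFORE = "   8       0 sda 1000 0 20480 500 2000 0 40960 800 0 900 1300\n"
SAMPLE_AFTER = "   8       0 sda 1500 0 40960 700 2500 0 81920 1200 0 1500 1900\n"


def parse_diskstats(text):
    """Map device -> (reads done, sectors read, writes done, sectors written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # fields: major, minor, name, then the per-device counters
        stats[fields[2]] = (int(fields[3]), int(fields[5]),
                            int(fields[7]), int(fields[9]))
    return stats


def disk_rates(before, after, interval_seconds):
    """Transfers per second and MiB/s per device between two readings."""
    rates = {}
    for device, new in after.items():
        old = before.get(device)
        if old is None:
            continue                      # device appeared in between
        reads, sectors_read, writes, sectors_written = (
            n - o for n, o in zip(new, old))
        rates[device] = {
            "tps": (reads + writes) / interval_seconds,
            "read_mib_s": sectors_read * SECTOR_BYTES / MIB / interval_seconds,
            "write_mib_s": sectors_written * SECTOR_BYTES / MIB / interval_seconds,
        }
    return rates


if __name__ == "__main__":
    rates = disk_rates(parse_diskstats(SAMPLE_BEFORE),
                       parse_diskstats(SAMPLE_AFTER), 5)
    for device, figures in rates.items():
        print(device, figures)
```

On a real system the two readings would come from /proc/diskstats, taken interval_seconds apart.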

Notes from "Building Scalable Web Sites" By Cal Henderson

Application =

Tips to build scalable web apps:

  1. Separate logic code from markup code
  2. Split markup code into one file per page
  3. Switch to a templating system, and split static/dynamic parts that make up a page; Use CSS
  4. Separate page logic from business logic
  5. Keep hardware in mind, as it influences the whole architecture
  6. "Premature optimization is the root of all evil"... but make sure you don't make obvious, major architecture errors

Free bugtrackers:

When dealing with the filtering of incoming data, the best approach is to group it into three categories: good, valid, and invalid. Good data is the kind of data you expect and want. Valid data is the kind of data that your application can process, store, and manipulate, but that might not make contextual sense. Invalid data, finally, is data that breaks some element of your application's storage, processing, or output:
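As an illustration of the three categories, take a form field that should hold a person's age. The field, the limits and the function name below are hypothetical; the point is only that "valid" means the application can store and process the value, while "good" means it also makes sense.

```python
def classify_age(raw):
    """Classify a submitted age as 'good', 'valid' or 'invalid'.

    Assumes, for the sake of the example, that the value is stored in a
    small integer column able to hold 0 to 999.
    """
    text = raw.strip()
    if not (text.isascii() and text.isdigit()) or len(text) > 3:
        return "invalid"        # would break storage or processing
    age = int(text)
    if 0 < age <= 120:
        return "good"           # what we expect and want
    return "valid"              # storable, but makes no contextual sense


if __name__ == "__main__":
    for sample in ["34", "250", "0", "abc", "-5", "1000"]:
        print(repr(sample), classify_age(sample))
```

Invalid data should be rejected outright; valid-but-not-good data is a judgment call (accept, flag, or ask the user to confirm).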

CPU

CPU processing speed, contrary to popular belief, is almost never a bottleneck in web applications. To get an overview of what's eating your CPU time, you can run top on Unix machines. When the load average exceeds the number of processors in a box, there are processes in the queue waiting to run. If you find that you're spending CPU time inside your own application components, then you'll need to drill down to the next level and figure out exactly where within your code the time is being spent.

For PHP programmers, the open source Xdebug suite of tools (http://xdebug.org/) includes a powerful code profiler. After installing the Xdebug extension, you can call the xdebug_start_profiling( ) function to start profiling the running script. Once you reach the end of your script, you can call xdebug_dump_function_trace( ) to dump out a table of profiling data in HTML (assuming you're running from within mod_php). These functions date from Xdebug 1; Xdebug 2 and later enable profiling through php.ini settings instead. By adding a couple of lines to our php.ini or .htaccess files, we can enable transparent profiling. Every time a script on the server is executed, the code is profiled and the results are saved to disk.

PHP doesn't have any built-in method for saving the compiled version of a script, so we need extra tools to accomplish this. These tools are called opcode caches because they compile the source code down to opcodes (virtual code that the PHP runtime engine can execute) and cache the opcode version of the script. (This was true when the book was written; PHP 5.5 and later bundle the OPcache extension.)
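The idea behind an opcode cache can be illustrated in Python, which also compiles source to bytecode before running it. This is an analogy, not how any PHP opcode cache is implemented: the class below (a made-up name) compiles a script once and reuses the compiled code until the file's modification time changes.

```python
import os
import tempfile


class CompiledScriptCache:
    """Keep compiled code objects in memory, keyed by path and mtime."""

    def __init__(self):
        self._cache = {}            # path -> (mtime, code object)
        self.compile_count = 0      # how often we really had to compile

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]        # cache hit: skip parsing and compiling
        with open(path, "r", encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")
        self.compile_count += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path):
        """Execute the (possibly cached) script in a fresh namespace."""
        namespace = {}
        exec(self.load(path), namespace)
        return namespace


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        script = os.path.join(directory, "page.py")
        with open(script, "w", encoding="utf-8") as handle:
            handle.write("answer = 6 * 7\n")
        cache = CompiledScriptCache()
        for _ in range(3):
            cache.run(script)
        print("runs: 3, compilations:", cache.compile_count)
```

Each "request" still executes the script, but the parse-and-compile step is paid only once per version of the file.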

I/O

The portion of time that we spend actually within the dynamic web serving layer can be bound by CPU, I/O, and context switching. In general, the bottleneck in systems using a data store layer is I/O of some kind (network, disk, memory, etc.), although given enough I/O bandwidth, you'll start to see CPU-bound processes or memory limits, depending on the size of your working set. The serving of static content is very rarely CPU bound, but is often I/O bound (and sometimes incurs context-switching overhead). Once we run out of memory and start to swap, our previously fast memory I/O and CPU-bound operations become disk I/O bound. Although running out of memory manifests itself as a disk I/O problem (assuming you have swap enabled), it can be useful to treat it as a separate problem. Deciding carefully what data to keep in memory can drastically reduce your memory usage and swapping.

External Services

Solving the capacity question for external services depends very much on the nature of the service. Some services can be easily scaled out horizontally where multiple nodes are completely independent. Some services are designed for horizontal scaling, with infrastructure for node communication built in. Some services just won't scale. Unfortunately, if the latter is the case, then there's just not a lot you can do.

MySQL has an option to create a log of all the queries it runs. If we turn on this log for a while, then we can get a snapshot of the queries produced by real production traffic.
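Once the queries have been extracted from such a log, a simple way to get the snapshot is to strip the literal values and count how often each query shape occurs. The sketch below takes plain SQL strings as input, because the log's line format differs between MySQL versions; the function names and the normalisation rules are illustrative simplifications.

```python
import re
from collections import Counter

_SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'")
_DOUBLE_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACES = re.compile(r"\s+")


def fingerprint(query):
    """Replace literal values with '?' so similar queries group together."""
    query = _SINGLE_QUOTED.sub("?", query)
    query = _DOUBLE_QUOTED.sub("?", query)
    query = _NUMBER.sub("?", query)
    return _SPACES.sub(" ", query).strip().lower()


def most_common_queries(queries, n=5):
    """Return the n most frequent query shapes with their counts."""
    return Counter(fingerprint(q) for q in queries).most_common(n)


if __name__ == "__main__":
    sample = [
        "SELECT * FROM users WHERE id = 1",
        "SELECT * FROM users WHERE id = 2",
        "SELECT name FROM items WHERE price > 9.99",
    ]
    for shape, count in most_common_queries(sample):
        print(count, shape)
```

The most frequent shapes are not necessarily the slowest ones, but they show where the bulk of the traffic goes.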

MySQL

Scaling in a Nutshell

Using templates

Why use an application server vs. PHP?

  1. Even when using a mod_ where the interpreter is compiled into the web server and is launched at boot time, a page-based application like PHP is usually slower than an application server, since the latter doesn't need to rerun/reparse the code with each new connection
  2. Retrieved data can be cached more efficiently
  3. An application server like CherryPy doesn't tie you to a single templating style: CP starts with no template and lets you add one, PHP starts with a template and lets you replace it
  4. An application server scales better: PHP is much easier for smaller apps, and so many web apps are developed incrementally
  5. Why should I use eg. CherryPy instead of mod_python? Better API
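Points 1 and 2 can be illustrated with a minimal long-running Python application. The sketch uses the standard library's wsgiref rather than CherryPy, and load_settings, STATS and fake_request are made-up names: the expensive step runs once per process, and later requests reuse the result instead of redoing it on every hit.

```python
from wsgiref.util import setup_testing_defaults

STATS = {"loads": 0}        # counts how often the expensive step ran
_SETTINGS = None            # lives as long as the server process


def load_settings():
    """Stand-in for an expensive step (parsing config, querying a database)."""
    STATS["loads"] += 1
    return {"greeting": "Hello, world!"}


def get_settings():
    """Load the settings on first use, then keep them in memory."""
    global _SETTINGS
    if _SETTINGS is None:
        _SETTINGS = load_settings()
    return _SETTINGS


def application(environ, start_response):
    """WSGI entry point, called once per request by the server."""
    body = get_settings()["greeting"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fake_request(app, path="/"):
    """Call the application the way a WSGI server would, without a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    for _ in range(3):
        print(fake_request(application))
    print("expensive loads:", STATS["loads"])
```

To serve it for real, wsgiref.simple_server.make_server("", 8000, application).serve_forever() is enough for a quick experiment.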

More information: http://www.google.com/search?q=servlets+versus+cgi, including:

Resources