SQL Injection: Detailed concept with examples

Some days ago one of my junior who is a newbie programmer asked me for an interesting thing when I went their dorm. He said that he want to learn SQL Injection. Unfortunately or fortunately SQL injection is not for learning but we need to learn how to prevent SQL Injection as it is a PHP security topic.

SQL Injection” is subset of the “an unverified/unsanitized” user input vulnerability, and the idea is to convince the application to run SQL code that was not intended. If the application is creating SQL strings naively on the fly and then running them, it’s straightforward to create some real surprises.

There have been other papers on SQL injection, including some that are much more detailed, but this one shows the rationale of discovery as much as the process of exploitation.

Case Study:

The login page had a traditional username-and-password form, but also an email-me-mypassword link; the latter proved to be the downfall of the whole system.

When entering an email address, the system presumably looked in the user database for that email address, and mailed something to that address. Since my email address is not found, it wasn’t going to send me anything.

So the first test in any SQL-ish form is to enter a single quote as part of the data: the intention is to see if they construct an SQL string literally without sanitizing. When submitting the form with a quote in the email address, we get a 500 error (server failure), and this suggests that the “broken” input is actually being parsed literally. Bingo.

We speculate that the underlying SQL code looks something like this:

SELECT fieldlist

FROM table

WHERE field = ‘ $EMAIL ‘;

Here, $EMAIL is the address submitted on the form by the user, and the larger query provides the quotation marks that set it off as a literal string. We don’t know the specific names of the fields or table involved, but we do know their nature, and we’ll make some good guesses later.

When we enter bob@example.com’ – note the closing quote mark – this yields constructed SQL:

SELECT fieldlist

FROM table

WHERE field = ‘ bob@example.com’ ‘;

When this is executed, the SQL parser find the extra quote mark and aborts with a syntax error. How this manifests itself to the user depends on the application’s internal error-recovery procedures, but it’s usually different from “email address is unknown”. This error response is a dead giveaway that user input is not being sanitized properly and that the application is ripe for exploitation.

Since the data we’re filling in appears to be in the WHERE clause, let’s change the nature of that clause in an SQL legal way and see what happens. By entering anything’ OR ‘x’=’x , the resulting SQL is:

SELECT fieldlist

FROM table

WHERE field = ‘ anything’ OR ‘x’=’x ‘;

Because the application is not really thinking about the query – merely constructing a string – our use of quotes has turned a single-component WHERE clause into a two-component one, and the ‘x’=’x’ clause is guaranteed to be true no matter what the first clause is (there is a better approach for this “always true” part that we’ll touch on later).

But unlike the “real” query, which should return only a single item each time, this version will essentially return every item in the members database. The only way to find out what the application will do in this circumstance is to try it. Doing so, we were greeted with:


Your login information has been mailed to random.person@example.com.


Our best guess is that it’s the first record returned by the query, effectively an entry taken at random. This person really did get this forgotten-password link via email, which will probably come as surprise to him and may raise warning flags somewhere.

We now know that we’re able to manipulate the query to our own ends, though we still don’t know much about the parts of it we cannot see. But we have observed three different responses to our various inputs:

  • “Your login information has been mailed to email”
  • “We don’t recognize your email address”
  • Server error

The first two are responses to well-formed SQL, while the latter is for bad SQL: this distinction will be very useful when trying to guess the structure of the query.

Brute-force password guessing

One can certainly attempt brute-force guessing of passwords at the main login page, but many systems make an effort to detect or even prevent this. There could be logfiles, account lockouts, or other devices that would substantially impede our efforts, but because of the nonsanitized inputs, we have another avenue that is much less likely to be so protected.

We’ll instead do actual password testing in our snippet by including the email name and password directly. In our example, we’ll use our victim, bob@example.com and try multiple passwords.

SELECT email, passwd, login_id, full_name

FROM members

WHERE email = ‘ bob@example.com’ AND passwd = ‘hello123 ‘;

This is clearly well-formed SQL, so we don’t expect to see any server errors, and we’ll know we found the password when we receive the “your password has been mailed to you” message. Our mark has now been tipped off, but we do have his password.

This procedure can be automated with scripting in perl, and though we were in the process of creating this script, we ended up going down another road before actually trying it.

The database isn’t readonly

So far, we have done nothing but query the database, and even though a SELECT is readonly, that doesn’t mean that SQL is. SQL uses the semicolon for statement termination, and if the input is not sanitized properly, there may be nothing that prevents us from stringing our own unrelated command at the end of the query.

The most drastic example is:

SELECT email, passwd, login_id, full_name

FROM members

WHERE email = ‘ x’; DROP TABLE members; — ‘; — Boom! 😦

The first part provides a dummy email address — ‘x’ — and we don’t care what this query returns: we’re just getting it out of the way so we can introduce an unrelated SQL command. This one attempts to drop (delete) the entire members table, which really doesn’t seem too sporting.

This shows that not only can we run separate SQL commands, but we can also modify the database. This is promising.

Mail me a password

We then realized that though we are not able to add a new record to the members database, we can modify an existing one, and this proved to be the approach that gained us entry.

From a previous step, we knew that bob@example.com had an account on the system, and we used our SQL injection to update his database record with our email address:

SELECT email, passwd, login_id, full_name

FROM members

WHERE email = ‘ x’;

UPDATE members

SET email = ‘bob@example.com’

WHERE email = ‘bob@example.com ‘;

After running this, we of course received the “we didn’t know your email address”, but this was expected due to the dummy email address provided. The UPDATE wouldn’t have registered with the application, so it executed quietly.

We then used the regular “I lost my password” link – with the updated email address – and a minute later received this email:


From: system@example.com

To: bob@example.com

Subject: Intranet login

This email is in response to your request for your Intranet log in information.

Your User ID is: bob

Your password is: hello


Now it was now just a matter of following the standard login process to access the system.

Mitigations (How to Prevent? 🙂 )

We believe that web application developers often simply do not think about “surprise inputs”, but security people do (including the bad guys), so there are three broad approaches that can be applied here.

Sanitize the input

It’s absolutely vital to sanitize user inputs to insure that they do not contain dangerous codes, whether to the SQL server or to HTML itself. One’s first idea is to strip out “bad stuff”, such as quotes or semicolons or escapes, but this is a misguided attempt. Though it’s easy to point out some dangerous characters, it’s harder to point to all of them. The language of the web is full of special characters and strange markup (including alternate ways of representing the same characters), and efforts to authoritatively identify all “bad stuff” are unlikely to be successful. Instead, rather than “remove known bad data”, it’s better to “remove everything but known good data”: this distinction is crucial. Since – in our example – an email address can contain only these characters:





It’s important to note here that email addresses in particular are troublesome to validate programmatically, because everybody seems to have his own idea about what makes one “valid”, and it’s a shame to exclude a good email address because it contains a character you didn’t think about.

Be aware that “sanitizing the input” doesn’t mean merely “remove the quotes”, because even “regular” characters can be troublesome. In an example where an integer ID value is being compared against the user input (say, a numeric PIN):

SELECT fieldlist

FROM table

WHERE id = 23 OR 1=1 ; — Boom! Always matches!

Escape/Quotesafe the input

Even if one might be able to sanitize a phone number or email address, one cannot take this approach with a “name” field lest one wishes to exclude the likes of Bill O’Reilly from one’s application: a quote is simply a valid character for this field.

One includes an actual single quote in an SQL string by putting two of them together, so this suggests the obvious – but wrong! – technique of preprocessing every string to replicate the single quotes:

SELECT fieldlist

FROM customers

WHERE name = ‘ Bill O”Reilly ‘; — works OK

However, this naive approach can be beaten because most databases support other string escape mechanisms. MySQL, for instance, also permits ‘ to escape a quote, so after input of ‘; DROP TABLE users; — is “protected” by doubling the quotes, we get:

SELECT fieldlist

FROM customers

WHERE name = ‘ ”; DROP TABLE users; — ‘; — Boom!

The expression ”’ is a complete string (containing just one single quote), and the usual SQL trouble follow. It doesn’t stop with backslashes either: there is Unicode, other encodings, and parsing oddities all hiding in the weeds to trip up the application designer.

Getting quotes right is notoriously difficult, which is why many database interface languages provide a function that does it for you. When the same internal code is used for “string quoting” and “string parsing”, it’s much more likely that the process will be done properly and safely.

Some examples are the MySQL function mysql_real_escape_string() and perl DBD method $dbh->quote($value).

These methods must be used.

All I want to say is “ESCAPE ESCAPE ESCAPE” your values before putting into your query.

Cheers! 😀