Monday, February 14, 2011

A bit of benchmarking - PHP vs. Java vs. Python

Anyone who has ever taken a stroll down the interblag or spent time searching through internet forums knows that there is no shortage of religious wars being fought. Mac vs. PC, Firefox vs. IE (obvious winner here), Emacs vs. vim, open source vs. proprietary; I could go on and on.

One of the recent religious wars I have been a part of (more as an innocent bystander) is the PHP vs. Java vs. Python vs. whatever for web development war. I hear people all over the internet saying that PHP is not scalable, that Java is so much faster for web development, that Python is the best, that Ruby on Rails will cure cancer, and blah blah blah blah. As a PHP developer I naturally wondered if there was a benefit in switching to another language. It is obvious that all the popular programming languages are scalable, since very large websites have been written in essentially all of them (cue angry comments from language fanatics). It is not entirely obvious which language is the fastest.

Now almost any benchmark will show that Java has a faster execution time than PHP and Python, which is reasonable and expected. But so far I have not seen any benchmarks for PHP vs. Java vs. Python in a web environment. Also, every benchmark I have seen in this realm uses either an absurdly trivial program (such as hello world) or a program that favors one language over the others by making special use of features not present in every language.

Because of these issues, the benchmarks are completely useless for my purposes. The process of serving a web page is almost unrelated to running a program from the command line. I want some benchmarks that are relevant to web programming!

Recently, for a class I am taking in school, I was required to write the same website twice using two languages: PHP and Java. And then for kicks I wrote it a third time using Python. The website is a simple survey that allows users to vote, and then displays the results of the survey so far.

I decided that this would be a good opportunity to do some benchmarks, and see what the difference really is.

The Setup

The underlying code is fairly simple and representative of your average web page. All three programs follow a few simple steps in almost the same manner:
  1. Read in a file to open the page (html code)
  2. If the user hasn't voted display the survey (another file)
  3. If the user has voted display the survey results (query the database and process data)
  4. And finally read another file to close the page (more html)
These are operations that are typical of nearly every dynamic web page, so they shouldn't be biased toward one language's features over another's.
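
The four steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the posted source: the file names, the `votes` table, and the `render_page` and `make_demo_site` functions are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def render_page(template_dir, has_voted, db):
    """Build the whole page as one string (hypothetical helper, not the posted code)."""
    # Step 1: read the file that opens the page.
    parts = [(template_dir / "header.html").read_text()]
    if not has_voted:
        # Step 2: the user has not voted, so show the survey form.
        parts.append((template_dir / "survey.html").read_text())
    else:
        # Step 3: the user has voted, so query the database and process the rows.
        rows = db.execute(
            "SELECT choice, COUNT(*) FROM votes GROUP BY choice ORDER BY choice"
        ).fetchall()
        total = sum(count for _, count in rows)
        for choice, count in rows:
            percent = 100 * count // total
            parts.append(f"<p>{choice}: {count} votes ({percent}%)</p>\n")
    # Step 4: read the file that closes the page.
    parts.append((template_dir / "footer.html").read_text())
    return "".join(parts)


def make_demo_site():
    """Create throwaway templates and a tiny vote table (assumed layout)."""
    template_dir = Path(tempfile.mkdtemp())
    (template_dir / "header.html").write_text("<html><body>\n")
    (template_dir / "survey.html").write_text("<form>survey goes here</form>\n")
    (template_dir / "footer.html").write_text("</body></html>\n")
    db = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    db.execute("CREATE TABLE votes (choice TEXT)")
    db.executemany(
        "INSERT INTO votes VALUES (?)",
        [("php",), ("php",), ("java",), ("python",)],
    )
    return template_dir, db


template_dir, db = make_demo_site()
survey_page = render_page(template_dir, has_voted=False, db=db)
results_page = render_page(template_dir, has_voted=True, db=db)
```

Both branches do the same file reads; only the results branch touches the database, which is why the two pages are benchmarked separately.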

The data set from the database is very small (less than 150 records), and running the queries from the MySQL command line shows an execution time of 0.00 seconds. So the performance of MySQL won't sway the benchmarks of the languages. But I feel that it is important that we do make a database connection and process results, because that is one of the biggest tasks of web programming. The time that it takes to get the data from the database to the script is a crucial part of execution time. Typically web pages will only retrieve a small amount of data, even if the underlying dataset is huge, so this setup should be perfect.

I recognize that the architecture of the program is not the smartest way of doing it (and honestly is somewhat flawed), but it won't affect performance, so it is not relevant for this test. I got an A on this assignment, so I don't care enough to go back and fix it.

The tests will be run on my basement server. Please don't laugh at the specs; I'm poor. Feel free to donate to the "buy Adam a better server fund" if you think this computer is inadequate for testing.

It is running a Pentium III at 930 MHz with 512 MB of RAM.
The operating system is Linux Mint (an Ubuntu variant).
All pages are being served through Apache 2.2.

Yes, my server is a horrible waste of space and should be sent to Estonia to be used as a paperweight by cave trolls, but it's what I have right now, so deal with it. If someone has a real server that supports PHP, Java servlets, and Python, and has way too much time on their hands, please run these programs and benchmark them.

The setup for the languages goes as follows:

  • PHP
    • Connection: Apache with mod_php
    • PHP version: 5.2.6 with eAccelerator
    • MySQL Connector: internal driver
  • Java
    • Connection: Tomcat 6 behind Apache with mod_jk
    • Version - Java: 1.6 OpenJDK
    • Version - Tomcat: 6.0.18
    • MySQL Connector: JDBC
  • Python
    • Connection: Apache with mod_python
    • Version: 2.6.2
    • MySQL Connector: MySQLdb

Our metric will be page requests per second. To test, we will time a curl script that requests the page 1000 times. We will run the test for the page that displays the survey, and again for the page that displays the survey results. I anticipate the survey results page will be significantly slower, as it requires a database connection and processes data.
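
The measurement itself is simple arithmetic: requests per second is the request count divided by the elapsed wall-clock time. The test in this post used a curl script; the sketch below is a Python stand-in for the same idea, and the function names and the URL in the usage note are assumptions for illustration.

```python
import time
import urllib.request


def fetch_url(url):
    """Request one page and read the full response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def requests_per_second(fetch, count=1000):
    """Call fetch() count times in sequence and return (rate, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(count):
        fetch()
    elapsed = time.perf_counter() - start
    return count / elapsed, elapsed
```

A run against a local page might look like `rate, elapsed = requests_per_second(lambda: fetch_url("http://localhost/survey"))`, where the URL is a placeholder. The requests are issued one after another, so this measures sequential throughput rather than behavior under concurrent load.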

A Few Quick Notes

Before giving the results I should note that these may be slightly biased towards PHP. I am not a great Java programmer and have only recently learned Python. Before I got some help from the folks at Stack Overflow, the Java version was an order of magnitude slower than PHP and the Python version was next to worthless. As far as I know I have worked out the serious performance issues with all of these, but I am posting the source so that someone with a bit more experience writing servlets or Python can scrutinize my code and point out how to get some more speed from it.

Please do not spam me with a billion comments about how I can save 10 nanoseconds by standing on my head while eating a tangerine and chanting a magical Java phrase. I also don't care about the design or elegance of the code; a lot of this was hacked together only to make it work. Elegance and design are topics for another day. I am mostly interested in significant performance issues, since this is a benchmark test.

I am running the test from the same computer that is running apache, which has two main consequences:

  • The server resources will be used for the test.

    This means that we will not get as much performance as we could. I feel like this isn't an issue because all of the languages are put on even ground.

  • Network latency will not affect the test results.

    This is the real reason for testing from the server. The requests will go through the loopback interface, so it is still a network request, but doesn't go through the tubes. My server is sitting on a 1.5 Mb connection, which dies at about 20 requests per second, so it wouldn't work from the outside anyway.

It should be noted that these are hits to the server, not real page requests. No CSS, JavaScript, or images are being loaded. If we were load testing then these would be important, but we are testing the languages, not the server.

The java source code can be found here. My sweat and blood went into this. Seriously.

The Python source can be found here. I didn't implement the code that sets the session or inserts into the database. It's not part of the benchmark so I didn't bother with it.

I don't think I need much assistance with the PHP code, but I posted it anyway. You can find it here.

The Results

Without further ado, here are the results:

Total Time (1000 requests)

          Static Content   Database Access
Python    3.78 seconds     6.25 seconds
Java      5.22 seconds     6.63 seconds
PHP       1.22 seconds     1.28 seconds

Requests Per Second

          Static Content     Database Access
Python    264 requests/sec   160 requests/sec
Java      191 requests/sec   150 requests/sec
PHP       819 requests/sec   718 requests/sec

Conclusions

I hear a lot of bad things about PHP, but according to these tests it blows Java and Python out of the water. For static content PHP handled about 3 times as many requests per second as Python and about 4 times as many as Java, and with database access it handled between 4 and 5 times as many as either of them.

It appears that because PHP was designed for a web environment, it works much better in a web environment. Python and Java can easily beat PHP for raw execution time, but they were not designed for the web, and in this test they paid for it.

PHP also lost much less speed going from static content to database access (13% slower), while Python and Java slowed down by 40% and 22%, respectively. Again, PHP is doing what it was born to do; the others are putting on a hat that doesn't fit them as well.
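
The ratios and slowdowns quoted here can be recomputed from the requests-per-second table. The sketch below uses the rounded figures from that table, so its output can differ from the quoted percentages by a point or so; the variable and function names are illustrative.

```python
# Requests per second from the results table: (static content, database access).
RESULTS = {
    "Python": (264, 160),
    "Java": (191, 150),
    "PHP": (819, 718),
}


def speed_ratio(fast, slow):
    """How many times as many requests per second `fast` handled as `slow`."""
    return fast / slow


def slowdown_percent(static_rate, db_rate):
    """Percentage drop in requests per second from static content to database access."""
    return 100 * (static_rate - db_rate) / static_rate


php_static, php_db = RESULTS["PHP"]
for language, (static_rate, db_rate) in RESULTS.items():
    line = f"{language}: {slowdown_percent(static_rate, db_rate):.1f}% slower with database access"
    if language != "PHP":
        line += (
            f", PHP ratio {speed_ratio(php_static, static_rate):.1f}x static"
            f" and {speed_ratio(php_db, db_rate):.1f}x database"
        )
    print(line)
```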

As I mentioned before, I am much better at coding PHP, but I don't want to hear any whining about this fact until someone can point out problems with the Python or Java versions. I will be happy to rerun the tests with any suggested changes. So no whining!

The main purpose of this has been to test the speeds of these languages in a web environment, but in the process I have learned a lot about the difficulty of writing in the three languages. Obviously the PHP version was very easy for me to write, but I didn't find the same for the other languages. For both of the others I more or less just copied the PHP over and converted it to the new language, which should be pretty straightforward...

Writing the Python code was surprisingly easy, but I could find little to no documentation for what I was doing. I spent almost as much time searching the net for how to connect to MySQL as I spent coding the entire application. My first go at it was horribly slow, about 30 requests per second, and it took me a long time to figure out that request.write() is a very expensive method. That would have been nice to read in a manual, instead of benchmarking my code for over an hour. Overall I felt it was very easy to write the code, but I think I will stay away from mod_python due to the horrible lack of documentation. Perhaps I will check out Django or another approach and see if it is a bit more friendly.
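
One common way to avoid paying for many small writes is to collect the page fragments in memory and write them out once. The sketch below shows that pattern; it is an assumption about the kind of fix involved, not the code from the posted source, and `FakeRequest` is a hypothetical stand-in for mod_python's request object so the example runs without Apache.

```python
class FakeRequest:
    """Hypothetical stand-in for a request object with a write() method."""

    def __init__(self):
        self.write_calls = 0
        self.body = ""

    def write(self, text):
        self.write_calls += 1
        self.body += text


def send_rows_unbuffered(request, rows):
    """One write() per row: simple, but the call count grows with the data."""
    for name, votes in rows:
        request.write(f"<li>{name}: {votes}</li>\n")


def send_rows_buffered(request, rows):
    """Build the fragments in a list, join them, and call write() once."""
    fragments = [f"<li>{name}: {votes}</li>\n" for name, votes in rows]
    request.write("".join(fragments))


rows = [("php", 12), ("java", 7), ("python", 9)]
slow, fast = FakeRequest(), FakeRequest()
send_rows_unbuffered(slow, rows)
send_rows_buffered(fast, rows)
```

Both versions produce the same body; the buffered one makes a single write() call no matter how many rows there are.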

Writing the Java version was an adventure, and resulted in me committing suicide. Twice. There is a bit more documentation for servlets than for mod_python, but it pales in comparison to PHP. Also the community appears to be smaller, with not as many code examples available. The bigger issue, which I *hopefully* won't need to go through again, was getting everything set up. I ended up installing two different versions of Tomcat about twelve times each before I finally got it to even turn on and give me a welcome page, and then it took hours more to get mod_jk to finally work so I could get Tomcat working on port 80, and then several hours more to get a hello world servlet going, and then hours more to find enough documentation to make a database request.

I admit that I don't like Java. I feel like they are holding me at gunpoint and forcing me to use "good design". I especially hate the exception handling; there's nothing like seeing a stack trace while you are looking at pictures of your cat (I wish I could have a cat in my apartment). But that's a topic for another day, so I will try to stay off my soapbox for now.

A more legitimate complaint is that I get tired of compiling and waiting for Tomcat to reload my servlet every time I realize that I forgot a <br /> tag or need to make a trivial change. When developing an application you make thousands of changes, and waiting 30 seconds between every single change gets very frustrating. I had considered switching to Java for the supposed performance increase, but it appears not to exist, so forget that idea.

Obviously these tests don't suggest that Java or Python are not fit for web development. Both languages have been proven to be good solutions in many cases. Personally, I think I will stick with PHP. It is very fast, and I find it very easy to develop with. Even more than the performance, I appreciate the huge community and exceptional documentation. I think it is the best option for me.

9 comments:

  1. Hi, mod_python is generally marked as "no go" because it's slow, not documented and no longer maintained (need to verify).
    WSGI is the current "best" approach to web development in Python.

    Hope it helps!

  2. Webware for Python is much better. It supports servlets like Java, and it's fast, very fast.

  3. Thanks for this =) I was looking for stats like these

  4. OpenJDK is absolutely terrible; basically unusable for anything.

    Give Java another shot using Oracle Java 7, and throw in the -server option.

    With -server, the compiler goes into performance mode, so the first few runs are slower, but the JIT recompiler will make it blazing fast afterwards.

    It's generally fair to give it a couple of iterations to warm up, then blast through at least 1000 requests and see what you get. :)

  5. Also, I'd hate to throw more on your plate, but your java app could run about 6x faster if you used multithreading (not that hard if you use an apache Executor service). Basically, you are blocking on every single input and output, and the true speed of java is unlocked when you multithread. For instance, when reading from db and printing out, you can get a warm thread from the executor, push each result onto a ConcurrentLinkedQueue, and let the thread stream them out while you continue to read.

    This is why your db reads were slow; the db read itself could be super-fast, but you are printing to the http output stream between each result, making it appear much slower than it actually could be.

    PHP is considered the slowest web application language because it has no built in performance for multithreading. Every enterprise server I've ever worked on has been java, and every request does at least three things in parallel to make it run, well, at least 3x faster.

    1. Anonymous, June 16, 2013

      Yeah, the weakness of PHP is that it doesn't support multi-threading. Hope it gets fixed in the future. I recently joined a project that uses PHP and want to use a Java thread executor to queue requests to that PHP module. Wish it was written in Java :-(

  6. Hi,

    Recently I came across some great articles on your site.
    The other day, I was discussing (http://blog.sumofchoices.com/2011/02/bit-of-benchmarking-php-vs-java-vs.html)with my colleagues and they suggested I submit an article of my own. Your site is just perfect for what I have written!
    Would it be ok to submit the article? It is free of charge, of course!

    Let me know what you think
    Contact me at anelieivanova@gmail.com

    Regards
    Anelie Ivanova

  7. Hi, I was a PHP'er and still am, but now I use Java for critical high-performance web dev. PHP is really fast and fast to develop with, but after benchmarking, a servlet is much faster and able to handle more requests/sec if you roughly know what you're doing and don't tack on huge frameworks, Spring, Hibernate and crapware like that. I know both languages pretty well and have written super-fast and light web frameworks for both.

  8. Java is popular for web developers because of its unwilling security. Java has its own interpreter and compiler and its unique runtime environment too.
