All about PostgreSQL – the world’s most advanced open source database
Simon Riggs is the Founder and Chief Technology Officer of 2ndQuadrant. He is also a major developer and committer of the PostgreSQL project. Simon has contributed major features in each of the last six versions of PostgreSQL. His work includes recovery and replication, performance and monitoring as well as designs for many other features.
I met Simon and some members of his team during the DevCon and Jax Conference in London. During the social meeting, we talked about databases and my ignorance of PostgreSQL. After spending so many years in Tech Publishing, I always thought that even though I do not understand the technologies, I have a fair idea of what they are about but then found out that I knew practically nothing about PostgreSQL…
What is the history of PostgreSQL?
Postgres started as a research project at University of Berkeley, California in the late 1980s. Mike Stonebraker’s research team went back to basics to design the datastore of the future, including items like user defined datatypes, lock-avoiding concurrency and a DBMS designed for multi CPU architectures.
That project was eventually commercialised as Illustra, but that’s another story. The main event here is that the original Postgres code was BSD open source and so various pioneers took on the code and started fixing the bugs, and adding new features.
The community has grown simply because of the project policies:
• There isn’t just a single company involved, it’s a community, so everybody benefits from helping get new people involved.
• All contributions get credited. You can see who did the work, so your efforts are rewarded.
• Developers answer users’ questions on mailing lists and IRC, which helps new adopters and encourages people to ask challenging questions and report bugs.
I am surprised to hear that PostgreSQL is 20 years old and is regularly updated, when so little is known about it outside of the community. How do you explain this lack of noise?
The lack of noise is because of the lack of fully funded marketing departments.
In terms of “PostgreSQL awareness”, if I go to an open source computing conference and ask “have you heard of Postgres?” almost everybody has. Some people know us as “Postgres”, some people know us as “PostgreSQL”. Both names are used, even though the official name is PostgreSQL. So the tech staff have heard of us, it’s just non-technical management that may not be aware. But you don’t need 100% awareness to be short-listed, and we’re certainly popular with CFOs!
And in the last year, the question “do you use Postgres?” now gets more than 50% success as well.
I understand that PostgreSQL is updated continuously – who does the upgrades? If you have a lot of developers working on the project who decides which work to integrate in the product?
PostgreSQL does a major release once each year.
The development process is about “open engineering”. We discuss a need, then design a solution to that need, then implement it. Then we peer review it, rewrite bits and bring the code up to rock solid quality levels. People often specialise in particular parts of the process, some people have great ideas, others have great designs. Then we have great coders, great reviewers and thorough testers. The driving force is about “can we make that better?”. For the contributors, there’s serious kudos in being able to suggest a better way. It’s mostly fairly polite and since people give credit, it encourages everybody to help.
There are very few actual “committers” on the project – people who can add code to the repository. Many people develop features for PostgreSQL, typically about 100 authors in any one release, contributing a few hundred features. Most of the bigger features are written by a smaller group of major developers, not all of whom have reached committer status yet.
In terms of “who decides” we work on a consensus basis. If somebody has a reasonable and rational objection, then we need to take that into account. Sometimes that means patches are rejected, but often it means sections of code are added or a feature changed slightly.
The dev process works well, so we end up focusing on the features people really want and make sense, not wasting time on “marketing features” – ones that sound great but aren’t properly implemented, or that nobody would ever use.
Why should one use PostgreSQL instead of Oracle or MySQL or any other competitive databases?
Well, first, price. PostgreSQL is completely free and the free version is also the latest and best version.
Second, cost of ownership. PostgreSQL follows the SQL Standard very closely, so you’ll be able to reuse skills easily from other databases.
PostgreSQL is also low on DBA overhead, with many people reporting having used it untouched for years. High volume and complex databases need more love, its true, but we use the hardware efficiently and don’t waste people’s time with pointless optional parameters and manual tasks.
With PostgreSQL, “it just works” we say. We add many advanced features, yet they work in seamless ways so you can use them easily. For example, almost all database definition SQL (known as DDL) is transactional, so you can use those statements anywhere without having to think about transaction boundaries. The PostgreSQL TEXT datatype is optimised for anything between 1 byte and 1 Gigabyte column sizes, yet they all work the same – no need to use a different datatype for > 2K or > 32K. What’s even more important is that the underlying code is well designed and well documented, so that we don’t spend months refactoring at the start of each release.
Third, or maybe this should be first? Advanced features. PostgreSQL has an advanced optimiser, with many different types of join plan and 4 different types of index:
• Array types that really work and are indexable.
• Server-side programming languages such as PL/pgSQL, plus options to write stored procedures in Python, Perl, java, tcl, R, Ruby etc..
• Extensible datatypes and operators that allow implementation of full text search, high performance GIS systems and many other datatypes.
• We also support recursive queries, windowed aggregate queries, plugins of all different kinds.
Fourth, the code is robust, with very few security flaws and bugs.
And lastly, it performs very well, with some reports of people getting better performance after upgrading from commercial databases.
With Oracle buying MySQL, do you feel that PostgreSQL will/has gain market shares?
PostgreSQL has always been favoured by people that come from an Oracle background. It’s very similar, with advanced features, non-block locking and similar stored function languages.
People often ask the “MySQL question” as if somehow PostgreSQL and MySQL have anything in common. Apart from being open source, the two projects are radically different in terms of organisation, typical use cases, feature set, code quality, code licence and others. PostgreSQL is the database DBAs choose, if they can.
MySQL is a little confused now. It used to be this fast, mostly read only database. Now there are lots of different forks, all doing slightly different things. It appears to be able to do anything: it’s advanced, it’s simple, depending upon which fork and which plugins you use – but you can’t use them all at once. But you’re right – if anybody was using MySQL because of its feature set, they’re certainly coming to PostgreSQL now because of the licensing situation.
PostgreSQL does one thing well: General Purpose read/write SQL database.
That means you can throw all kinds of database designs at PostgreSQL and it copes with them well. Why? Because when we implement a feature, we implement it fully and we tune it as well. The PostgreSQL community is working together in one direction – to enhance a real-world database that works effortlessly and efficiently on a range of different workloads. So if you pick PostgreSQL for a project you can be sure you won’t be limited by feature set, and the price, performance and robustness are all there too.
It’s not really surprising there is a huge growth in usage in recent years.
I keep hearing about NoSQL databases. What are they?
Well, I see NoSQL databases as providing these main things:
• A stripped down database that focuses on speed above all other features.
• By simplifying the range of actions possible, NoSQL databases are able to scale sideways much more easily than databases running more complex workloads
• Many also provide the ability to add new data elements easily without issuing data definition statements
Those are all valid lessons. But they apply equally to any datastore, so they apply to PostgreSQL as well. What I mean by that is that with the right set of options, PostgreSQL is very fast as well. And with the right architecture, an application can scale on any product, so I don’t really see the point in deliberately removing features you might need in the future.
How come CouchDB, FluidDB, MongoDB etc. are so much in the news at the moment?
New things are sexy. Nobody wants to hear that the way your Dad used to manage data is still the right way, in most cases. Especially when it turns out that what you just suggested isn’t new at all, and that your Granddad used to manage data that way too before he gave it up.
Is there a relationship between the growth of NoSQL databases and the growth of social media?
Yes, but not all social media applications run NoSQL. Far from it.
In the past, IT was mostly about billing and number crunching. Your electricity bills, airline tickets and banking transactions. In recent years we’ve seen a move towards datastores keeping track of lower valued information, like web clicks, and general internet usage data.
That has meant a couple of things:
• “Mostly read only, very fast response” became an important type of application for databases – where MySQL became famous.
• Massive scalability is now a requirement for top applications in many companies. But just because Facebook uses XYZ for its main site, it will still use other types of database for its other applications.
• Rigid control over data quality is less important than the ability to bring new features to market quickly and easily. With apps available 24×7 that means down time is nearly impossible. For major apps, we also need more than a few hundred data columns.
To which PostgreSQL has responded. In version 9.0 there is a new version of “hstore” which allows you to store many attribute=value groups for any key, allowing flexibility without downtime. You could already add or remove columns in seconds even on large tables and add new indexes concurrently. In 9.1, we’ve reduced the lock strength of many ALTER TABLE commands, so you can add foreign keys and partitions without blocking SELECTs.
PostgreSQL has for years had the PL/Proxy framework for infinite scalability, as used by Skype and MyYearbook.com. Essentially this is just the same architectural rules as the NoSQL folk, but using PostgreSQL, so you can take advantage of the other features as well.
How do you make a living out of PostgreSQL?
I’ve worked with databases professionally for more than 20 years, so becoming a PostgreSQL professional was more about switching from using other databases. In 2004 I started doing PostgreSQL training and things took off from there really. Soon, I was full-time solely on PostgreSQL.
Personally, I do a lot of work for companies that want additional features into core PostgreSQL, or various kinds of plugins. My company does all the things our customers request: 24×7 support, remote DBA, tuning, training, configuration, systems management, high availability… there’s always a need for expertise.
In September, PostgreSQL version 9.0 was been released, what are the most prominent features in the new release?
• Streaming Replication
• Hot Standby
• Join Removal (for ORMs)
• Default Privileges
• 64-bit Windows support
• New security features
• In-place upgrades
• Improved user defined functions in perl and Python
…Safer, faster, easier to use
How do you explain that so few books have been published on PostgreSQL?
That’s down to the publishers, I guess. I’ve been trying to write an advanced book on PostgreSQL for 5 years. Finally, we’ve published these two books.
• “PostgreSQL 9 Administration Cookbook” Simon Riggs and Hannu Krosing ISBN 978-1-84951-028-8 (PACKT Publishing)
• “PostgreSQL 9 High Performance” Gregory Smith ISBN 978-1-84951-030-1 (PACKT Publishing)
Both of those books are selling “very well”, I am told by those that know.
Most of the books I found on Amazon are mostly over 5 years old.
True in English, but Germany and France have had good new books in recent years. It’s down to publishers more than authors or readers.
PostgreSQL is mature, so yes, books have been written over the years.
The software is changing fast, so many of the older books are unfortunately out of date in many respects.
Can you mention some of the big users of PostgreSQL?
NTT, Skype, Afilias, Nokia, …
… there are many more, mostly hidden by non-disclosure agreements. Put it this way: 2ndQuadrant’s 16 technical staff are pretty busy, and we have many large customers.
Almost all providers in the Telco space use it, since they have deployments of hundreds or thousands of servers. That’s both service providers and equipment manufacturers.
Many companies get big discounts from other database suppliers. Many of them aren’t willing to risk losing that discount by voicing support for PostgreSQL, especially since we can’t offer a discount on the licence – the database is already free.
I hope this post has highlighted some of the key features of PostgreSQL but please feel free to make comments, ask questions as I am sure Simon will kindly answer your questions.