From MySQL to MariaDB, Michael “Monty” Widenius talks about databases and his projects.
As the number of new databases is growing every day I wanted to know more about their differences and usage, and the buzz words: SQL or non-SQL. Who should I ask but the founder of MySQL and MariaDB, the charismatic Michael “Monty” Widenius?
I keep hearing about SQL and non-SQL databases. What are the differences and why should I use one type rather than the other?
The main differences (for normal people) between these are:
The advantages of non-SQL:
- Non-sql allows you to store ‘unstructured data’ (i.e., you are not restricted to having a fixed set of columns per row).
- Many of the non–sql databases trade replication and scalability against speed; you get (in many cases but not all) a higher speed but for some noSQL implementations you have a higher risk of losing data when something goes wrong.
- You don’t have to learn SQL.
- Another advantage that ties in with Point #1 is that it’s usually schema-less. A lot of people already find it hard enough to write schemas so No-SQL jives with them.
The disadvantages of non-SQL:
- You can’t combine data from different tables or databases without writing a program.
- You can’t easily export/import data from other sources.
- There are no standard interfaces so you can’t use our non-sql data with any standard application.
- All the non-sql databases are totally different. If one doesn’t work for you there is a lot of work to move to another.
For most applications and users SQL is fast enough, which makes the main reason for using non-sql the more flexible schema.
What we have done in MariaDB is add ‘dynamic columns’ which allows you to get a similar freedom to the schema as you get with non-sql.
Another method of No-SQL available in MariaDB is HandlerSocket – you can find out more about this at the Knowledgebase – try search: handlersocket).
In future releases of MariaDB there will be more no SQL styled interfaces.
MySQL is the most used client/server SQL database with 20-50 million users. As Oracle is now moving MySQL to Open Core, those users will need a free alternative so they don’t have to start paying for their software. This is why MariaDB is needed.
Non-SQL may be trendy, but to be fair: non-SQL has always existed and the main reason for the trend is a few blogs from some popular web sites that have extreme load. Even these sites have no plans to dump SQL, they are just using non-SQL for some edge cases.
Do you know the market share of the top 5- 10 SQL databases?
This is roughly the distribution when looking at the number of users:
- Oracle 80 % (of which MySQL 50 %)
- MSSQL 10 %
- DB2 4 %
- PostgreSQL and others make up the rest.
Are these databases specific to certain jobs or do they all do the same thing?
Every database has different sweet spots. (I will only mention some of the sweetspots below. Doing a full database comparison would be a very long article).
- Oracle is very good when it comes to massive parallel queries.
- MySQL (base version) is extremely good for web loads and can do more database connections/second than any of the other databases. It’s also a proven and configurable replication which makes it one of the main choices for a cloud database.
- MySQL Cluster is one of the best databases for real time operations and is used heavily in the telecom industry.
- MSSQL is a great all round database, but only runs on Windows.
- PostgreSQL is probably the best database if you need to extend the database with new structures, types or properties.
How different is MariaDB to the other databases?
MariaDB is a full compatible and binary drop-in replacement of MySQL.
To convert from MySQL to MariaDB takes seconds; you don’t have to convert any data and all our old connectors to other languages work unchanged.
MariaDB has 10 years more development than MySQL and is generally notably faster and more stable (as we have spent a lot of time fixing bugs in MariaDB that we have inherited from MySQL). The full list can be found here.
Can you give me several reasons why I should move from MySQL to MariaDB when I would have the back up of a huge company with MySQL?
- MariaDB is created by the same people who created MySQL. Oracle may own the MySQL code and trademark but they have lost most of the know-how about the MySQL code and are not able to develop all the parts of it anymore.
- If you have a lot of updates and replications, MariaDB is a magnitude faster thanks to the new group commit code.
- Better optimizer; big data and sub queries are handled MUCH better than before.
- MariaDB uses by default XtraDB (an improved version of InnoDB) which is both faster and has more features than InnoDB.
- MariaDB is Open Source (not Open Core as MySQL is today).
- More features, less bugs.
- Progress reporting (so you know how long things like ALTER TABLE will take).
- Support is less expensive and better than Oracle (as most of the original MySQL support people have moved to companies like SkySQL that also supports MariaDB).
- You can pay for extensions to MariaDB in order to ensure that your company’s future product needs will be in the next release of MariaDB.
How many developers are working on MariaDB? What’s their background?
At Monty Program Ab we have 17 people working on and with MariaDB. Of these 12 are developers, most with a long MySQL background. Together we have more than 100 years of competence in the MySQL code base.
You can find a list of the people here.
There are also a lot of people and companies contributing code to MariaDB. You can find most of them on the maria-captains (27 members) and maria-developers (175 members) lists on launchpad.
Another point to note about maria-captains (i.e. people that can commit code to the repository) is that some 55-58% or so come from Monty Program, while the rest are people and companies that are allowed to commit code as maria-captains. This is important because MySQL has never had success with a community of developers.
Can a developer work on several different database projects at the same time?
Yes, but seldom on different server code bases, as the code is complex and it’s hard to move things between different projects.
However, it is quite common for people who are writing connectors (like PHP, Perl) to be working on connectors and tools that work with many databases.
How would you reply to Baron Schwartz’s comments in his broad-vision keynote, as reported by Andy Oram, that “databases are starting to need to handle petabytes. And he criticized open source database options as having poorer performance than proprietary ones.” From “Wrap-up of 2011 MySQL Conference” by Andy Oram.
In some sense he is right. Very few databases can handle petabyte data. Regarding MySQL I know of only a few projects with data in that range.
In 2007, NASA gave a keynote at the MySQL Conference. It talked about how they were handling petabyte data then with MySQL.
The main problems with data in this range are:
- It’s hard to handle crashes (you don’t ever want to have to repair a database in the petabyte size).
- We need better optimizers with new algorithms. In MariaDB 5.3 we have done a lot of work to be able to more gracefully handle data in terabyte ranges, but there is more to do to be able to go to petabyte.
- We need better ways to distribute queries over many data nodes that can process parts of the data individually.
Some MySQL storage engines for handling big data in parallel do exist. These are (in alphabetical order) CalPoint, Infobright, ScaleDB and Tokutek (sorry if I missed anyone).
We are also working to add a new storage engine into MariaDB that can handle data at least in the terabyte range and should be easy to scale in the cloud. More about this later this year…
How has the cloud changed the development of database techs these days?
Some of the main requirements of a database in the cloud are:
- Should use very little resources / user and instance.
- Should be fast and lightweight.
- Should be easy to scale by just ‘adding a new machine’.
- Should be easy to manage many different types of users within the same database instance.
MySQL fits most of the above, except that for the two last entries, there is still a lot of room for improvement. We are working with some of the cloud vendors to improve upon this.
I don’t have any plans to have any more children and as MariaDB is a ‘never ending story’ I don’t think there will be time for yet another big project.
Thank you Monty for your time and knowledge. I hope we can continue this conversation in the future with an up-date on MariaDB.