Data Control Made Easy With FluidDB
FluidDB is an openly writable shared database, designed by Australian Terry Jones and his team as a platform for the Web of Things. FluidDB has Esther Dyson and Tim O’Reilly on its Board of Advisors, which is an indicator of the belief in FluidDB’s potential for linking and organising data across the Internet.
We asked this former Circus Ringmaster how FluidDB came about, how FluidDB balances privacy and openness and whether musicians make good programmers.
What inspired you to create FluidDB?
Frustration with how awkward it is to work with information using computers, and the great contrast between that awkwardness and the flexibilities we enjoy in the non-computational world. In the natural world we’re constantly doing all sorts of idiosyncratic, ad hoc, unanticipated things with information, we’re always writing (perhaps in some abstract sense – using our brains), and we don’t stop to ask permission in order to add new information. When we use computers however, the situation is usually completely the opposite. I think we’ve all grown up in this awkward computational environment and so we don’t really see it as a problem. But these things bug the hell out of me. For example, if I use an application to store some data on my behalf, why do I lose control over that data? Why can’t I add to it arbitrarily, delete it, share it, search on it in ways the application didn’t anticipate? Why, when I bump into someone else’s data do I need to ask permission to contribute to it? In the natural world if I want to add my own thoughts to a concept, I can do so without asking permission. I can put post-it notes containing arbitrary information on things, typically without asking permission. Why can’t I do the equivalent when I encounter an object (a file, a web page, anything) in the computational world?
How long did it take you to create this version?
We worked on this version for about 18 months before releasing it. We released as soon as we possibly could. There’s still a lot to be done and we’re improving FluidDB every day. It’s still an alpha. By the standards of modern start-ups, it’s taken an eternity to get this far with FluidDB. But if you consider something like Amazon S3, which has much simpler semantics, it took Amazon years to build and launch. We were just two people.
NoSQL seems to be the trend with CouchDB, MongoDB and many more. What makes people move away from SQL to NoSQL?
There are many reasons (not all of them sound). Some of the simpler attractions are that the NoSQL solutions may seem simpler to deploy (or need no deployment at all) and that they may be running in the cloud (easy to deploy more machines). There’s certainly also an irrational me-too element in which developers follow the latest shiny thing. There are also much more nuanced reasons people might move. For example, their needs may not require anything like the full machinery of SQL, or their data model (e.g., a graph – see http://neo4j.org for example) and access needs may not fit particularly well with row-based storage. People might be fed up with some of the real difficulties with SQL, such as the pain involved in altering large tables (compare ALTER TABLE ADD COLUMN with how trivial it is to add a column to a column store; see e.g. http://twitter.com/jzawodn/status/1926136876). Some people run into trouble with tables that are too large and would like to partition or store them in ways that might be simpler with a NoSQL approach. There are companies offering hybrid approaches: column stores with SQL (see http://voltdb.com). There are people whose needs are very different, perhaps according to data value (compare a bank with Twitter), perhaps according to different CAP needs, perhaps according to volume, expected patterns of application access (read/write balance), or transactional needs, and perhaps according to market focus / customer base (e.g., compare data warehousing or OLTP needs with Foursquare’s needs).
There are no simple answers here. As with almost everything in computer science, every decision involves tradeoffs. There’s also a lot of misunderstanding, misinformation, uninformed opinions, and, unfortunately, what seems to be a certain amount of deliberate FUD.
BTW, we try to stay out of the SQL/NoSQL debate. FluidDB stores data in Postgres and S3; it doesn’t implement a storage layer itself. While FluidDB doesn’t support SQL as a query language, we use SQL to talk to Postgres. As outlined above, there are dozens of tradeoffs in these kinds of decisions.
It’s overly simplistic to divide the world into SQL and NoSQL. Because of that we don’t see much value in participating in a debate at that level, or in being on either “side”.
Why should one select FluidDB instead of CouchDB, MongoDB or any other
You could select one of those technologies based on your particular needs. BTW, CouchDB and MongoDB both have excellent reputations “on the street” amongst developers. Whether you should use one or the other, or something else entirely, depends on your particular needs.
Some FluidDB applications are already available. What else is in the pipeline?
In terms of applications, we haven’t really begun to encourage people to build serious apps yet. We’ll start to do that in earnest around the end of the year. In the meantime, we’re concentrating on simplifying and speeding up FluidDB and on some API enhancements that will result in far fewer network calls for typical applications. We’re expecting to release some small browser extensions by the end of the year, and we also have a much more serious application that we’ve been slowly planning for some months, but I can’t go into detail! :-)
Looking through your team members’ bios, I cannot help but wonder if being an artist is a requirement for using FluidDB.
Being a musician seems to help. Actually it’s well known that many programmers have strong musical skills. Maybe that’s a coincidence, maybe it’s an imagined tendency, or maybe there actually is a neural basis for such a correlation. It’s nice to imagine that programmers may have some kind of redeeming social value :-) I’ve not seen any studies on it.
How do you compare the human method of organizing information to the methods of our existing computer systems?
There are many differences. Humans often have multiple organizations of the same information (think of music software, such as iTunes, which allows you to have multiple playlists over the same underlying objects). Organization isn’t all-or-nothing: some things may be highly organized and others only loosely so. My (logical) organizations of things don’t interfere with yours, and I can keep them private or share them. We can to some extent combine organizations – what are the things you consider to be important compared to the things I consider important? And if you think about it, the act of naming something is an organizational act – humans find it very useful to give things many names (surnames, first names, nicknames, names in different human languages, etc.). All those names can happily co-exist. Computer systems tend to insist that things are named, and operate on the basis that things have just one name. Note that in this last point I am not necessarily claiming that we need to be able to have multiple names for things (though there are clear cases where it is an advantage), but rather highlighting a big difference. Modern OSes offer some mechanisms for adding names to things, such as symbolic links, but these are a bit of a hacked afterthought.
Is the data on FluidDB available to any user without exception, or is there a way to make it partially private? If so, how secure is it?
FluidDB namespaces and tags have permissions, so you could have a josette/opinion tag and make it completely private, or give read (or write or delete) access to certain others. FluidDB objects themselves don’t have permissions, which guarantees that anyone can add to any object, without stopping to ask permission and without their needs having to be anticipated. The permissions on namespaces and tags, meanwhile, give strong control over who can read, write or delete the data attached to those objects. We think it’s very secure.
Because the FluidDB API and permissions system are both very simple, there’s not much scope for ambiguity or confusion about whether one has permission to take an action. Permissions are also well tested by our test suite.
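Editor’s illustration: the model described above (objects open to everyone, permissions attached to tags) can be sketched as a small in-memory toy. This is not FluidDB’s API; the class `ToyStore`, its methods, the `obj-1` identifier and the `terry/rating` tag are all hypothetical names invented for this sketch, and only the `josette/opinion` tag comes from the interview.

```python
from collections import defaultdict


class PermissionDenied(Exception):
    """Raised when a user lacks permission for an action on a tag."""


class ToyStore:
    """Toy model (hypothetical, not FluidDB): objects are open, tags carry permissions."""

    ACTIONS = ("read", "write", "delete")

    def __init__(self):
        # object_id -> {tag_path: value}; objects have no owner and no permissions.
        self._objects = defaultdict(dict)
        # tag_path -> {action: set of users allowed to perform it}
        self._acl = {}

    def create_tag(self, owner, name):
        """Create a tag under the owner's namespace; private to the owner by default."""
        path = f"{owner}/{name}"
        self._acl[path] = {action: {owner} for action in self.ACTIONS}
        return path

    def grant(self, owner, path, action, user):
        """Let the tag's owner give another user one action on the tag."""
        if path.split("/", 1)[0] != owner:
            raise PermissionDenied(f"{owner} does not own {path}")
        self._acl[path][action].add(user)

    def _check(self, user, path, action):
        if user not in self._acl[path][action]:
            raise PermissionDenied(f"{user} may not {action} {path}")

    def put(self, user, object_id, path, value):
        """Attach a tag value to ANY object; only the tag's permissions are checked."""
        self._check(user, path, "write")
        self._objects[object_id][path] = value

    def get(self, user, object_id, path):
        self._check(user, path, "read")
        return self._objects[object_id][path]


store = ToyStore()
josette_opinion = store.create_tag("josette", "opinion")
terry_rating = store.create_tag("terry", "rating")

# Both users add their own data to the same object without asking anyone.
store.put("josette", "obj-1", josette_opinion, "promising")
store.put("terry", "obj-1", terry_rating, 5)

# The josette/opinion tag is private until josette grants read access.
try:
    store.get("terry", "obj-1", josette_opinion)
    denied = False
except PermissionDenied:
    denied = True

store.grant("josette", josette_opinion, "read", "terry")
shared_value = store.get("terry", "obj-1", josette_opinion)
```

In this sketch nothing ever checks who “owns” `obj-1`, which is the point: control lives entirely on the tags, so openness at the object level and privacy at the tag level do not conflict.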
Your website states: “Privacy and ownership are important so FluidDB has a unique yet powerful model of control” but also that “Discovering and enhancing knowledge is vital. FluidDB makes it easy to share, annotate, augment and re-use information”. Are these two statements contradictory?
No, not at all. They probably would be in a more traditional storage system. But FluidDB has the powerful combination of openness at the object level (anyone can *add* data to any object) and tight control at the tag level. Several articles have been written about the advantages of this system.
Which kinds of data wouldn’t you put on FluidDB?
I wouldn’t look at it like that. FluidDB can safely store and retrieve any kind of data (we use Postgres and S3, both of which – network permitting – are fairly bulletproof). I’d ask instead: what kinds of applications shouldn’t use FluidDB? Two clear categories are those with data that always needs to be consistent (we do not guarantee consistency – that’s our choice of the missing CAP element, and because we use S3 for some storage it’s an inherent limitation), and those with data which fits the relational model well and which you want to process using the power of SQL. The FluidDB query language is deliberately very simple, so complex processing that might be done in SQL on a server will need to be done locally.
Considering the EU privacy laws, do you see any future problems with your database?
If we are successful, I’m sure we’ll have our fair share of privacy and legal issues regarding data. Fluidinfo is a US corporation, so we also have to concern ourselves with US law. One strong hope is that the legal climate regarding data storage will not be changed to make those storing data responsible for policing its content. That’s an impossible and unwinnable task, especially given modern cryptography.
FluidDB is free in its alpha stage but will then move to a freemium pricing model. Could you please elaborate?
We’ve not spent much time thinking about pricing models, but one overall goal is not to be charging individual developers, but to instead be charging companies who are using FluidDB in a commercial capacity. There are a variety of ways for us to do this, though we’re not planning to start rolling them out for at least a year. If (and it is only an if) we do move to a freemium model, you can be sure the upper limit will be generous. One thing is for sure: it makes no sense at all to reward your early adopter developers by suddenly slapping them with a bill. That’s not going to happen. There shouldn’t be any fear that by building on FluidDB one is setting oneself up for future costs. Grandfathering in early adopters is likely the best way to guarantee this. We have some pretty interesting approaches to making money, and letting others make money, from FluidDB.
Finding funding is always a very difficult experience. What advice would you give to start-ups?
First of all, don’t pay much attention to people who have very little experience but who nevertheless extrapolate that into generalized advice for others :-) There are so many types of start-ups, so many kinds of products, markets, and potential funding sources that I don’t think there’s much in the way of succinct context-free advice that can confidently be given. Forced to say something though, I’d recommend that people back themselves (I mean do what *you* believe is necessary, as a product, as a strategy, etc.), only work on stuff you’re passionate about (easier said than done), learn to see rejection as a necessary part of any valuable endeavor, and I’d add that a released product is much more valuable than any number of words (business plans, summaries, pitches, white papers, etc).
The answer is quite interesting. I wrote a first version of FluidDB in C. It was very fast, very complicated, and very brittle. I threw it all away in favor of Python because I ran across the Twisted project and knew I wanted to use it. We’re using Python because of Twisted. But we’re also fans of Python – it’s very elegant and clean, and has benefited enormously from a very carefully controlled evolution as a language. I don’t agree with all the decisions, of course (surely no-one does), but on balance it’s a language I really love, and increasingly so.
I understand that Tim O’Reilly has now joined the advisory board. What is Tim bringing to FluidDB?
Tim has been an enormous help to us over the last couple of years. He brings a wealth of experience in seeing what works and doesn’t – in terms of ideas, startups, and technologies. He brings the kind of forward-looking long term support we need, because FluidDB is not going to change the world overnight. There is a non-trivial overlap in our visions of the future.
He brings an energetic and idealistic outlook, and an insistence that people work on “stuff that matters”, which is something we believe we’re doing. Tim also has contacts all over the world and has been very generous in introducing us to people, including starting the chain of introductions that led to Fluidinfo’s funding last May.