Infobright – Tackling Big Data Analytics without Breaking the Bank…

Infobright was founded in 2005 to apply a mathematical approach called Rough Set theory to data management and analytic problems.  Rough Set theory was introduced by the Polish computer scientist Zdzislaw Pawlak in 1981 as a mathematical tool for dealing with vague concepts.  In his theory, Pawlak approximates a vague set with a pair of crisp (conventional) sets, the lower and upper approximations, which bound the set from below and above.  This approach is useful for rule induction from incomplete data sets, and in other variations the approximating sets may themselves be fuzzy sets.  Dominik Slezak, one of Infobright’s founders, explains it this way: “we follow the rough set approach to identify: (1) the data portions that are fully relevant to the given query execution; (2) the data portions that are fully irrelevant to the given query execution; (3) the data portions that remain undecided.”  Applied to data mining and machine learning, the theory was quickly adopted by a variety of industries for different applications.

Infobright’s founders realized that Rough Set theory is a powerful tool for running fast queries against large data sets without the database administration work that had previously been required to achieve fast performance.  Instead of requiring indexes, data partitioning and other typical techniques, intelligence in the software could drive performance.  Infobright calls this intelligence the Knowledge Grid.  Wedding the Knowledge Grid with a columnar database architecture produced a powerful solution that could handle large amounts of data quickly and simply, at a very low overall TCO.
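To make the idea concrete, here is a minimal sketch of how the three categories in Slezak's description could be decided from small per-block summaries rather than from the data itself.  It assumes that a column is split into blocks ("packs") and that each pack records its minimum and maximum value; those statistics, and the names `Pack`, `make_pack`, `classify` and `count_greater`, are illustrative assumptions and are not taken from Infobright's actual Knowledge Grid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pack:
    """A block of column values with summary statistics (hypothetical layout)."""
    values: List[int]
    lo: int  # minimum value in the pack, computed once at load time
    hi: int  # maximum value in the pack, computed once at load time


def make_pack(values: List[int]) -> Pack:
    """Build a pack and record its min/max, as a loader might do."""
    return Pack(values=values, lo=min(values), hi=max(values))


def classify(pack: Pack, threshold: int) -> str:
    """Classify a pack for the predicate `value > threshold` using only min/max."""
    if pack.lo > threshold:
        return "relevant"    # every row in the pack satisfies the predicate
    if pack.hi <= threshold:
        return "irrelevant"  # no row in the pack can satisfy the predicate
    return "suspect"         # undecided: the rows must be inspected


def count_greater(packs: List[Pack], threshold: int) -> Tuple[int, int]:
    """Count rows with value > threshold; also report how many packs were scanned."""
    total = 0
    scanned = 0
    for pack in packs:
        kind = classify(pack, threshold)
        if kind == "relevant":
            total += len(pack.values)  # answered from metadata alone
        elif kind == "suspect":
            scanned += 1
            total += sum(1 for v in pack.values if v > threshold)
        # irrelevant packs are skipped entirely
    return total, scanned


packs = [make_pack([1, 2, 3]), make_pack([10, 20, 30]), make_pack([4, 8, 15])]
matches, scanned = count_greater(packs, 5)
print(matches, scanned)  # 5 1
```

In this toy run only one of the three packs is actually read; the other two are resolved from their summaries, which is the effect the paragraph above attributes to intelligence in the software rather than to indexes or partitioning.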

In 2006 Infobright formed a partnership with MySQL, taking advantage of MySQL’s “storage engine” architecture, which encouraged other companies to create new databases for different use cases while reusing many MySQL functions.  This integration meant that migrating from a row-based MySQL database to the columnar Infobright database would be as simple as a command line change.

Infobright first introduced its technology, then known as Brighthouse, at the 2007 Rough Set Conference in Toronto to a very positive reception.  A few months later, in 2008, the company released the industry’s first commercial open source analytic database software (infobright.org) and started building a strong and growing open source user community, with more than 15,000 downloads in the first year.  Within a year, Infobright had more than 40 customers, including ISV OEM customers who embedded Infobright’s solution in their own software offerings.

Since 2008, there have been many other changes.  For one thing, the product is now called Infobright, not Brighthouse.  There are tools to integrate with major BI partners such as Pentaho, Jaspersoft, Talend and Informatica.  Users can load data in several different ways depending on their needs – using MySQL loaders, distributed loading, or many other ETL tools.  Customers have reached data load speeds of up to 200,000 records per second.  Infobright positioned itself as the leader in the open source data warehousing community, and in April 2009, in recognition of its outstanding contributions to the MySQL Ecosystem, Infobright was awarded the prestigious MySQL Partner of the Year award by Sun Microsystems.

Using built-in intelligence, Infobright’s unique way of storing and analyzing machine-generated data provides a vehicle for near real-time analytics on big data.  Machine-generated data has become one of the fastest growing categories of big data, with sources ranging from web, telecom network and call-detail records to data from online gaming, social networks, sensors, computer logs, satellites, financial transaction feeds and more.  This focus on machine-generated data within big data, beginning in 2010, gave rise to a rapid increase in customer momentum.  Our latest version, released last summer, included Hadoop connectivity as well as the introduction of Domain Expert and Rough Query.  Developed exclusively by Infobright, Domain Expert uses specific intelligence about machine-generated data to automatically optimize how data is stored and how queries are processed.  Rough Query leverages our Knowledge Grid to deliver data mining “drill down” at RAM speed, otherwise known as “Investigative Analytics.”
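As a rough illustration of how an answer might be bounded from in-memory summaries alone, the sketch below computes a lower and an upper bound on a row count without reading any data.  It assumes each block of a column is summarized as a (min, max, row count) tuple; that layout and the function name `rough_count` are hypothetical, and this is not a description of how Rough Query is actually implemented.

```python
from typing import List, Tuple

# Each entry summarizes one block of a column: (min value, max value, row count).
# This layout is an assumption made for illustration only.
PackStats = Tuple[int, int, int]


def rough_count(stats: List[PackStats], threshold: int) -> Tuple[int, int]:
    """Bound the number of rows with value > threshold using summaries only.

    The lower bound counts blocks where every row must match (the lower
    approximation); the upper bound also counts blocks where some rows
    might match (the upper approximation).
    """
    lower = 0
    upper = 0
    for lo, hi, rows in stats:
        if lo > threshold:
            # Every row matches: counts toward both bounds.
            lower += rows
            upper += rows
        elif hi > threshold:
            # Undecided block: rows may or may not match.
            upper += rows
        # Otherwise no row can match; the block contributes nothing.
    return lower, upper


stats = [(1, 3, 3), (10, 30, 3), (4, 15, 3)]
print(rough_count(stats, 5))  # (3, 6)
```

A narrow interval may be all an analyst needs to decide where to drill down next; a wide one signals that the undecided blocks would have to be read for an exact answer.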

From the beginning, our executives and engineers recognized that the looming database challenge was how to analyze and extract actionable knowledge from very large (and growing) data sets.  Clearly the market agrees, as terms such as “big data” and “machine-generated data” have become more commonly used.  In addition, companies appreciate that Infobright’s approach – to work smarter, not harder – means that their users can get fast query response, even to ad hoc queries, without high database administration overhead or hardware costs.

In 2011, eight of the top ten telecommunications service providers worldwide used Infobright through our OEM partners to mine their big data, and they continue to do so.  Hundreds of customers use Infobright daily, and more than 100,000 users have downloaded our community and enterprise editions.  Infobright is still leading the industry and paving the way.

By Craig Trombly

Infobright Open Source Community Manager