Database decisions for the Internet of Things

Database for IoT

The Internet of Things (IoT) is constantly expanding, and the volume of data transmitted through it grows along with the network. This is why traditional methods of data management can no longer keep up with the growing needs.

Relational databases (RDBs) work well for many scenarios, but IoT is not one of them. RDBs were designed for processing structured, highly uniform data sets, whereas the data gathered in IoT is anything but uniform. With over 50 billion objects predicted to be connected to a single network by 2020, transmitted data already ranges from simple text to complex combinations of readings from different sensors. This information must not only be transmitted but also analyzed and processed. The resulting requirements fall into three categories:

Diversity of information and sensors – An exponentially growing number of diverse sensors and devices generates more and more heterogeneous data. New data sources are constantly being added, and the structure and scale of their data vary widely and are often complex.

Extensible and flexible systems – The systems that run the Internet of Things must be flexible and agile, so that applications do not need to be rebuilt whenever new sensors and devices are added.

Proficient analytics – Previously, simpler systems communicated through alerts and notifications passed between two machines. In the Internet of Things, analytics is the foundation of the system, and different types of data call for different analytical mechanisms.
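
The flexibility and analytics requirements can be illustrated with a minimal Python sketch. It assumes that readings arrive as free-form dictionaries tagged with a `sensor_type` field. The registry `ANALYZERS`, the `analyzer` decorator, and the thresholds are hypothetical and do not come from any product. Each sensor type maps to its own analysis function, so supporting a new kind of sensor means registering one more function rather than rebuilding the application.

```python
from typing import Any, Callable, Dict, List

# A reading is a free-form dict; only "sensor_type" is assumed to be present.
Reading = Dict[str, Any]

# Hypothetical registry: sensor type -> analysis function for that type.
ANALYZERS: Dict[str, Callable[[Reading], str]] = {}


def analyzer(sensor_type: str):
    """Decorator that registers an analysis function for one sensor type."""
    def register(fn: Callable[[Reading], str]) -> Callable[[Reading], str]:
        ANALYZERS[sensor_type] = fn
        return fn
    return register


@analyzer("thermometer")
def check_temperature(r: Reading) -> str:
    # Illustrative threshold, not a recommended value.
    return "alert" if r["celsius"] > 80 else "ok"


@analyzer("camera")
def check_frame(r: Reading) -> str:
    # Different data, different mechanism: here only a size sanity check.
    return "ok" if r["frame_bytes"] > 0 else "empty frame"


def analyze(readings: List[Reading]) -> List[str]:
    results = []
    for r in readings:
        fn = ANALYZERS.get(r["sensor_type"])
        # Unknown sensor types are reported instead of crashing the pipeline.
        results.append(fn(r) if fn else "unhandled")
    return results
```

Unknown sensor types come back as `unhandled` instead of raising an error. The rest of the pipeline therefore keeps running while support for a new device is being added.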

This set of requirements is essentially the same as the set for Big Data systems, and so are the solutions that satisfy it. According to Machina Research, NoSQL databases are the silver bullet here.

What is NoSQL?

NoSQL has become synonymous with huge amounts of data, linear scalability, clustering, fault tolerance, and non-relational databases.

NoSQL is a family of technologies, approaches, and projects that implement database models differing significantly from traditional SQL-based DBMSs. The concept does not reject classic SQL. It only aims to solve the problems that RDBMSs do not handle well enough. In NoSQL solutions, data is most often represented as hash tables (key-value pairs), trees, or documents.
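
As a rough illustration of the document model, the following Python sketch keeps schemaless JSON documents in a hash table keyed by device ID. `DocumentStore` and its methods are hypothetical names for this example. Real NoSQL databases add persistence, indexing, and distribution on top of this basic idea.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


class DocumentStore:
    """Toy in-memory document store: a hash table from key to JSON documents.

    Not a real NoSQL product; it only illustrates the schemaless data model.
    """

    def __init__(self) -> None:
        # device_id -> list of documents; no schema is enforced on documents.
        self._docs: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def put(self, device_id: str, doc: Dict[str, Any]) -> None:
        # Round-trip through JSON to keep a detached, serializable copy.
        self._docs[device_id].append(json.loads(json.dumps(doc)))

    def find(self, device_id: str, field: str) -> List[Any]:
        """Return values of `field` from documents that have it; skip the rest."""
        return [d[field] for d in self._docs.get(device_id, []) if field in d]
```

Documents are not checked against a schema. A device that later starts reporting an extra field therefore needs no migration, and queries simply skip older documents that lack the requested field.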

The basic idea of NoSQL is that for very large volumes of data, it is difficult to uphold the ACID properties familiar to every database administrator: atomicity, consistency, isolation, and durability. Under these circumstances, sacrificing “a little” atomicity and consistency is acceptable.

In place of ACID comes BASE: Basically Available, Soft state, and Eventual consistency. Under BASE, every request is guaranteed to complete (even if unsuccessfully). The state of the system may change over time even when no new data arrives. The data will eventually become consistent, although replicas may disagree in the meantime.
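
Eventual consistency can be sketched in a few lines of Python. This toy model assumes last-write-wins conflict resolution with caller-supplied logical timestamps, which is one common strategy but not the only one. It also assumes a periodic `anti_entropy` sync between replicas. All names are illustrative. Between syncs, replicas can return different values for the same key, and after a sync they converge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Each value is stored with a logical timestamp: (timestamp, value).
Versioned = Tuple[int, str]


@dataclass
class Replica:
    name: str
    data: Dict[str, Versioned] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self._apply(key, (ts, value))

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key: str, entry: Versioned) -> None:
        # Last-write-wins: keep the larger (timestamp, value) tuple.
        current = self.data.get(key)
        if current is None or entry > current:
            self.data[key] = entry


def anti_entropy(replicas: List[Replica]) -> None:
    """Push every replica's entries to every other replica so they converge."""
    for src in replicas:
        for dst in replicas:
            for key, entry in src.data.items():
                dst._apply(key, entry)
```

Ties on the timestamp are broken by comparing the values themselves. Every replica therefore picks the same winner regardless of the order in which updates arrive.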

The experience of large companies has shown that, however good ACID may sound, it is impractical to serve an audience of millions with these older principles alone. Therefore, if you are designing a solution that must handle loads on the scale of Facebook or Amazon, you will have to adopt NoSQL principles and the corresponding products.

Why NoSQL?

The evolution of the Internet of Things brings many challenges, and one of them is the extensibility of the software that works with it. New devices, sensors, and approaches emerge in the IoT world every day, producing ever more types and volumes of data. All of this places additional requirements on new and existing applications and systems. In some ways, IoT pushes the industry forward. It encourages databases to adapt to new types of data, analytical platforms to expand their capabilities, and applications to be developed with scalability in mind.

Within two years, relational databases will become a minority in IoT, because they will not be able to realize the full opportunities offered by all types of data. Classic relational databases struggle with very large data volumes and high loads, so sooner or later distributed computation becomes necessary. For example, it is often difficult or impossible to isolate and partition the data within a single database. According to Brewer’s theorem (also known as the CAP theorem), a distributed data store can guarantee at most two of the following three properties at the same time: consistency, availability, and partition tolerance.
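
The CAP trade-off shows up most clearly during a network partition. The hypothetical two-node cluster below, written in Python, models only writes. In `CP` mode it rejects writes that cannot be replicated. In `AP` mode it accepts them locally and lets the replicas diverge. This is a teaching sketch, not a description of how any particular database implements replication.

```python
from typing import Dict, Optional


class TwoNodeCluster:
    """Toy two-node cluster showing the CAP choice during a network partition.

    mode="CP": reject writes that cannot reach the peer (keep consistency).
    mode="AP": accept writes locally (keep availability; replicas may diverge).
    Only writes are modeled; reads always return the local copy.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.stores: Dict[str, Dict[str, str]] = {"n1": {}, "n2": {}}
        self.partitioned = False  # True while n1 and n2 cannot communicate

    def write(self, node: str, key: str, value: str) -> bool:
        peer = "n2" if node == "n1" else "n1"
        if not self.partitioned:
            # Normal operation: replicate synchronously to both nodes.
            self.stores[node][key] = value
            self.stores[peer][key] = value
            return True
        if self.mode == "CP":
            return False  # give up availability: refuse the write
        self.stores[node][key] = value  # give up consistency: peer is now stale
        return True

    def read(self, node: str, key: str) -> Optional[str]:
        return self.stores[node].get(key)
```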

[Figure: SQL vs. NoSQL]

NoSQL, in turn, typically provides high availability and tolerance to network partitions, but offers weaker consistency guarantees. This is exactly what Internet of Things databases need: the ability to store different types of data and to adapt the underlying data models to new and changing business requirements and applications.

Data Analytics

Another important aspect of the IoT industry is data analytics, and here, too, the approach is different. Previously, we only needed to analyze existing data. Now there is a need for real-time analytics on huge amounts of data. For example, current-generation smart cars generate nearly 25 gigabytes of data from sensors and cameras per hour, and next-generation smart cars are projected to generate more than 250 gigabytes per hour. IoT data analysis will require multiple analytical approaches. In some cases, significant value comes from real-time analytics, and in others from historical analysis, for example in predictive maintenance services.
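
For scale, 25 GB per hour is roughly 6.9 MB per second, and 250 GB per hour is roughly 69 MB per second, taking 1 GB as 10^9 bytes. Data arriving at that rate is usually analyzed as a stream rather than stored first and queried later. The minimal Python sketch below shows that style of processing. It assumes a numeric sensor signal and uses arbitrary, hypothetical values for `window` and `threshold`. It flags samples that deviate sharply from a rolling mean while keeping only the last few values in memory.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_alerts(
    samples: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for samples far from the recent rolling mean.

    Processes the stream one sample at a time with O(window) memory,
    so it can run over unbounded sensor data instead of a stored table.
    """
    recent: Deque[float] = deque(maxlen=window)  # oldest values drop off automatically
    for i, x in enumerate(samples):
        if len(recent) == window:
            mean = sum(recent) / window
            if abs(x - mean) > threshold:
                yield i, x
        recent.append(x)
```

Because `rolling_alerts` is a generator, it can consume readings as they arrive from a socket or message queue without waiting for the stream to end.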

As we now know, the IoT generates substantially more data to be stored and processed, and even cloud solutions often lack sufficient capacity. This is due to how cloud-based services are commonly set up: all of an application's data is stored on a single database server, while IoT requires more distributed databases.

One of the best ways to resolve this bottleneck is to set up a data cluster with a scalable number of nodes and storage optimized to lower the latency of data requests.
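
The text does not say how data is spread across such a cluster. One widely used technique is consistent hashing, sketched below in Python under that assumption. Each node owns many points on a hash ring, and a key (for example, a device ID) is stored on the first node clockwise from the key's hash. `HashRing` and its parameters are illustrative names, not the API of a specific product.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping keys (e.g. device IDs) to cluster nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Many virtual positions per physical node give a smoother balance.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def node_for(self, key: str) -> str:
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

When a node is added, only the keys that fall on its new positions move to it. That is roughly 1/N of the data for N nodes after the addition, so the cluster can grow without reshuffling everything.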

All these aspects are critical for Internet of Things applications. Throughout the history of data management, databases have been used to store simple, linear information that fits in rows and columns. In the case of IoT, however, we cannot predict what kind of information we will need to receive and process tomorrow. As new types of data appear and new sensors are invented, conventional systems must evolve.

Conclusions and recommendations

Big Data is a prerequisite for the development of the Internet of Things. Without proper data collection, businesses will be unable to sort the information coming from built-in sensors. In other words, without Big Data capabilities, the data produced by the Internet of Things is nothing more than white noise.

With technology constantly progressing, database management must be a key element of any IoT application development strategy. Different applications, by nature, have different requirements, which is why choosing the appropriate database type is vital for success.
