The roots of the blockchain in file systems and version control

File systems have always held a special place in the author's heart. He learned to program by writing shell scripts on Unix, automating the installation of Disksuite, Sun's free but sadistic disk-mirroring software. He has trouble recalling exactly what his job was at the time, but that part he remembers. To learn programming he had to travel in the literal sense of the word: he visited a friend, who took obvious pleasure in pointing out his mistakes.

When Sun began promoting its ZFS file system as the (long-awaited!) successor to Disksuite and its UFS file system, much of the functionality was obviously useful: the system let the computer manage the drives, users did not have to know a file system's size up front, and it was not destroyed when a server crashed. Pleasant conveniences, in general. But one feature raised a question: how was the integrity of the data ensured? The author is ashamed to admit that he did not realize he needed this feature. After all, a file system is supposed to be good at storing data, right? It took him even longer to figure out how it works.

To explain this, readers need a short lesson in cryptography. They can skip this part if they already have the relevant knowledge. As a rule, cryptography training starts with the instruction: "Obtain a master's degree in mathematics from the Massachusetts Institute of Technology." In fact, there is a somewhat shorter way. Cryptography is simply a kind of mathematics. Although most of us will find it hard to understand the field thoroughly, we can at least understand what the algorithms do. When people talk about the futility of trying to ban cryptography, this is what they mean: "you cannot ban math."

Cryptography is best known as a tool for privacy: it guarantees that only the user who relies on it has access to his files and private chat messages. Things get somewhat more complicated when he needs to read them on all his devices at once, but the concept is essentially unchanged. It is even more useful when it lets two chosen people read a text but not a third. This is a more complicated scheme, but essentially an extension of the first.

Privacy is not the only use of cryptography. It is also effective for verification. For example, you can use it to check whether the file a user has today is identical to the one that was there yesterday. Or take a user who has sent a file: how can he be sure that it did not change in transit?

Of course, one solution would be simply to resend the file. This is not ideal, since it assumes the file was authentic to begin with. It is also a poor model if bandwidth is expensive. The ideal is a verification mechanism that takes up less space than the original file and requires less CPU power than comparing the two files directly.

Cryptography provides exactly this, in what is commonly called a "hash function". This is an algorithm that turns, say, a large text file into a much shorter string of characters. To make sure the file has not been altered, it is enough to run the same conversion again and compare the result with the original short version. Short strings are easier to compare than long documents. They can even be read over the phone to the person checking the file. Typically these algorithms produce a string of fixed length, regardless of the amount of input data. That makes their output cheap to store and compare over the long term, and they can safely convert a file of any size. An example of the output of a hash function:

03f39f4bfad04f6f2cfe09ced161ab740094905c

As readers can see, this is just a long string of mumbo jumbo. It makes comparing two files easy. Another advantage is that the string itself is devoid of meaning.
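A minimal sketch of the idea in Python follows. The example string above is 40 hexadecimal characters long, which is the length SHA-1 produces, so SHA-1 is assumed here purely for illustration; the text does not say which function generated it. The helper name `digest` is hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-1 digest of the input as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


small = digest(b"hi")                      # two bytes of input
big = digest(b"x" * 1_000_000)             # a million bytes of input
changed = digest(b"x" * 999_999 + b"y")    # same size, last byte altered

# The output length is fixed, regardless of the input size.
print(len(small), len(big))

# Changing a single byte produces a different digest.
print(big == changed)
```

Comparing the two short digests stands in for comparing the two million-byte inputs.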

A critical feature of these algorithms is that they reliably produce a unique output for each unique input. If two people each have a file that hashes to a particular string, they can both be sure that it is one and the same file. Not in the literal sense of the word, of course: you could create a hash function with only 256 possible outputs, whereas there are clearly far more possible inputs. The result would be numerous collisions, where two files hash to the same output. Alas, such a function would be of little use.
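The 256-output case can be sketched with a toy function. `tiny_hash` below is a hypothetical, deliberately weak hash (the sum of the bytes modulo 256), not any real algorithm. With 257 distinct inputs and only 256 possible outputs, at least two inputs must collide.

```python
def tiny_hash(data: bytes) -> int:
    """A toy hash with only 256 possible outputs (hypothetical, for illustration)."""
    return sum(data) % 256


def first_collision(inputs):
    """Return the first pair of distinct inputs that share a hash, or None."""
    seen = {}
    for item in inputs:
        value = tiny_hash(item)
        if value in seen and seen[value] != item:
            return seen[value], item
        seen[value] = item
    return None


# 257 distinct inputs cannot all map to different values out of 256.
inputs = [str(n).encode() for n in range(257)]
pair = first_collision(inputs)
print(pair)
```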

All modern hash functions have incredibly long outputs. Although a collision is possible in theory, in practice it is unrealistic. Take a function with 2¹²⁸ possible outputs: that is about 3.4 followed by 38 zeros. A collision is mathematically possible, but the sun will engulf the earth sooner than a reliable hash function will produce one. In other words, it is unthinkable, and the files are safe.
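The figure can be checked with a couple of lines of arithmetic. This assumes a 128-bit output, which is what the number quoted in the text corresponds to.

```python
# Number of distinct outputs of a hash function with a 128-bit result.
outputs = 2 ** 128

print(outputs)                  # the exact integer, 39 digits long
print(f"{float(outputs):.1e}")  # 3.4e+38
```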

Now that readers are better versed in cryptography than most bitcoin "hodlers", the question arises: why is all this important?

It is all about the integrity of the data.

Readers will not go wrong in assuming that the ZFS file system uses these hashes to verify integrity. But it is capable of more than just verifying individual files.

The key is a bit of cryptographic genius called a Merkle tree (hash tree). In this model, content is not just hashed on disk for later verification. Instead, a tree of hashes is built: the leaves hold the hashes of the data blocks, and the interior nodes hold hashes of the combined values of their child nodes. The root node of the tree contains the hash of the entire data set. If any part of the system goes bad, whether a disk is damaged or someone changes the content, this is easy to detect. Not only does the individual hash change, but every parent hash above it becomes incorrect as well.

If the content is changed by any mechanism that does not also update the Merkle tree, this is easy to detect by rehashing all of the content and comparing the results with the stored tree.
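The construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the layout ZFS uses: SHA-256 is chosen arbitrarily, child hashes are combined by concatenation, and an odd node at any level is paired with a copy of itself (one common convention). The names `h` and `merkle_root` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Hash each block, then hash pairs of hashes upward until one root remains."""
    if not blocks:
        return h(b"")
    level = [h(block) for block in blocks]           # leaves: hashes of data blocks
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # pair an odd node with itself
        level = [h(level[i] + level[i + 1])          # interior node: hash of children
                 for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
stored_root = merkle_root(blocks)

# Change one block without updating the stored tree.
blocks[2] = b"block-2 (corrupted)"
print(merkle_root(blocks) == stored_root)   # the recomputed root no longer matches
```

Because each interior node depends on its children, a change in one block propagates all the way up to the root, so comparing a single root hash is enough to know that something changed.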

This is how ZFS verifies data. It can write a block to disk, read the block back and check whether it still matches its hash. When the system writes a block, it updates the parallel tree. If you later ask the system for the block, it will tell you whether it is authentic. If it is not, the system reports an error instead of returning the block.
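The read path described above can be sketched roughly as follows. This is not the ZFS implementation or its on-disk format: the class `VerifiedStore`, the exception `CorruptBlockError`, and the flat dictionary of checksums (in place of a real hash tree) are all assumptions made to keep the example short.

```python
import hashlib


class CorruptBlockError(Exception):
    """Raised when a block no longer matches its stored checksum."""


class VerifiedStore:
    """Toy block store that checks every block on read (hypothetical sketch)."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes, standing in for the disk
        self.checksums = {}  # block number -> digest, kept apart from the data

    def write(self, number: int, data: bytes) -> None:
        self.blocks[number] = data
        self.checksums[number] = hashlib.sha256(data).hexdigest()

    def read(self, number: int) -> bytes:
        data = self.blocks[number]
        if hashlib.sha256(data).hexdigest() != self.checksums[number]:
            # Report an error rather than hand back bad data.
            raise CorruptBlockError(f"block {number} failed verification")
        return data


store = VerifiedStore()
store.write(0, b"important data")
ok = store.read(0)

store.blocks[0] = b"important dat4"   # simulate silent corruption on disk
try:
    store.read(0)
    detected = False
except CorruptBlockError:
    detected = True
print(detected)
```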

Maybe it seems like overkill, but it is worth remembering how many ways data can be corrupted. Malicious tampering is the familiar threat, but errors in writing or reading are far more common. Old rotating disks were unreliable, and the new solid state drives wear out over time. The main problem is the sheer complexity of the read and write process, but there is also a risk of damage in the many levels of cache, drivers, and connections.

ZFS, however, was the first production file system that at least made it possible to detect any of these problems. It is regrettable that nobody had done this before. Of course there are people who like and use ZFS. But not to the extent it deserves.

The author is well aware that readers were expecting to learn about wonderful opportunities to get the benefits of the blockchain without the blockchain. Instead, he has lectured on two subjects readers could not care less about: cryptography and file systems. But do not worry, it is about to get even worse.

Long after the author learned about ZFS and immediately forgot it (he never used it, after all), he adopted Git. It is a distributed version control system used to store and manage code.

All decent programmers have long known about it, but the general public was introduced to it only recently, when Microsoft acquired GitHub for $7.5 billion. The author was one of the early users: in 2008 he moved Puppet to Git. He was thrilled, and a little frightened, to discover that he had reproduced in Puppet one of Git's key features: a file store that lets you find files by their content (or rather, by a hash of their content). Usually files are saved by name, but if many people (or, as in the case of Puppet, computers) store the same file, you cannot count on them giving it the same name. So both Git and Puppet stored files by their hash. This guaranteed that users never kept more than one copy of a file, saving a lot of space. The model also makes it simple to check whether a file has changed. Puppet used this model only to back up files that were about to be modified, in case someone wanted to revert to the original version. Git, however, was capable of so much more.
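A content-addressed store of this kind can be sketched in a few lines. The class `ContentStore` is hypothetical, and keying objects by a plain SHA-1 of the bytes is an assumption for illustration; it is neither Git's nor Puppet's exact format.

```python
import hashlib


class ContentStore:
    """Toy store that files content under the hash of its bytes (hypothetical)."""

    def __init__(self):
        self.objects = {}   # hex digest -> content

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Identical content always yields the same key, so it is stored once.
        self.objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentStore()
a = store.put(b"same bytes")    # saved by one user under one file name
b = store.put(b"same bytes")    # saved by another under a different name
c = store.put(b"other bytes")

print(a == b)                   # the same content gets the same key
print(len(store.objects))       # only two objects are kept for three puts
```

File names never enter into the key, which is why two people storing the same file under different names end up sharing one copy.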

Like ZFS, Git builds a Merkle tree for the entire file repository, with the same goal: to know which files have changed, and how. In the end, Git is used to track changes to a collection of files and to transmit them. Sharing is a critical part of this: a user can easily copy an entire Git repository to another computer, or hand it to another user. It is important that they can confirm they have an authentic copy.

Git stores the hash tree along with all the files. At any time you can use the tree to verify any file in it. If there are changes, the version control system can automatically save the new files and update the associated tree. In fact, this is the main advantage of the system.

As with ZFS, one of the key characteristics of the system is that the Merkle tree lets you check every stored file. You can walk through all the files in the tree and compare each file with its hash, then compare each directory listing with its own hash, working upward. Any failure is easy to detect.
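The walk can be sketched as follows. This is a simplified model and not Git's real object format (Git adds a type-and-length header before hashing and stores tree entries in binary form). The helper names `oid`, `store`, `make_listing` and `verify`, and the text format of the listings, are assumptions for illustration.

```python
import hashlib


def oid(data: bytes) -> str:
    """Object id: SHA-1 of the raw bytes (a simplification of what Git does)."""
    return hashlib.sha1(data).hexdigest()


def store(objects: dict, data: bytes) -> str:
    """Save data under its own hash and return the key."""
    key = oid(data)
    objects[key] = data
    return key


def make_listing(entries: dict) -> bytes:
    """Build a directory listing: one 'kind id name' line per entry."""
    lines = [f"{kind} {key} {name}" for name, (kind, key) in sorted(entries.items())]
    return "\n".join(lines).encode()


def verify(objects: dict, key: str, kind: str = "tree") -> list:
    """Walk down from key and return the ids whose content no longer matches."""
    data = objects[key]
    if oid(data) != key:
        return [key]                      # do not trust a damaged listing further
    bad = []
    if kind == "tree":
        for line in data.decode().splitlines():
            child_kind, child_key, _name = line.split(" ", 2)
            bad += verify(objects, child_key, child_kind)
    return bad


objects = {}
readme = store(objects, b"hello\n")
main = store(objects, b"print('hi')\n")
src = store(objects, make_listing({"main.py": ("blob", main)}))
root = store(objects, make_listing({"README": ("blob", readme), "src": ("tree", src)}))

clean = verify(objects, root)
objects[main] = b"print('bye')\n"         # tamper with a file, leave hashes alone
damaged = verify(objects, root)
print(clean, damaged)
```

Since each listing contains the hashes of its entries, anyone who knows only the root id can check the whole repository.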

This set of characteristics is what impressed the author most: the system is simple to implement, flexible and efficient. It has a power that other version control systems lack, precisely because it is built on a mechanism for storage and verification. Here we come to the crux of the issue.

It is easy to see the blockchain as a sudden revolution, a dramatic change in what is possible. Seen that way, it is difficult to separate the parts from the whole. If only the big picture is in view, it is easy to lose sight of the fact that each individual component has its own history and value.

The blockchain actually emerged gradually. It was not one giant leap. It was part of a story, a sequence. Its most interesting aspect, the Merkle tree, is based on mathematical research that goes back decades. Thanks to it, even the broad mass of users now comes into contact with this old-school math. Many of the most interesting, and most actively promoted, characteristics of the blockchain rest on this math. Immutability and the absence of any need for trust come directly from it.

However, unlike the blockchain as a whole, its individual technical components have been used in production for years, even decades. Focusing on the current trend can blind us to the opportunities that history has already demonstrated. The author believes that instead of trying to replace currency, it is better to turn to history to find solutions with wider application.

Since the author comes from the world of file systems and version control systems, he sees benefits other than the obvious ones visible through the prism of currency exchanges or instant messengers. And perhaps not only the past but also the future of the blockchain lies not in trade and finance, but in technical infrastructure?

About the author: Luke Kanies is an entrepreneur, consultant, strategist and journalist. His interests center on increasing inclusion and productivity and on working with project founders.
