Early Days of Ethereum

Preserving the history and stories of the people who built Ethereum.

Ethereum Devcon-0: Golem

Presentation on Golem, a project exploring distributed computing and resource sharing on the Ethereum network.

Transcript

[00:13] SPEAKER_00: Hi, my name is Peter, I work at IMAPP in Warsaw. We met with Gav in approximately February, I suppose, and had a short chat about future collaboration and the possibilities of us helping with some parts of the Ethereum project. It happens that I work at a company that specializes in compilers, and that resulted in the LLVM implementation of the EVM that Paweł is going to describe. But at the same time we had a little misunderstanding among ourselves. When Gavin described Ethereum, our CEO remembered two phrases: Turing completeness and virtual machine. So after, I don't know, half an hour he thought, well, great, that may be a supercomputer, something that can distribute computation. Well, no, not exactly. Ethereum is more about consensus and keeping a solid, compact state, not about distributing computations.

So okay, what can we do? Maybe we can provide something like that ourselves. And he called it Golem, after Golem XIV, a story written by Stanisław Lem about a machine that got so large and complicated that it gained awareness of itself. So okay, nice, let's have a world supercomputer. If we cannot do that with Ethereum, we can implement it ourselves.

After further thought we came to the conclusion that this alone doesn't really make sense, because there have been attempts to implement something like this and so far they have not worked. We have cloud computing, but that requires imposing structure on the machines and hardware; we cannot build a supercomputer or provide computing power out of the box using just the PCs sitting in the homes of people who would like to contribute their computing power. So what to do? We should give them an incentive: they should be able to make money, and then maybe it would work. But that is still pretty hard, because it would require registering with some service that can transfer value in exchange for the computing power they share. So the obvious idea is that it should be backed by Ethereum, and that's what we started to think about.

So the main idea is that we would like to give users of regular PCs, or whatever computers, the possibility of either distributing their computations across a peer-to-peer network or earning value by providing their resources: computation, storage, or bandwidth for that matter. It should be easy on both sides. Specifying tasks may not be so easy, but providing computation should work out of the box, without any additional configuration, and providers should get remuneration for computing something. What's important is that we don't want them to be threatened by malicious nodes in the network, so all computations have to be isolated from their host machines. This is where virtual machines come into play.

In general, a task is some code, in fact any code that can be written in one of the supported languages, that we can distribute across the network and run inside a virtual machine under certain constraints. First of all, we don't want to endanger the computing node. On the other hand, we don't want to create an environment that allows attackers, hackers, whatever malicious nodes there may be in the network, to attack other computers, not necessarily the host, but some other machine. We can imagine a scenario in which someone prepares a DDoS, a distributed denial of service attack, just by sending a million tasks that each try to connect to some server. This cannot be allowed to happen.
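
As a rough illustration of what such a constrained task description might look like, here is a minimal Python sketch. All of the field names and limits are assumptions; the talk only says that a task is code plus constraints enforced by the virtual machine.

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of a task as sent to a computing node."""
    code: bytes                  # serialized program for the sandboxed VM
    language: str                # one of the supported task languages
    cpu_time_limit_s: float      # hard cap on CPU time inside the VM
    memory_limit_mb: int         # hard cap on memory inside the VM
    allow_network: bool = False  # networking denied by default, to block
                                 # e.g. a million-task DDoS on one server

def validate_spec(spec: TaskSpec) -> None:
    # A host would refuse tasks that try to lift the safety constraints.
    if spec.allow_network:
        raise ValueError("outbound network access is not permitted for tasks")
    if spec.cpu_time_limit_s <= 0 or spec.memory_limit_mb <= 0:
        raise ValueError("resource limits must be positive")
```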

We have started on a proof of concept. Well, sorry for the rough graphics, but that's how it looks right now. We implemented a few renderers as backends, and right now we have both open source renderers and commercial production renderers in the project. You cannot see it well here, but four images are being rendered. The one shown is the last one: Mental Ray, I don't know if you know the renderer, a commercial renderer. The other three are two PBRTs and one V-Ray; V-Ray is also a commercial renderer. In the right panel we see the subtasks, the small tasks that were distributed across the network and are being gathered into the final picture.

So this is it. We believe that such a framework can be used for more than just rendering, in fact for any computation that can be distributed. In principle it works like MapReduce, though the computation model is a bit more generic, because we don't want just to distribute work and gather results, but to prepare a graph, an acyclic graph of tasks, each of which can distribute its subtasks, gather the results, and based on those results send another list of tasks to perform further computation. We used it for rendering; it can also be used for scientific calculations. In chemistry it can easily be used for Monte Carlo integration, even for Bitcoin mining if someone would like to do that.
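
A single-machine sketch of that task graph, with hypothetical names, might look like this; in Golem each `compute` call would run on a remote node inside a sandboxed VM rather than locally:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class TaskNode:
    # One node in the acyclic task graph: do some work, then possibly
    # spawn follow-up tasks that consume this node's result.
    compute: Callable[[Any], Any]
    children: List["TaskNode"] = field(default_factory=list)

def run_graph(node: TaskNode, value: Any) -> List[Any]:
    # Depth-first, local stand-in for distributed execution.
    result = node.compute(value)
    if not node.children:
        return [result]
    results: List[Any] = []
    for child in node.children:
        results.extend(run_graph(child, result))
    return results

# Map-reduce as a special case: split a frame into strips, "render" each
# strip, then gather the strips into the final picture.
strips = [TaskNode(lambda r, i=i: f"pixels[{i}]:{r}") for i in range(4)]
root = TaskNode(lambda scene: f"scene({scene})", children=strips)
print(run_graph(root, "teapot"))  # four gathered subtask results
```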

The main problem with this approach is that it requires a means of validating and collecting results, because once a task is sent out to the network we have no idea who may be computing it or what value the results may have. They may be completely invalid, partially invalid, or simply valid. So we need some way of getting the right results and collecting them into the result that we can use.
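
The talk does not specify a validation scheme, but one plausible approach is redundant computation: send the same subtask to several nodes and accept the majority answer. A minimal sketch:

```python
from collections import Counter

def verify_by_redundancy(results: dict):
    # `results` maps node id -> the value that node returned for one subtask.
    counts = Counter(results.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("no majority: subtask must be recomputed")
    return value

# Example: one of three nodes returns a bogus result, the majority wins.
print(verify_by_redundancy({"node-a": 42, "node-b": 42, "node-c": 7}))  # -> 42
```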

So what do Golem and Ethereum have in common? First, we can use Ethereum as a payment system. But beyond that, there are modules and parts of the source code that seem pretty similar. Both projects require a very good peer-to-peer network module and transport layer, along with node discovery, resource transfer, and very good resource sharing and management machinery. Because when you write a program, you don't only access local files or web servers, but maybe other computers in the network, and so on; it all has to be worked out. That is the Golem side, although I believe some resource sharing would also be required in Ethereum if parts of code are going to be shared across the network.

But from our perspective the most important thing is to use Ethereum as a generic payment system. The traditional approach looks like this: a user has to register with some service that allows him or her to transfer value. That is a centralized solution; it requires giving out private data that gets stored somewhere on someone's servers, which we believe is a problem, and microtransactions are not supported. When we distribute a large task into the network it may be spread across a broad subset of that network, across many nodes, which is very difficult to handle with PayPal or similar solutions and would be relatively difficult to implement with Bitcoin, for example. I don't know what scale of microtransaction granularity Ethereum is going to support, but I believe it might be much better than current solutions. Still, we wouldn't like to flood Ethereum with microtransactions, so at some point we would have to decide at what level we process transactions, because if this works out there may be thousands or hundreds of thousands of transactions, and putting every one of them on chain wouldn't be feasible, I suppose.
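
One way to avoid flooding the chain, sketched here with made-up names, is to tally per-subtask fees off-chain and settle on Ethereum only when a provider's balance crosses a threshold:

```python
from collections import defaultdict

class PaymentBatcher:
    """Illustrative aggregator: one on-chain transaction per threshold
    crossing instead of one per subtask. `send_transaction` stands in for
    whatever Ethereum client interface the node would actually use."""

    def __init__(self, settle_threshold_wei: int, send_transaction):
        self.settle_threshold_wei = settle_threshold_wei
        self.send_transaction = send_transaction
        self.owed = defaultdict(int)  # provider id -> unsettled balance

    def credit(self, provider: str, fee_wei: int) -> None:
        self.owed[provider] += fee_wei
        if self.owed[provider] >= self.settle_threshold_wei:
            self.send_transaction(to=provider, value=self.owed[provider])
            self.owed[provider] = 0
```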

That's another point. It's nothing groundbreaking in terms of the idea, although putting it all together seems pretty difficult, and in terms of Ethereum contracts it's just a vanilla implementation of the kind of common contract that should appear on Ethereum, used to transfer value between participating nodes. A simple contract is just a contract between two nodes: one that distributes tasks and one that provides resources, for example computational power. Software-as-a-service contracts may involve additional parties, for example programmers who provide parts of the code, or in the case of renderers, companies that distribute their rendering engines, so that the company or person also gets remuneration for providing the solution. And there are time-dependent contracts, for example when you would like to host something for some period and have to pay for the hosting in a timely fashion.
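
To make the software-as-a-service case concrete, here is a hypothetical off-chain model of how a fee might be split between the computing node and the party that supplied the renderer or code; the 90/10 split is a made-up parameter, not anything specified in the talk:

```python
def settle_saas_task(total_fee_wei: int, provider_share: float = 0.9) -> dict:
    # The node that did the computing gets most of the fee; the party that
    # supplied the rendering engine or code gets the remainder.
    to_provider = int(total_fee_wei * provider_share)
    return {"provider": to_provider,
            "software_vendor": total_fee_wei - to_provider}

print(settle_saas_task(1_000_000))
# {'provider': 900000, 'software_vendor': 100000}
```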

So from our perspective this is how it looks: Golem and Ethereum are independent solutions. But to make Golem really attractive it has to be fully decentralized and, more importantly, easy to use and easy to install. Forcing users to give out their data, credit card data for example, doesn't seem to make any sense, or at least not much sense. Existing solutions require such registration, and they don't seem to be succeeding with that approach. As for Ethereum, I don't know, this is just a guess, but it's quite possible that the software-as-a-service approach may attract more programmers towards Ethereum. I cannot guarantee that, but it seems reasonable, and that's how it looks.

If you're not bored with it yet, I can show you the graphical outline of the system. It's not that complicated, but it's not easy either. I don't know if there is anything similar in Ethereum, but we need a solid peer-to-peer layer built on top of some transport layer. We need network adapters to provide resource management and global I/O that allow nodes to read and write data in a safe fashion; that has to be really well thought over and implemented. The task protocol is responsible for distributing tasks across the network and gathering the results, as well as connecting all of this with the transaction system, that is, the payment system. That layer is all about the peer-to-peer network. The upper layer is the client layer; its central integration component, which we used to call the client, is responsible for defining tasks and passing them to the layer responsible for computation. Below this line we can test everything on a single machine, running the task in a few processes or even threads, which is good for testing purposes. As you can see, the virtual machine is separated by another border, because we don't want to give the virtual machine direct access to the external world; in particular, you wouldn't want to allow processes written by someone else to connect to any computer in the network. Above that we have the task definition framework, which describes in detail how tasks are specified.
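
A stubbed-out Python model of those layers, with class and method names of my own invention, might look like this:

```python
class P2PLayer:
    # Bottom layer: transport, node discovery, resource transfer (stubbed).
    def send(self, node_id: str, message: bytes) -> None: ...
    def discover_nodes(self) -> list: ...

class TransactionSystem:
    # Payment side; in Golem's plan this is where Ethereum plugs in.
    def pay(self, node_id: str, amount_wei: int) -> None: ...

class TaskProtocol:
    # Distributes tasks over the p2p layer, gathers results, and reports
    # completed work to the transaction system.
    def __init__(self, p2p: P2PLayer, payments: TransactionSystem):
        self.p2p, self.payments = p2p, payments

class Client:
    # Upper layer: defines tasks and hands them down for computation.
    # Everything below it can also run locally in a few processes or
    # threads, which is how the proof of concept is tested.
    def __init__(self, protocol: TaskProtocol):
        self.protocol = protocol
```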

[09:42] SPEAKER_01: Yeah, so you say that the individual instances cannot connect to each other, so there are no shared resources?

[09:48] SPEAKER_00: There are shared resources, but I didn't say that explicitly. The code that is run is just about any code you can write that we can share, but we explicitly have to mark which resources should be shared. For example, a task can try to download some data from a server, and that's not a problem on a single node. But when you distribute it over the network and every node downloads the same data, that may be a threat to the server. So in this case we simply force the node that distributes the task to download the data once and then distribute the downloaded data to the computing nodes.
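
That pattern, a distributing node fetching a shared resource once and shipping it out with the subtasks, can be sketched like this; `send_to_node` stands in for Golem's resource-transfer machinery:

```python
import urllib.request

def distribute_with_prefetch(url, subtasks, send_to_node):
    # Instead of letting every computing node fetch the same file (and
    # hammer the server), the distributing node downloads it exactly once
    # and ships the bytes out alongside each subtask.
    data = urllib.request.urlopen(url).read()
    for node_id, subtask in subtasks:
        send_to_node(node_id, subtask, shared_resources={"input": data})
```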

As for sorting and grouping: the task interface is the interface responsible for creating a graph of MapReduce tasks. Above that we have both a user interface, not like the one I presented, but one that should make it easy to use and to specify tasks, and multi-programming-language support, which I hope will allow users to simply write code using some sort of standard library and control structures similar to existing ones. For example a parallel for that distributes the tasks, so that programmers don't really need to be aware of what's going on down here. Okay.
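
A local stand-in for that parallel for, using worker processes instead of remote nodes, could look like this:

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_for(fn, items):
    # The programmer writes an ordinary per-item function; the framework
    # decides where each iteration runs. Here: local worker processes.
    # In Golem: remote computing nodes behind the same interface.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(fn, items))

def shade_pixel(i):          # an ordinary function, unaware of distribution
    return i * i

if __name__ == "__main__":   # guard required for process pools on some OSes
    print(parallel_for(shade_pixel, range(8)))
```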

And we believe that there may be many, many virtual machine flavors, maybe not technically different machines, but different in terms of compute capabilities, and that virtual machines should be definable by users. This poses one problem: it does not quite fit the picture so far, because the virtual machines, or their configurations in fact, would have to be distributed through a separate channel, not as part of the application. But that may not be a problem, because we can allow users to broadcast messages announcing that they have such machines, and an interested party would be able to download and use them. Okay, I think that's it.
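
If user-defined VM flavors were announced over the network, the announcement message might carry something like the following; every field here is an assumption, since no format is specified in the talk:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VMAnnouncement:
    """Hypothetical message a node broadcasts to advertise a custom VM
    flavor; interested parties fetch the image over the separate channel
    the talk mentions."""
    flavor_name: str    # e.g. "pbrt-renderer-vm"
    version: str
    image_hash: str     # content hash so downloaders can verify the image
    download_hint: str  # where/how to fetch it (peer id, URI, etc.)
```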