The applications of Artificial Intelligence are everywhere in the web2 world. Countless AI models have been built, and are still being built, either to make our lives better or to serve the interests of large tech companies like Google and Meta. These models rely on two key ingredients: user data collection and centralized services. Large tech companies eagerly collect your public and private data and feed it into centralized services that produce the AI models powering their applications.
Not coincidentally, a new era of the internet - often called web3 - is being built to do away with user data collection and centralized services. The builders of web3 are pursuing self-sovereign ownership of data and decentralization of applications, which requires a paradigm shift in how we build artificial intelligence.
Nobody can predict exactly what the new paradigm of building artificial intelligence will look like, but I can imagine it following at least these principles:
Today, events like a user viewing another user's profile on a social network, or a user watching an ad for a few seconds, are typically collected by frontend or mobile apps and sent to a centralized service. In the web3 world, such data can still be collected, but access to it should be controlled only by the user who produces it. The user should be able to view, share, or authorize third parties to make use of the data without compromising privacy. A potential solution to this is Decentralized Identifiers (DIDs).
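To make this concrete, here is a minimal sketch of the idea. The document shape loosely follows the W3C DID Core data model, but all identifiers, the service endpoint, and the toy access-control check are hypothetical examples, not a real DID implementation.

```python
# A minimal sketch of a DID document (shape loosely follows the W3C DID Core
# data model); every identifier and endpoint below is a made-up example.
did_document = {
    "id": "did:example:alice123",
    "verificationMethod": [{
        "id": "did:example:alice123#key-1",
        "type": "Ed25519VerificationKey2020",
        "controller": "did:example:alice123",
    }],
    "service": [{
        "id": "did:example:alice123#vault",
        "type": "DataVault",
        "serviceEndpoint": "https://vault.example.com/alice",
    }],
}

# Toy access-control check: only parties the user has explicitly authorized
# may read the event stream stored in her vault.
authorized = {"did:example:alice123", "did:example:advertiser789"}

def can_read(requester_did: str) -> bool:
    return requester_did in authorized

print(can_read("did:example:advertiser789"))  # True: the user granted access
print(can_read("did:example:bigtech000"))     # False: never authorized
```

The key inversion versus web2 is that the authorization set lives with the user's identity, not inside a company's database.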
An important step in building artificial intelligence is running resource-intensive computations (typically on GPUs) to make the model smarter by learning from large-scale user data. Recent advances in machine learning allow this step to be decentralized: the computation runs separately on each user's data, and the results are combined afterwards - an approach known as federated learning. With it, we can run the computations on users' private data vaults in a way that leaks no private information.
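A toy sketch of federated averaging illustrates the pattern: each vault computes a model update locally, and only the updates (never the raw data) are averaged. The model here is deliberately trivial - a single weight fit to y ≈ w·x by one gradient step per round - and the vault contents are invented data.

```python
# Federated averaging (FedAvg), minimal sketch: fit y ≈ w * x where each
# user's (x, y) pairs stay inside that user's private vault.

def local_update(w, data, lr=0.01):
    """One gradient-descent step on a single user's private (x, y) pairs."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(w, user_datasets):
    """Run the local update in every vault, then average the new weights."""
    local_weights = [local_update(w, data) for data in user_datasets]
    return sum(local_weights) / len(local_weights)

# Hypothetical private vaults; the raw pairs never leave their list.
vaults = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(1.5, 3.1), (3.0, 5.9)],
]
w = 0.0
for _ in range(200):
    w = federated_average(w, vaults)
print(round(w, 2))  # 2.0 - the true slope of the data
```

Real systems add secure aggregation and differential privacy on top, since even model updates can leak information about the underlying data.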
Software frameworks like TensorFlow were created to help develop artificial intelligence models for the web2 world. An important assumption behind such frameworks is that there is a centralized dataset the AI model can access at any time, with no network or communication cost. We already know that is not true when building AI for web3. Furthermore, in a decentralized machine learning setting we not only need to define the model, but also the orchestration of the model across users' private data vaults. A new software framework is needed here.
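What might such a framework's API look like? Here is a hypothetical sketch: alongside the model, the developer declares which computation runs inside each vault and how the per-vault results are combined. Every name here (`Vault`, `Orchestrator`, `round`) is invented for illustration; no such framework exists yet.

```python
# Hypothetical API sketch for a web3-native ML framework: the orchestration
# across vaults is a first-class object, so communication is explicit.
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Vault:
    """A user's private data vault; code is shipped to the data, not vice versa."""
    owner_did: str
    data: List[float]

@dataclass
class Orchestrator:
    vaults: List[Vault]
    log: List[str] = field(default_factory=list)

    def round(self,
              local_step: Callable[[List[float]], Any],
              aggregate: Callable[[List[Any]], Any]) -> Any:
        """One round: run local_step inside every vault, then combine.

        Only the small per-vault results cross the network; raw data never
        leaves a vault, so communication cost is visible and bounded.
        """
        results = [local_step(v.data) for v in self.vaults]
        self.log.append(f"round over {len(results)} vaults")
        return aggregate(results)

# Usage: compute a global mean without any vault revealing raw values.
orch = Orchestrator([
    Vault("did:example:alice", [1.0, 2.0, 3.0]),
    Vault("did:example:bob", [4.0, 5.0, 6.0]),
])
global_mean = orch.round(
    local_step=lambda data: (sum(data), len(data)),  # per-vault sufficient statistics
    aggregate=lambda rs: sum(s for s, _ in rs) / sum(n for _, n in rs),
)
print(global_mean)  # 3.5
```

The design choice worth noting is that `local_step` and `aggregate` are separate arguments: the framework, not the application author, decides where each one physically executes.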
The "computation results" we're referring to here are artificial intelligence models, or analytic results if you're doing data science on users' data. Today those computation results are owned by large tech companies, and they are the main place those companies extract value from. You contribute your Amazon order history to build a recommendation model that only makes you spend more money on Amazon. Ownership of computation results should be given back to users as an incentive, so that more value can be produced.
People in the web3 world often say that applications can only be built once the infrastructure is ready. However, without innovative applications, the infrastructure does not make sense. We'll need a big wave of creativity and entrepreneurship to build applications that cannot be built today: ones that protect user privacy while building artificial intelligence models that make our lives better.
Obviously, not all of the above principles can be achieved at the current stage of web3 development, but I believe that if any one of them is missing, we cannot build a truly web3-native artificial intelligence.
Feel free to reach out - twitter: @swkcoding, gmail: swkpku@
