IOSG Ventures: LLM empowers the blockchain to open a new era of on-chain experience

Written by: Yiping, IOSG Ventures

Foreword

  • As large language models (LLMs) flourish, many projects are combining artificial intelligence (AI) with blockchain, and we see growing opportunities for the two technologies to integrate. One direction worth mentioning is zero-knowledge machine learning (ZKML).
  • AI and blockchain are two transformative technologies with fundamentally different characteristics. AI requires powerful computing power, usually provided by centralized data centers, while blockchain offers decentralized computing and privacy protection but performs poorly on large-scale computing and storage tasks. Best practices for integrating the two are still being explored, and we will introduce some current "AI + blockchain" project case studies in a follow-up article.

Source: IOSG Ventures

This research report is divided into two parts; this article is Part 1. We focus on applications of LLMs in the crypto space and discuss strategies for bringing them to production.

What is LLM?

An LLM (Large Language Model) is a language model built from an artificial neural network with a very large number of parameters (typically billions), trained on large amounts of unlabeled text.

Around 2018, the emergence of LLMs revolutionized natural language processing research. Unlike earlier approaches that required training a task-specific supervised model, an LLM is a general-purpose model that performs well across a wide variety of tasks. Its capabilities and applications include:

  • Understanding and summarizing text: LLM can understand and summarize large amounts of human language and text data. They can extract key information and generate concise summaries.
  • Generating new content: LLMs can generate text-based content. Given a prompt, the model can answer questions, produce new text, summarize, or perform sentiment analysis.
  • Translation: LLM can be used to translate between different languages. They utilize deep learning algorithms and neural networks to understand the context and relationships between words.
  • Predict and generate text: LLM can predict and generate text based on context, similar to human-generated content, including songs, poems, stories, marketing materials, etc.
  • Application in various fields: large language models have wide applicability in natural language processing tasks. They are used in conversational artificial intelligence, chatbots, healthcare, software development, search engines, tutoring, writing tools, and many others.

LLMs' strengths include their ability to make sense of large amounts of data, to perform many language-related tasks, and to tailor results to user needs.

Common large-scale language model applications

Thanks to their outstanding natural language understanding, LLMs hold considerable potential, and developers mainly focus on the following two aspects:

  • Provide users with accurate and up-to-date answers based on a large amount of contextual data and content
  • Complete specific tasks assigned by users by using different agents and tools

It is these two capabilities that have made "chat with X" LLM applications spring up like mushrooms after rain: chat with PDFs, chat with documents, chat with academic papers, and so on.

Subsequently, developers attempted to connect LLMs to various data sources, successfully integrating platforms such as GitHub, Notion, and various note-taking tools.

To overcome the inherent limitations of LLMs, different tools were incorporated into these systems. The first such tool was the search engine, which gives LLMs access to up-to-date knowledge. Subsequent efforts integrated tools such as WolframAlpha, Google Suite, and Etherscan with large language models.

Architecture of LLM Apps

The diagram below outlines the flow of an LLM application responding to a user query. First, the relevant data sources are converted into embedding vectors and stored in a vector database. The LLM adapter takes the user query, runs a similarity search, and retrieves the relevant context from the vector database. That context is placed into the prompt and sent to the LLM, which executes the prompt, calling tools where needed, to generate an answer. Sometimes LLMs are fine-tuned on specific datasets to improve accuracy and reduce cost.

[Diagram: query flow of an LLM application]
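The retrieval step of this flow can be sketched in a few lines of Python. This is a toy illustration, not a production implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `VectorStore` stands in for a real vector database, and the final prompt would be sent to an LLM API rather than printed.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words count vector.
    # A real application would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list:
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("Uniswap is a decentralized exchange on Ethereum.")
store.add("Solidity is the main language for Ethereum smart contracts.")

query = "What language are smart contracts written in?"
context = store.search(query, k=1)
prompt = f"Context: {context[0]}\n\nQuestion: {query}\nAnswer:"
# `prompt` would now be sent to the LLM for completion.
```

The similarity search ranks stored documents against the query, so the Solidity document, which shares the most terms with the question, is selected as context.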

The workflow of the LLM application can be roughly divided into three main phases:

  • Data preparation and embedding: This phase involves storing private information, such as project memos, for later retrieval. Typically, documents are segmented, run through an embedding model, and stored in a special type of database called a vector database.
  • Prompt formulation and retrieval: When a user submits a query (in this case, a request for project information), the software constructs a series of prompts to feed into the language model. The final prompt usually contains a prompt template hard-coded by the developer, examples of valid output as few-shot examples, any required data fetched from external APIs, and relevant documents retrieved from the vector database.
  • Execution and inference: Once the prompt is assembled, it is fed to a pre-trained language model for inference, which may be a proprietary model API, an open-source model, or an individually fine-tuned model. At this stage, some developers also add operational components such as logging, caching, and validation to the system.
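The formulation phase above can be sketched as plain string assembly. Everything in this snippet is an illustrative placeholder, assuming a hypothetical template, few-shot example, and API payload; the point is only how the hard-coded template, the few-shot example, the retrieved documents, and the external API data are combined into one final prompt.

```python
# Hypothetical prompt assembly for the formulation phase.
PROMPT_TEMPLATE = """You are an assistant for on-chain data questions.

Example:
Q: {example_q}
A: {example_a}

Context from retrieved documents:
{context}

Live data from external API:
{api_data}

Q: {question}
A:"""

def build_prompt(question: str, context_docs: list, api_data: str) -> str:
    # Combine the hard-coded template, a few-shot example, retrieved
    # documents, and API data into the final prompt.
    return PROMPT_TEMPLATE.format(
        example_q="What is the gas limit of this block?",
        example_a="The block gas limit is 30,000,000.",
        context="\n".join(context_docs),
        api_data=api_data,
        question=question,
    )

prompt = build_prompt(
    question="How many transactions did this address send today?",
    context_docs=["This address is a known exchange hot wallet."],
    api_data="tx_count_24h: 1523",
)
# `prompt` is now ready for the execution and inference phase.
```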

Bringing LLM into the crypto space

Although some applications in the crypto space (Web3) resemble their Web2 counterparts, developing good LLM applications for crypto requires special care.

The crypto ecosystem is unique, with its own culture, data, and integrations. LLMs fine-tuned on crypto-specific datasets can deliver superior results at relatively low cost. While data is abundant, there is a distinct lack of open datasets on platforms like HuggingFace. Currently, there is only one dataset related to smart contracts, containing 113,000 smart contracts.

Developers also face the challenge of integrating different tools with LLMs. These tools differ from those used in Web2: they give LLMs the ability to access transaction data, interact with decentralized applications (Dapps), and execute transactions. So far, we have not found any Dapp integrations in LangChain.
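One common pattern for such tool integration is a registry that maps tool names to functions, so the model can emit an action string like "tool_name: argument" that the application dispatches. The sketch below assumes this pattern; the tool names and returned values are hypothetical stubs — a real implementation would call a node's JSON-RPC interface or an indexer such as Etherscan instead of returning canned strings.

```python
from typing import Callable, Dict

# Registry mapping tool names to callables, as an LLM agent
# framework might expose them. All tools here are stubs.
TOOLS: Dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that adds a function to the tool registry."""
    def deco(fn):
        TOOLS[name] = fn
        return fn
    return deco

@register("get_balance")
def get_balance(address: str) -> str:
    # Stub: a real tool would call eth_getBalance over JSON-RPC.
    return f"balance({address}) = 1.25 ETH (stubbed)"

@register("get_tx_count")
def get_tx_count(address: str) -> str:
    # Stub: a real tool would call eth_getTransactionCount.
    return f"tx_count({address}) = 42 (stubbed)"

def dispatch(tool_call: str) -> str:
    """Parse a 'tool_name: argument' action string emitted by the LLM."""
    name, _, arg = tool_call.partition(":")
    tool = TOOLS.get(name.strip())
    if tool is None:
        return f"unknown tool: {name.strip()}"
    return tool(arg.strip())

result = dispatch("get_balance: 0x0000000000000000000000000000000000000000")
```

The dispatcher's output would be fed back into the model's context, letting it reason over live on-chain data before producing its final answer.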

Although developing high-quality crypto LLM applications may require additional investment, LLMs are a natural fit for the crypto space. The domain offers rich, clean, structured data, and Solidity code tends to be concise, which makes it easier for an LLM to generate functional code.

In Part 2, we will discuss eight potential directions in which LLMs can help the blockchain space:

  • Integrate built-in AI/LLM capabilities into the blockchain
  • Analyze transaction records using LLM
  • Identify potential bots using LLM
  • Write code using LLM
  • Read code with LLM
  • Use LLM to help the community
  • Use LLM to track the market
  • Analyze projects using LLM

Stay tuned!
