Machine Learning and Blockchain – Part 2/3 – Use Cases for Machine Learning variants

Published by Martin Schuster on

Last Updated on 12. August 2024 by Martin Schuster

In this second part of our blog series on machine learning and blockchain, we give examples of how the machine learning variants from the first part are used both on the blockchain and off-chain.

Use Cases for Machine Learning Variants

In this section, we describe methods for the machine learning variants introduced in part 1 and outline use cases in which they are applied with or without the blockchain.

Unsupervised Learning

Use Case without Blockchain

Anomaly detection is a common use case for unsupervised learning. It involves identifying data points that deviate significantly from the expected behaviour of the data. Here’s how it works:

  1. First, the unsupervised learning model is trained on a dataset that only contains “normal” or “typical” behaviour. This training process involves clustering similar data points together and identifying the patterns or structure within the data.
  2. Once the model is trained, it can be used to detect anomalies or outliers in new data. When presented with a new data point, the model will compare it to the patterns it learned during training. If the new data point is significantly different from the expected behaviour, the model will flag it as an anomaly.

Anomaly detection can be used in a variety of domains, such as fraud detection in finance, fault detection in manufacturing, or intrusion detection in cybersecurity. By using unsupervised learning to identify anomalies, organizations can quickly and accurately detect and respond to unusual behaviour, which can help prevent financial losses, reduce downtime, or improve security.
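The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: a simple statistical profile (mean and standard deviation) stands in for the trained model, and the transaction amounts, function names, and the threshold of three standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev

def fit_normal_profile(values):
    """Step 1: learn what 'normal' looks like from data without anomalies."""
    return mean(values), stdev(values)

def is_anomaly(value, profile, threshold=3.0):
    """Step 2: flag a value that lies more than `threshold` standard
    deviations away from the learned mean."""
    mu, sigma = profile
    return abs(value - mu) / sigma > threshold

# Hypothetical transaction amounts that represent normal behaviour.
normal_amounts = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.7, 49.1]
profile = fit_normal_profile(normal_amounts)

# New, unseen data points are compared with the learned profile.
new_amounts = [50.5, 49.0, 120.0]
flags = [is_anomaly(amount, profile) for amount in new_amounts]
print(flags)
```

In practice, the profile would typically be replaced by a clustering or density model trained on many features, but the flow of fitting on normal data and then scoring new points stays the same.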

Use Case with Blockchain

Unsupervised learning techniques like clustering can be used to group blockchain transactions, addresses, or nodes into clusters based on their similarity. This is how it works:

  1. The first step is to prepare the blockchain data (for example block information or transaction data) for clustering. This involves collecting and preprocessing the data, which may include extracting features, cleaning the data, and normalizing the data.
  2. Once the data is prepared, the next step is to select a clustering algorithm. Popular unsupervised clustering algorithms include k-means clustering, hierarchical clustering, and DBSCAN.
  3. With the clustering algorithm selected, the data can be clustered into groups based on their similarity. For example, transactions that have similar attributes, such as amount and destination, can be grouped together into a cluster.
  4. Once the data is clustered, it can be analysed to detect patterns and anomalies. For example, clusters that contain a high percentage of fraudulent transactions may be flagged for further investigation.

Using unsupervised learning techniques like clustering with blockchain data can help identify patterns and anomalies that may be difficult to detect using traditional methods. This can be particularly useful in detecting fraudulent activity or identifying clusters of transactions that may be related to criminal or malicious activity.
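As a rough sketch of steps 1 to 3, the following Python code normalises a handful of transactions and groups them with a plain k-means implementation. The transaction values, the choice of features (amount and fee), and the use of the first k points as initial centroids are assumptions made to keep the example small and deterministic.

```python
import math

def min_max_normalise(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def k_means(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids

# Hypothetical transactions as (amount, fee) pairs.
transactions = [(0.5, 0.001), (95.0, 0.02), (0.7, 0.001),
                (0.6, 0.002), (102.0, 0.03), (98.0, 0.025)]
labels, centroids = k_means(min_max_normalise(transactions), k=2)
print(labels)
```

The resulting cluster labels can then be inspected as in step 4, for example by checking which cluster contains transactions that are already known to be suspicious.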

Supervised Learning

Use case without Blockchain

Supervised learning can be a powerful technique for fraud detection, particularly in cases where labelled data is available. Here are some more details on how it can be used:

  1. The first step in using supervised learning for fraud detection is to collect a dataset of labelled examples of both legitimate and fraudulent transactions or activities. This dataset should be large enough to capture the full range of behaviours that might be encountered in real-world use cases.
  2. Once a dataset has been collected, the next step is to extract meaningful features from the data that can be used to train a model. This might involve extracting information about the transaction or activity, such as the amount of money involved, the parties involved, and the time of day the transaction occurred.
  3. Once features have been extracted, the next step is to select an appropriate machine learning model and train it on the labelled data. Popular models for fraud detection include logistic regression, decision trees, and support vector machines.
  4. Once a model has been trained, it is important to evaluate its performance on a separate, labelled test dataset to ensure that it is accurately detecting fraudulent behaviour while minimizing false positives. This evaluation might involve measuring metrics such as precision, recall, and F1 score.
  5. Finally, once a model has been trained and evaluated, it can be deployed in a real-world environment to detect fraudulent behaviour in real time. This might involve integrating the model with an existing fraud detection system or building a new system from scratch.

Overall, supervised learning can be a powerful technique for fraud detection in cases where labelled data is available. By collecting a dataset of both legitimate and fraudulent behaviour, extracting meaningful features, and training an appropriate machine learning model, it may be possible to accurately detect and prevent fraudulent activity in real time.
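The evaluation in step 4 can be made concrete with a short Python sketch that computes precision, recall, and F1 score from true labels and model predictions. The labels and predictions below are invented for illustration, with 1 standing for "fraudulent" and 0 for "legitimate".

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical test set: true labels and the model's predictions.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = classification_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision shows how many flagged transactions were really fraudulent, while recall shows how many of the fraudulent transactions were caught. Which of the two matters more depends on the cost of a false alarm compared with the cost of a missed fraud.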

Use case with Blockchain

Supervised learning algorithms can be used to classify blockchain transactions by type, which can be useful for various purposes such as fraud detection, compliance, or risk management. Here are some details on how this can be done:

  1. The first step is to prepare the data for training the supervised learning model. This involves collecting a labelled dataset of blockchain transactions, where each transaction is labelled with its corresponding type, such as payment, smart contract transaction, or token transfer. The labelled dataset can be created manually by experts or by using automated tools that can classify transactions based on their attributes.
  2. The next step is to extract relevant features from the transactions that can be used as input to the supervised learning model. This can include features such as the sender and receiver addresses, the amount and currency of the transaction, the transaction fee, the timestamp, and any metadata associated with the transaction.
  3. Once the data is prepared and the features are extracted, the next step is to select a suitable supervised learning algorithm and train the model on the labelled dataset. Popular algorithms for transaction classification include support vector machines (SVM), Naive Bayes, and k-nearest neighbours (KNN). The choice of algorithm will depend on the specific problem and dataset at hand.
  4. After the model is trained, it is important to evaluate its performance on a separate test dataset to ensure that it can generalize well to new transactions. The model can then be deployed in production to classify new transactions in real-time.

Supervised learning can be a powerful tool for classifying blockchain transactions by type, and can provide valuable insights into the behaviour of different types of transactions on the blockchain.
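To illustrate the classification step, here is a minimal k-nearest neighbours (KNN) sketch in Python. The two features (a scaled transaction value and a scaled size of the attached call data) and all training examples are hypothetical. A real dataset would use far more features and examples.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled transactions: (scaled value, scaled call data size).
train_x = [(0.8, 0.0), (0.6, 0.0), (0.9, 0.05),
           (0.0, 0.3), (0.0, 0.35), (0.05, 0.3),
           (0.2, 0.9), (0.1, 0.8), (0.3, 1.0)]
train_y = (["payment"] * 3 + ["token_transfer"] * 3
           + ["smart_contract"] * 3)

# New, unlabelled transactions to classify.
predictions = [knn_predict(train_x, train_y, q)
               for q in [(0.7, 0.02), (0.02, 0.32), (0.15, 0.95)]]
print(predictions)
```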

Reinforcement Learning

Use case without Blockchain

Reinforcement learning is particularly useful in robotics because it allows the robot to learn from its experiences in the environment, rather than relying on pre-programmed behaviours. Here are some more details on how reinforcement learning can be used in robotics:

  1. The first step in applying reinforcement learning to robotics is to define the task that the robot needs to perform, such as grasping an object or navigating to a specific location. This task is then formulated as a reward function, which assigns a numerical reward to the robot based on how well it performs the task.
  2. The next step is to define the state representation of the environment, which includes the robot’s sensors and the state of the objects in the environment. This state representation is used to map the robot’s actions to the rewards it receives.
  3. The robot then selects actions based on its current state and the expected rewards for each action. These actions are typically represented as motor commands that control the robot’s actuators.
  4. The robot’s performance is evaluated based on the reward function, and the reinforcement learning algorithm is used to update the robot’s policy (i.e., the mapping from states to actions) to maximize the expected reward. This involves iteratively exploring the environment, selecting actions based on the current policy, and updating the policy based on the observed rewards.
  5. Once the policy has been optimized, the robot can be deployed in the real-world environment to perform the task.

Overall, reinforcement learning has achieved impressive results in robotics, including grasping objects and navigating through complex environments, and beyond robotics in playing games like chess and Go. However, applying reinforcement learning to robotics can be challenging, because the task demands a high degree of precision and reliability and the robot must operate in a safe and controlled manner.
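The loop of acting, receiving a reward, and updating the policy can be shown with tabular Q-learning on a toy navigation task. The corridor with five cells, the reward values, and the hyperparameters are assumptions chosen for illustration. A real robot would have a far larger state space and would usually be trained in simulation first.

```python
import random

N_STATES = 5          # cells 0..4 in a corridor; the robot starts in cell 0
GOAL = 4              # target location
MOVES = (-1, +1)      # action 0 = move left, action 1 = move right

def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else -0.01  # small cost per step
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration strategy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)                     # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# The learned policy: the best action in every non-goal cell.
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```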

Use case with Blockchain

Reinforcement learning can be used to train an agent to optimize the order in which transactions are added to a block in a blockchain network to maximize the fees earned by the miner. Here are some more details on how this can be done: 

  1. The first step is to define the problem of optimal transaction order as a reinforcement learning problem. In this case, the goal of the agent is to maximize the total transaction fees earned by the miner by selecting an optimal order for the pending transactions to be added to the next block.
  2. The next step is to define the state and action space for the reinforcement learning agent. The state space can include information such as the current pending transactions, their fees, and the available space in the block. The action space can include all possible orderings of the pending transactions.
  3. The reward function is a crucial component of a reinforcement learning problem, as it defines the goal of the agent. In this case, the reward function can be defined as the total transaction fees earned by the miner for adding the transactions to the block in the selected order.
  4. The reinforcement learning agent can be trained using techniques such as Q-learning or policy gradient methods. During training, the agent learns to select the optimal order of transactions that maximizes the total transaction fees earned by the miner.
  5. Once the agent is trained, it can be deployed in the blockchain network to optimize the order in which transactions are added to a block. This can be done by having the agent analyse the pending transactions, select the optimal order, and submit the block to the network.

Optimizing the transaction order in this way can increase the profitability of miners and reduce transaction confirmation times for users. However, it is important to note that any changes to the order of transactions must still adhere to the rules and consensus mechanism of the blockchain network.
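The problem formulation from steps 1 to 3 can be sketched as a small environment in Python. To keep the action space manageable, this sketch assumes the agent picks one pending transaction at a time instead of choosing a complete ordering at once. The class and function names, the transactions, and the block capacity are hypothetical, and a simple greedy rule (highest fee per unit of size) stands in for a trained agent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    tx_id: str
    fee: float
    size: int

class BlockEnv:
    """State: pending transactions plus remaining block space."""

    def __init__(self, pending, capacity):
        self.pending = list(pending)
        self.remaining = capacity
        self.block = []

    def valid_actions(self):
        """Indices of pending transactions that still fit into the block."""
        return [i for i, tx in enumerate(self.pending)
                if tx.size <= self.remaining]

    def step(self, action):
        """Add the chosen transaction; the reward is the fee it pays."""
        tx = self.pending.pop(action)
        self.remaining -= tx.size
        self.block.append(tx)
        return tx.fee, not self.valid_actions()

def greedy_policy(env):
    """Stand-in for a trained agent: pick the best fee-to-size ratio."""
    return max(env.valid_actions(),
               key=lambda i: env.pending[i].fee / env.pending[i].size)

def run_episode(env, policy):
    total_fees = 0.0
    while env.valid_actions():
        reward, _ = env.step(policy(env))
        total_fees += reward
    return total_fees

pending = [Tx("a", 5.0, 400), Tx("b", 3.0, 100),
           Tx("c", 4.0, 300), Tx("d", 1.0, 250)]
env = BlockEnv(pending, capacity=600)
total_fees = run_episode(env, greedy_policy)
print(total_fees, [tx.tx_id for tx in env.block])
```

In this example the greedy rule collects fees of 7.0, although including transactions "a" and "b" would yield 8.0 within the same capacity. Gaps of this kind are what a trained reinforcement learning agent would aim to close.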

Deep Learning

Use case without Blockchain

In the language domain, deep learning models such as recurrent neural networks (RNNs), transformers, and their variants have been used for a variety of tasks such as language modelling, machine translation, text generation, sentiment analysis, and question answering.

For example, RNNs can be used for language modelling, which involves predicting the next word in a sentence or a sequence of words. This can be used for text completion or suggestion in various applications such as search engines, virtual assistants, or chatbots. Transformers, on the other hand, have been used for machine translation, which involves translating text from one language to another.

Deep learning models can also be used for text generation, which involves generating new text based on a given input or context. This can be used in various applications such as content generation, chatbots, or virtual assistants.

Sentiment analysis involves analysing the sentiment or emotion expressed in a piece of text, such as a tweet or a product review. Deep learning models such as RNNs or convolutional neural networks (CNNs) can be used for sentiment analysis, which can be used in various applications such as social media monitoring, market research, or customer feedback analysis.

Question answering involves answering questions based on a given context or knowledge base. Deep learning models such as transformers can be used for question answering, which can be used in various applications such as virtual assistants, chatbots, or customer support systems.

Use case with Blockchain

Deep learning algorithms can be trained to detect fraudulent activity on the blockchain by analysing transactional patterns and identifying anomalous behaviour. Here are some more details on how this can be done:

  1. The first step in using deep learning for fraud detection on the blockchain is to gather and prepare the data. This can include transactional data, as well as any additional data sources that may be relevant, such as network activity logs or user behaviour patterns.
  2. Once the data has been collected, the next step is to engineer features that can be used as inputs for the deep learning model. This can include features such as transaction amounts, time stamps, sender and receiver addresses, and other metadata associated with the transactions.
  3. There are a variety of deep learning models that can be used for fraud detection, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). The choice of model will depend on the specific problem being addressed and the nature of the data.
  4. Once the model has been selected, it can be trained on the prepared data. During training, the model learns to identify patterns and anomalies associated with fraudulent activity.
  5. Once the model has been trained, it can be evaluated on a separate set of test data to assess its performance. This can include metrics such as accuracy, precision, recall, and F1-score.
  6. Once the model has been trained and evaluated, it can be deployed in the blockchain network to monitor for fraudulent activity in real time.

By using deep learning to detect fraudulent activity on the blockchain, it may be possible to identify and prevent criminal behaviour more quickly and effectively than traditional methods. However, it is important to note that the accuracy of the model will depend on the quality of the data and the ability to identify and address any potential biases in the training process.
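The training step can be illustrated with a very small neural network written in plain Python. Note the substitutions: instead of a CNN, RNN, or GAN, the sketch uses a feed-forward network with a single hidden layer, and the two features (a scaled amount and a scaled transaction frequency), the labels, and the hyperparameters are invented for the example. Real fraud detection models are much larger and are normally built with a deep learning framework.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Return the hidden activations and the predicted fraud probability."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return hidden, sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

def train(xs, ys, hidden_size=4, epochs=300, lr=0.5, seed=0):
    """Stochastic gradient descent on the binary cross-entropy loss."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
          for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(xs, ys):
            hidden, out = forward(x, (w1, b1, w2, b2))
            clipped = min(max(out, 1e-12), 1 - 1e-12)  # avoid log(0)
            total += -(y * math.log(clipped) + (1 - y) * math.log(1 - clipped))
            d_out = out - y  # gradient of the loss w.r.t. the output logit
            for j in range(hidden_size):
                # Backpropagate through tanh before w2[j] is changed.
                d_hidden = d_out * w2[j] * (1 - hidden[j] ** 2)
                w2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden
                for i in range(n_in):
                    w1[j][i] -= lr * d_hidden * x[i]
            b2 -= lr * d_out
        losses.append(total / len(xs))
    return (w1, b1, w2, b2), losses

# Hypothetical training data: (scaled amount, scaled frequency), 1 = fraud.
xs = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2),
      (0.9, 0.8), (0.8, 0.95), (0.85, 0.7), (0.95, 0.9)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params, losses = train(xs, ys)
_, fraud_score = forward((0.9, 0.85), params)
_, legit_score = forward((0.15, 0.2), params)
print(f"first loss={losses[0]:.3f} last loss={losses[-1]:.3f}")
```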

Transfer Learning

Use case without Blockchain

Transfer learning can be particularly useful in the healthcare industry, where medical data is often limited and expensive to collect. A pre-trained deep learning model that has already learned to extract relevant features from medical images, such as mammograms or MRIs, can be adapted to a new task or disease with a smaller amount of patient data. For example, a pre-trained model for breast cancer detection in mammograms can be fine-tuned with a smaller dataset of patients with a specific type of breast cancer, such as triple-negative breast cancer. This can help healthcare professionals diagnose and treat patients with specific diseases or conditions more accurately, even when only limited data is available. Transfer learning can also be used for natural language processing tasks in healthcare, such as predicting patient outcomes or identifying adverse drug reactions.

Use case with Blockchain

Transfer learning can be a powerful technique for analysing the structure and behaviour of blockchain networks. Here are some more details on how it can be used for network analysis:

  1. The first step in transfer learning is to identify pre-trained models that can be leveraged for the specific task at hand. For example, a pre-trained model for social network analysis might have been trained on Twitter data to identify communities or influencers.
  2. Once a pre-trained model has been identified, it can be adapted to the blockchain network by fine-tuning the model using blockchain-specific data. For example, if the pre-trained model is a neural network, the weights of the network can be updated using blockchain data to make it more effective at identifying communities or influencers within the blockchain network.
  3. Once the adapted model has been trained, it can be evaluated using standard evaluation metrics to assess its performance on the specific task.
  4. Once the adapted model has been evaluated, it can be used to analyse the structure and behaviour of the blockchain network. For example, the model might be used to identify clusters of nodes that are closely connected, or to identify influential nodes that are particularly important in the network.

Transfer learning can be a powerful technique for leveraging pre-existing knowledge from other domains to better understand the structure and behaviour of blockchain networks. By using pre-trained models and fine-tuning them on blockchain-specific data, it may be possible to gain new insights into the operation of these networks and identify potential areas for improvement or further analysis.
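One common form of fine-tuning keeps the pre-trained feature extractor frozen and trains only a new output layer on the target data. The Python sketch below illustrates that idea under strong simplifications: the "pre-trained" weights are made-up numbers, not a real model, and the node features (scaled incoming and outgoing connection counts) and labels (1 = influential node) are hypothetical.

```python
import math

# Made-up weights standing in for a feature extractor that was
# pre-trained on another network dataset. They stay frozen.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(x):
    """Frozen layer: reuse the pre-trained representation as-is."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_W]

def fine_tune_head(xs, ys, epochs=200, lr=0.5):
    """Train only a new logistic output layer on blockchain-specific data."""
    head_w = [0.0] * len(PRETRAINED_W)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            feats = extract_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            error = pred - y
            head_w = [w - lr * error * f for w, f in zip(head_w, feats)]
            head_b -= lr * error
    return head_w, head_b

def predict(x, head_w, head_b):
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)

# Small hypothetical blockchain dataset: (in-degree, out-degree), scaled.
xs = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.95),
      (0.1, 0.2), (0.2, 0.1), (0.15, 0.15)]
ys = [1, 1, 1, 0, 0, 0]

head_w, head_b = fine_tune_head(xs, ys)
hub_score = predict((0.9, 0.9), head_w, head_b)
leaf_score = predict((0.1, 0.1), head_w, head_b)
print(f"hub={hub_score:.2f} leaf={leaf_score:.2f}")
```

Because only the small output layer is trained, a handful of labelled examples can be enough, which is the main appeal of transfer learning when blockchain-specific labels are scarce.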

Practical Example Community Supported Insurance

To give you a practical example of how Machine Learning and Blockchain are used today, we are going to introduce the Community Supported Insurance (CSI) Project. CSI is also a project of the Blockchain Competence Center Mittweida (BCCM). CSI aims to compensate travellers when a train journey is disrupted by a delay. The basic concept is the following:

After connecting your crypto wallet to the app, you simply specify your connection and choose from various offers that protect you from delays. In case of a delay, you will be compensated automatically and, more importantly, within seconds.

This whole process is an interaction between a Machine Learning Model that is hosted on a Server and accessible via an API and a Smart Contract that is deployed on a Blockchain.

To train the model, a dataset of over 100,000 train connections from 2022 with real delays was generated and enriched with weather, holidays, historical delays, transfers, transfer time and more (approx. 35 features). Different supervised learning methods such as Decision Tree, CatBoost, Random Forest, SVM, Bernoulli Naive Bayes and others were then trained on this labelled data and compared. The goal was to forecast the probability of a train arriving at its destination more than 60 minutes late.

To compare these methods, a so-called baseline model was developed which only evaluates the historical delay of a connection (i.e., it simply looks at how punctual the connection was the last few times). The next step was to train the models with the different methods and compare them to the baseline model regarding AUC (Area Under the Curve) and the F1 score. Both are commonly used metrics in machine learning to evaluate the performance of a classification model. In this comparison, CatBoost was the method that produced the most accurate results. This model was then used to develop an API service which aggregates all necessary features for a connection live and returns the calculated probability of a train delay. This probability value is used to determine the premium rates for the protection of the desired train connection.
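To make the comparison more tangible, the following Python sketch shows what a history-based baseline and an AUC calculation could look like. It is not the CSI project's code: the function names, the delay histories, and the outcomes are invented, and the baseline is assumed to be the share of past runs that were more than 60 minutes late.

```python
def baseline_delay_probability(history_minutes, threshold=60):
    """Share of past runs of a connection that exceeded the threshold."""
    late = sum(1 for delay in history_minutes if delay > threshold)
    return late / len(history_minutes)

def auc(y_true, scores):
    """Probability that a randomly chosen late connection receives a
    higher score than a randomly chosen punctual one (ties count half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical connections: past delays in minutes and the actual outcome
# (1 = arrived more than 60 minutes late, 0 = did not).
connections = [
    ([5, 70, 90, 10], 1),
    ([0, 3, 8, 2], 0),
    ([65, 80, 12, 75], 1),
    ([4, 61, 0, 9], 0),
    ([0, 0, 15, 7], 1),
]

scores = [baseline_delay_probability(history) for history, _ in connections]
outcomes = [outcome for _, outcome in connections]
baseline_auc = auc(outcomes, scores)
print(scores, baseline_auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so a trained model is only worthwhile if it clearly beats the value the baseline achieves on the same test data.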

If the user then buys the protection, the purchase is recorded in a Smart Contract. If the train delay exceeds 60 minutes, the user is compensated automatically and instantly out of the risk pool of the Smart Contract.
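The interplay between the predicted probability, the premium, and the automatic payout can be simulated in a few lines of Python. Everything in this sketch is an assumption for illustration: the pricing rule (expected payout plus a margin), the class and method names, and the amounts are hypothetical and do not reflect the actual CSI Smart Contract, which runs on a blockchain and not in Python.

```python
def premium_for(delay_probability, payout, margin=0.1):
    """Hypothetical pricing rule: expected payout plus a safety margin."""
    return delay_probability * payout * (1 + margin)

class RiskPool:
    """Toy stand-in for the contract's risk pool and policy records."""

    def __init__(self, balance):
        self.balance = balance
        self.policies = {}

    def buy_protection(self, holder, connection_id, premium, payout):
        """Record the policy and add the premium to the pool."""
        self.balance += premium
        self.policies[(holder, connection_id)] = payout

    def settle(self, holder, connection_id, delay_minutes, threshold=60):
        """Pay out automatically if the delay exceeds the threshold."""
        payout = self.policies.pop((holder, connection_id), 0.0)
        if delay_minutes > threshold and payout <= self.balance:
            self.balance -= payout
            return payout
        return 0.0

pool = RiskPool(balance=1000.0)

# The delay probability would come from the model's API (value assumed).
premium = premium_for(delay_probability=0.08, payout=50.0)
pool.buy_protection("alice", "connection-1", premium, payout=50.0)

# The train arrives 75 minutes late, so the policy pays out.
compensation = pool.settle("alice", "connection-1", delay_minutes=75)
print(premium, compensation, pool.balance)
```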

If you are eager to know more about the project, feel free to visit the project's webpage at: https://csi.hs-mittweida.de/

Risks and benefits of Machine Learning

As we can see, Machine Learning can be used for a wide variety of use cases with or without the blockchain. Below are some general risks and potential benefits that come with Machine Learning.

Risks

  1. Bias and discrimination: Machine Learning algorithms can learn biases from the data they are trained on and perpetuate them, leading to discriminatory outcomes.
  2. Overreliance on automation: There is a risk that humans may become too reliant on Machine Learning algorithms and blindly follow their recommendations without critically evaluating them.
  3. Security and privacy: Machine Learning algorithms require large amounts of data to learn from, which can make them a target for hackers seeking to steal or manipulate that data.
  4. Unforeseen consequences: Machine Learning algorithms can produce unexpected and unintended consequences that were not anticipated by their designers.
  5. Job displacement: As Machine Learning algorithms become more advanced, there is a risk that they will replace human workers in certain industries, leading to job displacement.

Potential Benefits

  1. Improved efficiency: Machine Learning algorithms can often analyse large volumes of data faster than humans and, for well-defined tasks, more accurately, leading to increased efficiency in many industries.
  2. Personalization: Machine Learning algorithms can be used to personalize experiences for individual users, such as recommending products or services based on their preferences.
  3. Better decision-making: Machine Learning algorithms can help humans make better decisions by providing insights and predictions based on large amounts of data.
  4. Medical applications: Machine Learning algorithms can be used to improve medical diagnoses and treatments, leading to better patient outcomes.
  5. Scientific discoveries: Machine Learning algorithms can be used to analyse large amounts of data in fields such as astronomy and genetics, leading to new scientific discoveries.

This concludes the second part of our blog series about Machine Learning and Blockchain. By now, you should have an understanding of how Machine Learning works and how it is used with or without the Blockchain. In the next part, we are going to demonstrate some scenarios where our five variants could soon be used to create new business models that incorporate the blockchain.

Sources:

https://data-flair.training/blogs/advantages-and-disadvantages-of-machine-learning/

https://www.ibm.com/topics/supervised-learning

https://www.guru99.com/unsupervised-machine-learning.html

https://www.guru99.com/reinforcement-learning-tutorial.html

https://link.springer.com/article/10.1007/s42979-021-00815-1

https://www.v7labs.com/blog/transfer-learning-guide