Link Prediction Tool

Link Prediction Analysis Model: Source Code

Here is the code for this tool:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Link Prediction Tool</title>
    <style>
        #output {
            margin-top: 20px;
            border: 1px solid #ccc;
            padding: 10px;
            max-height: 300px;
            overflow-y: auto;
        }
    </style>
</head>
<body>
    <h1>Link Prediction Tool</h1>
    <h2>Analysis Section</h2>
    <button onclick="analyzeNodes()">Analyze Nodes</button>
    <div id="output"></div>

    <h2>Shortest Path</h2>
    <label for="source">Source Node:</label>
    <input type="text" id="source" name="source" value="0"><br>
    <label for="target">Target Node:</label>
    <input type="text" id="target" name="target" value="22"><br>
    <button onclick="findShortestPath()">Find Shortest Path</button>
    <div id="shortestPathOutput"></div>

    <h2>Link Prediction</h2>
    <label for="node1">Node 1:</label>
    <input type="text" id="node1" name="node1" value="1"><br>
    <label for="node2">Node 2:</label>
    <input type="text" id="node2" name="node2" value="221"><br>
    <button onclick="predictLink()">Predict Link</button>
    <div id="linkPredictionOutput"></div>

    <script>
        // Function to analyze nodes
        function analyzeNodes() {
            // Simulate analysis
            var outputDiv = document.getElementById("output");
            outputDiv.innerHTML = "<p>Node analysis results:</p><ul><li>Degree: ...</li><li>Closeness: ...</li><li>Betweenness: ...</li></ul>";
        }

        // Function to find shortest path
        function findShortestPath() {
            var sourceNode = document.getElementById("source").value;
            var targetNode = document.getElementById("target").value;

            // Simulate finding shortest path
            var shortestPathOutputDiv = document.getElementById("shortestPathOutput");
            shortestPathOutputDiv.innerHTML = "<p>Shortest path from Node " + sourceNode + " to Node " + targetNode + ":</p><p>Path: [0, 1, 2, ..., " + targetNode + "]</p><p>Length: ...</p>";
        }

        // Function to predict link
        function predictLink() {
            var node1 = document.getElementById("node1").value;
            var node2 = document.getElementById("node2").value;

            // Simulate link prediction
            var linkPredictionOutputDiv = document.getElementById("linkPredictionOutput");
            linkPredictionOutputDiv.innerHTML = "<p>Link prediction result for Node " + node1 + " and Node " + node2 + ":</p><p>Probability: 0.75</p><p>Prediction: Likely to be linked</p>";
        }
    </script>

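    <!-- Python analysis code. Browsers do not execute Python, so this block is not run by the page;
         run it separately in a Python environment (the /content/drive paths suggest Google Colab). -->
    <script type="text/x-python">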
import pandas as pd
import numpy as np
import random
import networkx as nx
from tqdm import tqdm
import re
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
from sklearn.metrics import auc
from sklearn.metrics import roc_auc_score

# load nodes details
with open("/content/drive/MyDrive/fb-pages-food.nodes") as f:
    fb_nodes = f.read().splitlines() 

# load edges (or links)
with open("/content/drive/MyDrive/fb-pages-food.edges") as f:
    fb_links = f.read().splitlines() 

len(fb_nodes), len(fb_links)

# capture nodes in 2 separate lists
node_list_1 = []
node_list_2 = []

for i in tqdm(fb_links):
  node_list_1.append(i.split(',')[0])
  node_list_2.append(i.split(',')[1])

fb_df = pd.DataFrame({'node_1': node_list_1, 'node_2': node_list_2})

# create graph
G = nx.from_pandas_edgelist(fb_df, "node_1", "node_2", create_using=nx.Graph())

# plot graph
plt.figure(figsize=(10,10))

pos = nx.random_layout(G, seed=23)
nx.draw(G, with_labels=True,  pos = pos, node_size = 350, alpha = 0.6, width = 0.7)

plt.show()

# Calculate the degree of each node
deg = nx.degree(G,nbunch=None, weight=None )
print(deg)

# Degree centrality (printed as a percentage)
def degree():
  x = nx.degree_centrality(G)
  for node, value in x.items():
    print(node, format(value*100, ".4f"))
  return x

# Test
deg = degree()
deg

# Closeness centrality
def closeness_centralities(source):
  if source in G:
    x = nx.closeness_centrality(G, u=source, distance=None, wf_improved=True)
    print('the closeness centrality of', source, 'is:', format(x*100, ".4f"), '%')
    return x
  else:
    print('node not in the graph')

# Test
cl = closeness_centralities('178')
cl

# Betweenness centrality: shortest-path betweenness of each node, counting only
# the shortest paths that start at `source` (all nodes are used as targets).
# `sources` must be a list of nodes; a bare string such as '147' would be read
# as the three nodes '1', '4' and '7'.
def betweenness_centrality(source):
  if source in G:
    x = nx.betweenness_centrality_subset(G, sources=[source], targets=list(G.nodes), normalized=True)
    for k, v in x.items():
      if v != 0.0:
        print('the betweenness centrality of', k, 'on shortest paths from', source, 'is:', v)
  else:
    print('node not in the graph')

# Test
bt = betweenness_centrality('147')

# Edge betweenness centralities: Compute betweenness centrality for all edges.
from networkx.algorithms.centrality.betweenness import edge_betweenness_centrality
def edge_betweenness_centralities():
  x = edge_betweenness_centrality(G, k=None, normalized=True, weight=None, seed=None)
  return x

# Test
edge = edge_betweenness_centralities()
edge

from networkx.algorithms.shortest_paths.weighted import dijkstra_path
def dijkstrapath(source, target):
  print('Returns the shortest path from source to target (edges without a weight count as 1)')
  # check both nodes explicitly; `(source and target) in G` only tests `target`
  if source in G and target in G:
    x = dijkstra_path(G, source=source, target=target, weight="weight")
    print(x)
    return x
  else:
    print('nodes are not in the graph')

dij = dijkstrapath('0', '54')
dij

from networkx.algorithms.shortest_paths.weighted import dijkstra_path_length
def dijkstraPathLength(source, target):
  print('Returns the length of the shortest path from source to target (edges without a weight count as 1)')
  if source in G and target in G:
    x = dijkstra_path_length(G, source=source, target=target, weight="weight")
    return x
  else:
    print('nodes are not in the graph')

# use a new name so the imported dijkstra_path_length function is not overwritten
path_len = dijkstraPathLength('0', '22')
path_len

from networkx.algorithms.shortest_paths.weighted import dijkstra_predecessor_and_distance
def dijProdAndDist(source):
  print('Predecessors and shortest-path lengths from source to every node within distance 2')
  if source in G:
    x = dijkstra_predecessor_and_distance(G, source=source, cutoff=2, weight="weight")
    return x
  else:
    print('node is not in the graph')

dijo = dijProdAndDist('78')
dijo

# combine all nodes in a list
node_list = node_list_1 + node_list_2

# remove duplicate items from the list (keeps first-seen order)
node_list = list(dict.fromkeys(node_list))

# build adjacency matrix; row/column i corresponds to node_list[i]
adj_G = nx.to_numpy_array(G, nodelist=node_list)

# get unconnected node-pairs that are at most 2 hops apart
all_unconnected_pairs = []

# traverse the upper triangle of the adjacency matrix
for i in tqdm(range(adj_G.shape[0])):
  for j in range(i + 1, adj_G.shape[1]):
    if adj_G[i, j] == 0:
      # look nodes up by label (node_list[i]), not by their matrix index
      if nx.shortest_path_length(G, node_list[i], node_list[j]) <= 2:
        all_unconnected_pairs.append([node_list[i], node_list[j]])

len(all_unconnected_pairs)

node_1_unlinked = [i[0] for i in all_unconnected_pairs]
node_2_unlinked = [i[1] for i in all_unconnected_pairs]

data = pd.DataFrame({'node_1':node_1_unlinked, 
                     'node_2':node_2_unlinked})

# add target variable 'link'
data['link'] = 0

def unlinkedNodes():
  print("the unlinked nodes are : ") 
  dataUnlinked = data
  print(dataUnlinked) 

un = unlinkedNodes()

initial_node_count = len(G.nodes)

fb_df_temp = fb_df.copy()

# empty list to store removable links
omissible_links_index = []

for i in tqdm(fb_df.index.values):
  
  # remove a node pair and build a new graph
  G_temp = nx.from_pandas_edgelist(fb_df_temp.drop(index = i), "node_1", "node_2", create_using=nx.Graph())
  
  # check that the graph does not split and the number of nodes stays the same
  if (nx.number_connected_components(G_temp) == 1) and (len(G_temp.nodes) == initial_node_count):
    omissible_links_index.append(i)
    fb_df_temp = fb_df_temp.drop(index = i)

len(omissible_links_index)
# We have over 1483 links that we can drop from the graph. 

# Data for Model Training
# create dataframe of removable edges (a copy, so adding a column is safe)
fb_df_ghost = fb_df.loc[omissible_links_index].copy()

# add the target variable 'link' for the linked nodes
fb_df_ghost['link'] = 1

# append the linked pairs to the unlinked ones (pd.concat replaces DataFrame.append)
data = pd.concat([data, fb_df_ghost[['node_1', 'node_2', 'link']]], ignore_index=True)

print(data)
data['link'].value_counts()

# Feature Extraction
# drop removable edges
fb_df_partial = fb_df.drop(index=fb_df_ghost.index.values)

# build graph
G_data = nx.from_pandas_edgelist(fb_df_partial, "node_1", "node_2", create_using=nx.Graph())

# plot graph
plt.figure(figsize=(10,10))

pos = nx.random_layout(G, seed=23)
nx.draw(G_data, with_labels=True,  pos = pos, node_size = 350, alpha = 0.6, width = 0.7)

plt.show()

print("Given Graph is:")
print(G_data)

# Rebuild the graph with integer node labels so nodes can be queried as 1, 22, ...
# (from here on G is the integer-labelled graph)
fb_df2 = fb_df.astype('int')
G = nx.from_pandas_edgelist(fb_df2, "node_1", "node_2", create_using=nx.Graph())
# The labels are already integers. nx.convert_node_labels_to_integers would renumber
# the nodes 0..n-1 in insertion order, so node 22 would generally not be page 22.
G_new = G

# Make prediction using Common neighbors
u = 1
v = 22
def cmm(u, v):
  if (u in G_new) and (v in G_new):
    common = sorted(nx.common_neighbors(G_new, u, v))
    if common == []:
      print('they have no common neighbours', common, '=> a link is unlikely')
    else:
      print('they have common neighbours', common, '=> a link is likely')
    return common
  else:
    print('Nodes not found')

cn = cmm(u, v)

# sorted(nx.common_neighbors(G_new, 21, 22))

# Make prediction using Jaccard Coefficient
def JC(u, v):
  if (u in G) and (v in G):
    preds = nx.jaccard_coefficient(G, [(u, v)])
    for a, b, p in preds:
      print(f"({a}, {b}) -> {p*100:.8f}")
      print("they are", p*100, "% similar")
      # 50% is the similarity threshold used in this demo
      if p*100 >= 50:
        print('They are likely to be linked')
      else:
        print('They are unlikely to be linked')
      return p
  else:
    print('Nodes not found')

u = 1
v = 221
test = JC(u, v)

preds = nx.adamic_adar_index(G, [(0, 1), (2, 3)])
for u, v, p in preds:
  print('(%d, %d) -> %.8f' % (u, v, p))




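# Prediction using Adamic-Adar, evaluated on the labelled pairs in `data`:
# link = 1 for edges removed from G_data, link = 0 for unconnected pairs
pairs = list(zip(data['node_1'], data['node_2']))
pred_adamic = list(nx.adamic_adar_index(G_data, pairs))
score_adamic = [s for (_, _, s) in pred_adamic]
label_adamic = data['link'].tolist()

# Compute the ROC curve and the ROC AUC score
fpr_adamic, tpr_adamic, _ = roc_curve(label_adamic, score_adamic)
auc_adamic = roc_auc_score(label_adamic, score_adamic)
print('Adamic-Adar ROC AUC:', auc_adamic)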

    </script>
</body>
</html>

Explain the Code

This code is an HTML document containing JavaScript for a Link Prediction Tool, together with Python analysis code placed inside a second <script> block. Let’s break it down:

HTML Structure:

  • The HTML structure defines the layout of the web page. It includes:
    • Various sections with headings such as “Link Prediction Tool”, “Analysis Section”, “Shortest Path”, and “Link Prediction”.
    • Input elements like buttons and text fields to interact with the tool.
    • Output div elements where the results of analysis, shortest path, and link prediction will be displayed.

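Output:

  • The Python code produces printed results and plots. Because the browser does not execute it, this output appears in the Python environment where the code is run (for example a notebook), not in the web page; the page’s output divs only show the simulated JavaScript results.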

Interaction:

  • Users can interact with the web page by clicking buttons to trigger JavaScript functions, which in turn simulate various network analysis and prediction tasks.

CSS Styling:

  • The <style> tag contains CSS that formats the HTML elements. It styles the output div (#output) with a top margin, a border, padding, a maximum height of 300px and a vertical scrollbar when the content overflows.

JavaScript Functions:

  • JavaScript functions are defined to perform different tasks:
    • analyzeNodes(): Simulates node analysis and fills the output div with placeholder results for degree, closeness and betweenness.
    • findShortestPath(): Simulates finding the shortest path between two nodes and writes a placeholder path to the shortestPathOutput div.
    • predictLink(): Simulates predicting a link between two nodes and writes a fixed result (probability 0.75, “Likely to be linked”) to the linkPredictionOutput div.

Embedded Python Code:

  • The Python code sits inside a second <script> tag. Browsers do not run Python in <script> tags, so the page never executes it; it has to be run separately in a Python environment such as a Jupyter or Google Colab notebook (the /content/drive/MyDrive/ file paths suggest Colab).
  • It imports libraries such as pandas, NumPy, NetworkX, Matplotlib, tqdm and scikit-learn.
  • It reads the data files containing node and edge information (fb-pages-food.nodes and fb-pages-food.edges).
  • It performs network analysis tasks with NetworkX: degree, closeness and betweenness centrality, Dijkstra shortest paths, and a search for unconnected node pairs that are at most two hops apart.
  • It builds a labelled dataset of linked and unlinked node pairs and scores pairs with similarity heuristics (Common Neighbors, the Jaccard Coefficient and the Adamic-Adar Index), evaluating Adamic-Adar with the ROC AUC score. These are heuristic scores rather than trained machine learning models.

The JavaScript functions only simulate results and are not connected to the Python analysis. To turn the page into a working tool, the Python code would have to run on a server (or in an in-browser Python runtime) and send its results back to the page.

Link prediction is like having a crystal ball for network connections. Let’s break it down step by step:

  • Gather Your Data: Start by collecting all the pieces that describe your network: its nodes and the links between them. It’s like gathering clues to solve a mystery.
  • Split Your Data: Divide your data into a training set for learning and a test set for evaluation. In link prediction this usually means hiding some existing links and sampling pairs of nodes that are not linked; the model then has to rediscover the hidden links. It’s like keeping a few clues back to check whether the detective’s theory holds up.
  • Extract Features: Dive into the characteristics of the nodes and their neighborhoods, such as degrees and shared neighbors. It’s like studying the personalities and backgrounds of people to predict potential friendships.
  • Choose a Model: Select the method you will use to make predictions, from simple similarity scores to a trained classifier. It’s like picking the right tool for the job, whether it’s a ruler for measuring or a compass for finding direction.
  • Train Your Model: Teach your chosen model using the training data. Think of it as training a detective to spot patterns and make educated guesses.
  • Evaluate Performance: Measure how well your model does on the test data. It’s like checking if the detective’s predictions match reality.
  • Fine-Tune and Validate: Adjust your model and test it again to ensure it works well in different scenarios. This is like sharpening the detective’s skills through practice and feedback.
  • Make Predictions: Once your model is reliable, use it to predict future connections in your network. It’s similar to using your detective’s insights to anticipate potential friendships or collaborations.
  • Refine and Repeat: Continuously improve your approach based on new data and insights. Just like detectives refine their methods over time, your link prediction techniques can become more accurate with each iteration.
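The “Split Your Data” step is where link prediction differs most from ordinary machine learning, so here is a minimal Python sketch of it. It assumes NetworkX, uses the built-in karate club graph as a stand-in dataset, and split_edges is a hypothetical helper name. Like the omissible-links loop in the tool’s code, it only hides a link if the graph stays connected, and it samples an equal number of unlinked pairs as negative examples.

```python
import random

import networkx as nx


def split_edges(G, test_fraction=0.1, seed=42):
    """Hypothetical helper: hide a fraction of G's edges and sample as many non-edges.

    Returns (G_train, positives, negatives): the graph with the hidden edges
    removed, the hidden edges (label 1) and the sampled unlinked pairs (label 0).
    """
    rng = random.Random(seed)
    G_train = G.copy()
    edges = list(G.edges())
    rng.shuffle(edges)
    n_test = int(len(edges) * test_fraction)

    positives = []
    for u, v in edges:
        if len(positives) == n_test:
            break
        G_train.remove_edge(u, v)
        # keep the edge hidden only if the training graph stays connected
        if nx.is_connected(G_train):
            positives.append((u, v))
        else:
            G_train.add_edge(u, v, **G.edges[u, v])  # put it back with its attributes

    # sample pairs that are unlinked in the original graph as negative examples
    nodes = list(G.nodes())
    negatives = set()
    while len(negatives) < len(positives):
        u, v = sorted(rng.sample(nodes, 2))
        if not G.has_edge(u, v):
            negatives.add((u, v))
    return G_train, positives, sorted(negatives)


G = nx.karate_club_graph()
G_train, pos, neg = split_edges(G, test_fraction=0.1)
print(len(pos), "hidden links,", len(neg), "sampled unlinked pairs")  # 7 and 7
```

Features for both sets are then computed on G_train only, so the model never sees the links it is asked to predict.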

In essence, link prediction involves understanding the dynamics of connections in a network and using that knowledge to anticipate future relationships.

Link prediction in Neo4j means discovering potential connections within a graph database. Because the data is already stored as nodes and relationships, the prediction scores can be computed directly from the structure of Neo4j’s graph model.

The main goal of link prediction is to shed light on missing or likely relationships between nodes, providing insights into the underlying dynamics of the network.

This requires examining the properties of nodes and relationships and identifying patterns that suggest future connections.

To do this, Neo4j practitioners often use Cypher, Neo4j’s query language, to traverse the graph and extract relevant features.

By writing precise Cypher queries, analysts can explore nodes and relationships and uncover the information that feeds the prediction process.

Whether it involves examining node properties, evaluating relationship strengths, or identifying structural motifs, Cypher queries offer the means to extract actionable insights from the graph data.

Neo4j also provides a range of graph algorithms, through its Graph Data Science (GDS) library, that can be used for link prediction tasks.

These range from topological scores such as Common Neighbors to node-level measures such as PageRank and community detection results, which can serve as additional features for a prediction model.

Common Neighbors: This algorithm scores a pair of nodes u and v by the number of neighbors they share, |N(u) ∩ N(v)|, where N(u) is the set of neighbors of u, and predicts a connection when that number is high. The idea behind this is that nodes with many mutual connections are more likely to form new connections in the future.
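A minimal Python sketch of the Common Neighbors score, assuming NetworkX and a small made-up graph; common_neighbors_score is a hypothetical helper, while nx.common_neighbors is NetworkX’s own function for listing shared neighbors.

```python
import networkx as nx


def common_neighbors_score(G, u, v):
    """Hypothetical helper: number of neighbours shared by u and v, |N(u) ∩ N(v)|."""
    return len(set(G[u]) & set(G[v]))


G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (3, 5), (4, 5)])
# 1 and 5 are not linked, but both are connected to 3 and 4
print(common_neighbors_score(G, 1, 5))       # 2
print(sorted(nx.common_neighbors(G, 1, 5)))  # [3, 4]
```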

Jaccard Coefficient: Similar to the Common Neighbors algorithm, but the number of common neighbors is divided by the total number of distinct neighbors of the two nodes, |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, giving a value between 0 and 1. It predicts a link between nodes that have a high Jaccard similarity.
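The same made-up graph can be used to check the Jaccard formula by hand against NetworkX’s nx.jaccard_coefficient. This is a sketch under the assumption that NetworkX is installed; jaccard_score is a hypothetical helper name.

```python
import networkx as nx


def jaccard_score(G, u, v):
    """Hypothetical helper: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, or 0 if both have no neighbours."""
    nu, nv = set(G[u]), set(G[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (3, 5), (4, 5)])
# N(1) = {2, 3, 4}, N(5) = {3, 4}: 2 shared out of 3 distinct neighbours
print(jaccard_score(G, 1, 5))  # 0.6666666666666666

# NetworkX's built-in yields (u, v, score) tuples with the same value
for u, v, p in nx.jaccard_coefficient(G, [(1, 5)]):
    print(u, v, p)
```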

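Adamic/Adar: This algorithm gives more weight to common neighbors that have few connections of their own. Each common neighbor w contributes 1 / log |N(w)|, so a shared neighbor with a small degree counts for more than a shared hub. It assumes that connections through rare neighbors are more informative and therefore more indicative of potential future links.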
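A small Python sketch of the Adamic/Adar sum, assuming NetworkX; the graph is made up so that the two common neighbors of nodes 1 and 5 have very different degrees, and adamic_adar_score is a hypothetical helper that is compared with nx.adamic_adar_index.

```python
import math

import networkx as nx


def adamic_adar_score(G, u, v):
    """Hypothetical helper: sum of 1 / log(degree(w)) over the common neighbours w."""
    return sum(1 / math.log(G.degree(w)) for w in nx.common_neighbors(G, u, v))


G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (3, 5), (4, 5), (3, 6), (3, 7)])
# common neighbours of 1 and 5: node 3 (degree 5, a local hub) and node 4 (degree 2)
score = adamic_adar_score(G, 1, 5)
print(round(score, 4))  # 1/log(5) + 1/log(2) ≈ 2.064

# NetworkX's built-in gives the same score
print(list(nx.adamic_adar_index(G, [(1, 5)])))
```

The low-degree neighbor 4 contributes about 1.44, more than twice as much as the hub 3 (about 0.62).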

Preferential Attachment: This algorithm scores a pair by the product of the two nodes’ degrees, |N(u)| · |N(v)|, following the principle that nodes with higher degrees (more connections) are more likely to attract new links. It assumes that popular nodes will continue to gain connections over time.
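A minimal Python sketch of Preferential Attachment on NetworkX’s karate club graph (used here only as a convenient example); the candidate pairs are unlinked pairs picked for illustration, and preferential_attachment_score is a hypothetical helper compared with nx.preferential_attachment.

```python
import networkx as nx


def preferential_attachment_score(G, u, v):
    """Hypothetical helper: product of the two degrees, |N(u)| * |N(v)|."""
    return G.degree(u) * G.degree(v)


G = nx.karate_club_graph()
# a few pairs that are not linked in the karate club graph
candidates = [(5, 9), (16, 24), (0, 9), (0, 33)]
ranked = sorted(candidates, key=lambda p: preferential_attachment_score(G, *p), reverse=True)
for u, v in ranked:
    print(u, v, preferential_attachment_score(G, u, v))

# NetworkX's built-in returns the same scores
print(list(nx.preferential_attachment(G, candidates)))
```

The pair of hubs, nodes 0 and 33, ranks first even though they share few neighbors, which shows how this score favors popular nodes.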

Community Detection: Community detection algorithms identify groups of nodes that are densely connected within themselves but have sparse connections to nodes outside the community. Pairs of nodes in the same community are more likely to become linked than pairs in different communities, so community membership is a useful signal for link prediction.
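One way to turn communities into a link-prediction signal is a same-community indicator. The sketch below assumes NetworkX and uses its greedy modularity algorithm on the karate club graph; community_of and same_community are hypothetical names, and other community detection methods could be swapped in.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()
communities = greedy_modularity_communities(G)

# map each node to the index of the community it was placed in
community_of = {node: i for i, members in enumerate(communities) for node in members}


def same_community(u, v):
    """Hypothetical feature: 1 if u and v are in the same community, else 0."""
    return int(community_of[u] == community_of[v])


print(len(communities), "communities found")
print(same_community(0, 1), same_community(0, 33))
```

The indicator can be used on its own (predict links only inside communities) or added as one more feature next to the similarity scores.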

Node Embedding Techniques: Node embedding algorithms like DeepWalk, Node2Vec, and GraphSAGE learn low-dimensional vector representations of nodes based on their network neighborhood. These embeddings capture latent features of nodes and can be used to predict links based on similarity in the embedding space.
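DeepWalk, Node2Vec and GraphSAGE need extra packages, so this sketch swaps in a simpler technique to show the idea of scoring links in an embedding space: a spectral embedding built from the top eigenvectors of the adjacency matrix, using only NetworkX and NumPy. It is not DeepWalk, Node2Vec or GraphSAGE; spectral_embedding, embedding_score and dim=8 are hypothetical choices.

```python
import networkx as nx
import numpy as np


def spectral_embedding(G, dim=8):
    """Hypothetical helper: embed nodes with the top-`dim` eigenvectors of the adjacency matrix."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)  # unweighted 0/1 adjacency
    eigvals, eigvecs = np.linalg.eigh(A)                   # eigenvalues in ascending order
    top_vals, top_vecs = eigvals[-dim:], eigvecs[:, -dim:]
    # scale by sqrt(eigenvalue) so dot products approximate the adjacency matrix
    emb = top_vecs * np.sqrt(np.clip(top_vals, 0, None))
    return {node: emb[i] for i, node in enumerate(nodes)}


G = nx.karate_club_graph()
emb = spectral_embedding(G, dim=8)


def embedding_score(u, v):
    """Dot product of the two node vectors: a higher value suggests a likelier link."""
    return float(emb[u] @ emb[v])


print(round(embedding_score(0, 33), 3), round(embedding_score(0, 1), 3))
```

With learned embeddings the scoring step stays the same; only the way the vectors are produced changes.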

Machine Learning Approaches: Machine learning models, such as logistic regression, decision trees, random forests, support vector machines, and neural networks, can be trained on various features extracted from the graph to predict links. These models can incorporate both structural features and domain-specific features to improve prediction accuracy.
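A compact Python sketch of this approach, assuming NetworkX and scikit-learn: it hides 15 links of the karate club graph, samples 15 unlinked pairs, computes four of the structural scores above on the graph without the hidden links, and trains a logistic regression. The sample sizes, the random seeds and the pair_features helper are hypothetical choices for illustration.

```python
import random

import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = random.Random(0)
G = nx.karate_club_graph()

# positive examples: existing links, hidden from the graph the features are built on
positives = rng.sample(list(G.edges()), 15)
G_obs = G.copy()
G_obs.remove_edges_from(positives)

# negative examples: pairs that are not linked in the full graph
negatives = rng.sample(list(nx.non_edges(G)), 15)

pairs = positives + negatives
y = np.array([1] * len(positives) + [0] * len(negatives))


def pair_features(graph, pairs):
    """Hypothetical helper: Common Neighbors, Jaccard, Adamic-Adar, Preferential Attachment."""
    cn = [len(list(nx.common_neighbors(graph, u, v))) for u, v in pairs]
    jc = [p for _, _, p in nx.jaccard_coefficient(graph, pairs)]
    aa = [p for _, _, p in nx.adamic_adar_index(graph, pairs)]
    pa = [p for _, _, p in nx.preferential_attachment(graph, pairs)]
    return np.column_stack([cn, jc, aa, pa])


X = pair_features(G_obs, pairs)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# scale the features first: degree products are much larger than Jaccard values
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out ROC AUC: {test_auc:.3f}")
```

On a graph this small the AUC varies a lot with the random seed; the point is the shape of the pipeline, not the number.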

Hey there, take a look at this! When you want to assess how well a link prediction method works, you need to evaluate it, you know? It’s like checking if your friend’s jump shot is on point or if it’s just air balls all day. So, here’s how we do it:

Precision: This tells you how many of your predicted links are real ones: true positives divided by everything you predicted as a link. It’s like, out of all the shots your friend took, how many actually went in the hoop, you feel me? You want a high precision rate because that means most of the predicted links are the real deal.

Recall: Now, this one’s about completeness: out of all the links that really exist, how many did your model find? It’s like counting how many of the open looks your friend actually knocked down, out of every open look he had. You want a high recall rate so you know your predictions aren’t missing out on too many links in the network.

F1-score: This one’s like the MVP of evaluation metrics. It combines precision and recall into one number (their harmonic mean), so a model only scores well if it is good at both. It’s like combining your friend’s shooting percentage with how many of his open looks he turns into points to see how much of a baller he really is.

AUC-ROC: This metric is all about how well your model can tell a slam dunk from a brick, that is, real links from non-links, across every possible threshold. It equals the probability that a randomly chosen real link gets a higher score than a randomly chosen non-link, so 0.5 means random guessing. It’s like seeing if your friend can spot an open shot from a mile away. A high AUC-ROC means your model’s got some serious game.

AUC-PR: Now, this one’s for when the game gets really imbalanced, you know? In most networks, unlinked pairs vastly outnumber linked ones; it’s like when one team has all the stars and the other is struggling to keep up. AUC-PR focuses on how well the model finds the rare real links, so a high AUC-PR shows your model can handle those lopsided matchups like a champ.

Accuracy: This is the basic stat, like how many baskets your friend made out of all the shots he took. But accuracy can be deceiving if the game isn’t played on a level court, you get me? When most pairs are unlinked, a model that predicts “no link” for every pair still scores high accuracy. So, it’s good to check out the other metrics too to get the full picture.
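To see these stats side by side, here is a small Python sketch with scikit-learn. The labels and scores for 10 node pairs are made-up numbers for illustration, not results from the tool, and the 0.5 threshold is an assumption.

```python
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

# made-up labels for 10 node pairs (1 = the link exists) and a model's scores
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.2, 0.8, 0.3, 0.1, 0.4, 0.05, 0.15, 0.25]

# threshold the scores at 0.5 to get yes/no predictions
y_pred = [int(s >= 0.5) for s in scores]

precision = precision_score(y_true, y_pred)       # TP / (TP + FP)
recall = recall_score(y_true, y_pred)             # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                     # 2 * P * R / (P + R)
accuracy = accuracy_score(y_true, y_pred)         # correct predictions / all pairs
roc_auc = roc_auc_score(y_true, scores)           # uses the raw scores, no threshold
pr_auc = average_precision_score(y_true, scores)  # summarises the precision-recall curve

# a lazy model that always answers "no link" still looks decent on accuracy
lazy_accuracy = accuracy_score(y_true, [0] * len(y_true))

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.3f} lazy accuracy={lazy_accuracy:.3f}")
print(f"ROC AUC={roc_auc:.3f} PR AUC (average precision)={pr_auc:.3f}")
```

Here precision, recall and F1 all come out at 2/3 and accuracy is 0.8, while the lazy “no link” model reaches 0.7 accuracy without finding a single link.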

So, when you’re evaluating a link prediction method, keep these stats in mind and see if your model has what it takes to ball out in the big leagues.