Intro to Graph Neural Networks pack

This guide summarizes the concepts covered in the Graph Neural Network project pack (get this pack here https://cellstrathub.com/packs).

Graph Neural Networks are specialized neural networks that operate on data structured as graphs.

Use this pack to jumpstart your Graph project development for applications such as social networks, chemical compounds, maps, transportation systems, recommender systems and many more.

graphs

Image credit: http://www.cs.toronto.edu/~yujiali/

This pack includes the following projects related to graphs and Graph Neural Networks:

Basics of Graph Theory#

A graph is a combination of vertices (nodes) and lines (edges). A vertex is a point where lines meet, and an edge is a line connecting two vertices.

graph

Image credit: https://en.wikipedia.org/

Below is an example of a visual representation of a graph: the character network for Harry Potter and the Goblet of Fire.

harry_potter_graph

Image credit: https://anthonybonato.com/2016/08/03/social-networks-in-novels-and-films/

One project in the Graph pack builds a graph of US airports from an airline flights dataset and illustrates the shortest-path distances between them.
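As a rough illustration of shortest paths on an airport graph, the sketch below runs Dijkstra's algorithm on a tiny hand-made route map. The airport codes, the distances and the `shortest_path` helper are all assumptions made for this example; they are not taken from the pack's dataset or code.

```python
import heapq

# Hypothetical route map: airport code -> {neighbor: distance in miles}.
# The distances are illustrative only, not real flight data.
ROUTES = {
    "SFO": {"DEN": 967, "ORD": 1846},
    "DEN": {"SFO": 967, "ORD": 888, "JFK": 1626},
    "ORD": {"SFO": 1846, "DEN": 888, "JFK": 740},
    "JFK": {"DEN": 1626, "ORD": 740},
}


def shortest_path(graph, source, target):
    """Return (total_distance, path) using Dijkstra's algorithm."""
    queue = [(0, source, [source])]  # (distance so far, node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []  # target not reachable


distance, path = shortest_path(ROUTES, "SFO", "JFK")
print(distance, path)  # 2586 ['SFO', 'ORD', 'JFK']
```

On a real flights dataset the same idea applies, with one node per airport and edge weights derived from the recorded route distances.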

Knowledge Graph#

Representing data as a graph can make it more interpretable. The figure below is an example of a knowledge graph, showing several entities such as Putin, Russia, KGB and APEC and how they are linked to one another.

putin

Image credit: https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/

Question answering, a popular NLP task used in dialogue interfaces, chatbots and other information retrieval systems, can be built with the help of knowledge graphs.

For example, a knowledge graph (KG) lets a search engine go beyond keyword matching and return more relevant results, like the Google search result shown here:

YODA

Image credit: https://ahrefs.com/blog/google-knowledge-graph/

Intro to Graph and Graph Neural Networks#

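Images and text are structured data in the sense that they have a regular layout: an image is a fixed grid of pixels, and text is an ordered sequence of tokens.

Convolutional Networks, Recurrent Networks, Autoencoders etc. work well on such data because it can be converted to a matrix or vector format.

Graphs, in contrast, are unstructured in this sense: each node can have a different number of neighbors, and the nodes have no fixed ordering.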

unstructured

Image credit: https://www.curvearro.com/blog/difference-between-structured-data-unstructured-data/

GNNs are able to model the relationship between the nodes in a graph and produce a numeric representation of it. Social networks, chemical compounds, maps, transportation systems are some of the applications where graph neural networks are used.

A graph is transformed into an embedding representation for further processing. Embeddings may be created with the help of a neural network transformation.

embedding2

Image credit: https://medium.com/dair-ai/an-illustrated-guide-to-graph-neural-networks-d5564a551783

Graph frameworks like DGL (Deep Graph Library) provide built-in functions that make training on graphs easier.

Graph Convolution Network#

Just like normal Convolutional Neural Networks, Graph Convolution Networks (GCN) aid in detecting local patterns.

embedding

Image credit: https://www.semanticscholar.org/paper/Towards-Machine-Learning-Enabled-Automatic-Design-W%C3%85HLIN/6af7126fea32189b8754ef2f1e1b8477729cc19e

Use cases of GCN include social networks, citation networks, ID card digitization etc.

id_card

ID card digitization (Image credit: Nanonets-ID Card Digitization and Information Extraction)

For GCN, a graph convolution operation produces the normalized sum of the node features of neighbors.
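The normalized sum can be sketched in a few lines of plain Python. This example assumes the symmetric normalization with self-loops from the Kipf and Welling formulation, where each neighbor's features are scaled by 1/sqrt(d_i * d_j). The toy graph, the feature values and the `gcn_aggregate` name are hypothetical. A full GCN layer would follow this step with a learned weight matrix and a non-linearity.

```python
import math

# Undirected toy graph as an adjacency list (hypothetical example).
neighbors = {0: [1, 2], 1: [0], 2: [0]}
# One 2-dimensional feature vector per node.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}


def gcn_aggregate(neighbors, features):
    """Symmetric-normalized sum over each node's neighbors plus itself."""
    # The degree includes the added self-loop (an assumption of this sketch).
    degree = {v: len(nbrs) + 1 for v, nbrs in neighbors.items()}
    out = {}
    for v, nbrs in neighbors.items():
        dim = len(features[v])
        acc = [0.0] * dim
        for u in nbrs + [v]:  # neighbors and the node itself
            norm = 1.0 / math.sqrt(degree[v] * degree[u])
            for k in range(dim):
                acc[k] += norm * features[u][k]
        out[v] = acc
    return out


aggregated = gcn_aggregate(neighbors, features)
print(aggregated[1])  # node 1 mixes its own features with those of node 0
```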

Network configurations in GCN include spectral and spatial convolutions. Spatial GCNs are also called message passing neural networks, as they rely on aggregating feature information from neighbors.

Graph Attention Network#

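Graph Attention Network (GAT) is a variant of Graph Convolutional Network that uses the attention mechanism for feature-dependent and structure-free normalization. Instead of weighting neighbors by a fixed function of node degree, GAT learns how much each neighbor contributes, which helps in representing graphs better.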

gat

Image credit: https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html

In GAT, additive attention with softmax normalization is used to compute the attention weights (unlike the dot-product attention of Transformers).
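A minimal sketch of additive attention with softmax normalization is shown below. It assumes the node features have already been projected by the layer's weight matrix, and the attention vector `a` holds made-up values that would be learned in a real model. The function and variable names are illustrative and are not DGL's API.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(h_i, h_neighbors, a):
    """Score each neighbor additively, then softmax-normalize the scores."""
    scores = []
    for h_j in h_neighbors:
        concat = h_i + h_j  # list concatenation: [h_i || h_j]
        # e_ij = LeakyReLU(a . [h_i || h_j])
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    # Softmax over the neighborhood, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


h_i = [1.0, 0.0]                                  # features of the center node
h_neighbors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = [0.5, -0.5, 1.0, -1.0]                        # hypothetical attention vector
alphas = attention_weights(h_i, h_neighbors, a)
print(alphas)  # weights are positive and sum to 1
```

The updated feature of the center node would then be the sum of the neighbor features weighted by `alphas`.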

Classification on the Cora dataset (a citation network) can be done with a Graph Attention Network built with DGL.

cora

Image credit: Cora dataset

Relational GCN#

Knowledge graphs store triplets of the form (subject, relation, object). The edges therefore carry relation types that the model needs to take into account.

Example:

obama

Image credit: https://docs.dgl.ai/en/0.4.x/tutorials/basics/5_hetero.html

Barack Obama - Subject; Occupation - Relation; Politician - Object
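Triplets like this map naturally onto tuples. The sketch below stores a few triples and filters them with wildcards. Only the first triple comes from the example above; the other two and the `query` helper are additions made for illustration.

```python
# Each fact is a (subject, relation, object) triple.
triples = [
    ("Barack Obama", "occupation", "Politician"),  # from the example above
    ("Barack Obama", "born_in", "Honolulu"),       # illustrative addition
    ("Honolulu", "located_in", "Hawaii"),          # illustrative addition
]


def query(triples, subject=None, relation=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return [
        (s, r, o)
        for s, r, o in triples
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]


occupations = query(triples, subject="Barack Obama", relation="occupation")
print(occupations)  # [('Barack Obama', 'occupation', 'Politician')]
```

A relational GCN goes further than this lookup: it learns a separate transformation per relation type, so that messages passed along "occupation" edges are treated differently from those passed along "born_in" edges.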

In a movie recommender system, user-movie interactions can be represented as edges labeled with the user's rating.

movie

Image credit: https://docs.dgl.ai/en/0.4.x/tutorials/basics/5_hetero.html

Relational learning tasks include entity classification and link prediction.

link_prediction

Image credit: https://www.analyticsvidhya.com/blog/2020/01/link-prediction-how-to-predict-your-future-connections-on-facebook/

Generative Models for Graph#

Graph completion with generative models helps in finding new links, which is useful in knowledge graph completion, social networks, map development, drug and material discovery etc.

generative

Image credit: https://docs.dgl.ai/tutorials/models/3_generative_model/5_dgmg.html

Deep Generative Models of Graphs (DGMG) generates a graph's structure sequentially. Driven by learned probabilities, it decides whether to add a node, whether to add an edge from that node, and which destination node the edge should connect to.
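The sequential process (add a node, decide whether to add edges, pick destination nodes) can be sketched as a simple loop. In DGMG each decision comes from a learned neural network; this sketch replaces those networks with fixed probabilities, so it only shows the control flow. The function name, parameters and probability values are hypothetical.

```python
import random


def generate_graph(p_add_node=0.8, p_add_edge=0.5, max_nodes=10, seed=0):
    """Grow a graph one node at a time using fixed (not learned) probabilities."""
    rng = random.Random(seed)
    nodes, edges = [], set()
    # Decision 1: should another node be added?
    while len(nodes) < max_nodes and rng.random() < p_add_node:
        new = len(nodes)
        nodes.append(new)
        candidates = [v for v in nodes if v != new]
        # Decision 2: should the new node get another edge?
        while candidates and rng.random() < p_add_edge:
            # Decision 3: which existing node is the destination?
            dest = rng.choice(candidates)
            candidates.remove(dest)  # this sketch disallows duplicate edges
            edges.add((dest, new))
    return nodes, edges


nodes, edges = generate_graph()
print(len(nodes), sorted(edges))
```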

dgmg

Image credit: https://docs.dgl.ai/tutorials/models/3_generative_model/5_dgmg.html

Molecular structures such as cyclic molecules can be generated and validated with DGMG:

molecules

Image credit: https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system

Giant Graphs#

Giant graphs, as seen in real-life examples, usually have millions or billions of nodes and edges.

For example, in the Reddit dataset constructed by Hamilton et al., the nodes are posts and an edge is established between two posts if the same user commented on both. This graph has 233,000 nodes, 114.6 million edges and 41 categories.

reddit

Image credit: https://minimaxir.com/2016/05/reddit-graph/

Social networks like Facebook, Twitter and LinkedIn have user graphs ranging from 10 million to a billion nodes.

social_network

Image credit: http://massprivatei.blogspot.com/2013/03/police-are-using-nucleik-ora-social.html

Because of the storage and computation needed to train neural networks on these graphs, we need sampling techniques. There are various ways to sample from giant graphs; two well-known strategies are Neighbor Sampling and Control Variate Sampling.

GraphSAGE is a variant of Graph Convolutional Network used for learning inductive node embeddings. Combined with Neighbor Sampling, it can be used for node classification on the Pubmed dataset.
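The idea behind Neighbor Sampling is to cap how many neighbors each node aggregates from at every hop, so the cost of a mini-batch stays bounded however large the graph is. The following is a sketch under that assumption; the `sample_neighbors` function, its `fanouts` parameter and the toy adjacency list are hypothetical and do not reproduce DGL's sampler API.

```python
import random


def sample_neighbors(adj, seeds, fanouts, seed=0):
    """Sample a multi-hop neighborhood around `seeds`, one fanout per hop."""
    rng = random.Random(seed)
    layers = []
    frontier = list(seeds)
    for fanout in fanouts:
        sampled_edges = []
        next_frontier = set(frontier)
        for node in frontier:
            nbrs = adj.get(node, [])
            # Keep all neighbors if there are few, otherwise sample `fanout`.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                sampled_edges.append((nbr, node))  # message flows nbr -> node
                next_frontier.add(nbr)
        layers.append(sampled_edges)
        frontier = sorted(next_frontier)
    return layers


# Hypothetical toy graph: node -> list of neighbors.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
layers = sample_neighbors(adj, seeds=[0], fanouts=[2, 2])
for hop, edges in enumerate(layers, start=1):
    print(f"hop {hop}: {edges}")
```

Each entry in `layers` holds the edges used for one round of aggregation, starting from the hop closest to the seed nodes.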

graphsage

Image credit: https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf

Recommendation Using Graph#

Two of the key recommendation models are content-based recommendation and collaborative filtering.

Collaborative filtering models solve the matrix completion task by taking into account the collective interaction data to predict future ratings or purchases.

collaborative_filtering

Image credit: https://towardsdatascience.com/building-a-music-recommendation-engine-with-probabilistic-matrix-factorization-in-pytorch-7d2934067d4a?gi=f359e3b8f1a8

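Matrix completion can be viewed as link prediction on a graph: users and items are nodes, observed ratings are edges, and the missing entries of the matrix are the links to predict.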
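To make the correspondence concrete, the sketch below turns a small ratings matrix into a bipartite edge list. The ratings and the `user0`/`item0` naming are invented for illustration; observed entries become labeled edges, and the empty cells become the candidate links a model would score.

```python
# Hypothetical ratings matrix: rows are users, columns are items.
# None marks an unobserved rating that a model would have to predict.
ratings = [
    [5, None, 3],
    [None, 4, None],
    [1, 2, None],
]

observed = []  # edges of the bipartite graph: (user, item, rating)
missing = []   # candidate links to predict: (user, item)
for u, row in enumerate(ratings):
    for i, rating in enumerate(row):
        if rating is None:
            missing.append((f"user{u}", f"item{i}"))
        else:
            observed.append((f"user{u}", f"item{i}", rating))

print(len(observed), "observed edges;", len(missing), "links to predict")
# 5 observed edges; 4 links to predict
```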

Graph Convolution Matrix Completion (GC-MC) is a graph-based autoencoder framework for matrix completion, built on deep learning for graphs.

The encoder produces latent features of user and item nodes through message passing on the bipartite interaction graph. The decoder reconstructs the rating links from the latent features.

recommendations

Image credit: https://arxiv.org/abs/1706.02263

Graph analysis with Neo4j#

A relational database (e.g. Oracle or Amazon Aurora) is a collection of data items with pre-defined relationships between them. These items are organized as a set of tables with columns and rows, and can be queried with SQL.

A graph database is composed simply of dots (nodes) and lines (relationships). Relational databases handle direct relationships easily, but indirect relationships are more difficult to deal with. Graph databases help in storing and traversing these relationships.

A graph database goes beyond storing data points: it stores the relationships between them.

graph_db

Image credit: https://www.sqlshack.com/understanding-graph-databases-in-sql-server/

Some of the most popular implementations of graph databases are:

  • Neo4j
  • Amazon Neptune
  • TigerGraph

neo4j

Image credit: https://www.bmc.com/blogs/neo4j-graph-database/

Neo4j is an open-source graph database. It supports a wide range of programming languages, including Python, and operating systems, including Windows and Linux. Cypher is its query language, and Neo4j can also be connected to Apache Spark.

Neo4j finds use in fraud detection, real-time recommendations etc.

Graph analysis can be done in Neo4j Sandbox, a cloud-based instance of the Neo4j server, by connecting it to CellStrat Hub notebooks.

neo4j_nodes

Image credit: https://neo4j.com/developer/cypher/intro-cypher/

A query with Cypher might look like:

MATCH (:Person {name: 'Jennifer'})-[:WORKS_FOR]->(company:Company) RETURN company

Recommendations are produced by querying the graph in Neo4j. Techniques used for the recommendations include Collaborative Filtering, PageRank, Personalized PageRank and Topic Sensitive Search.
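To give a feel for one of these techniques, here is a minimal Personalized PageRank computed by power iteration in plain Python. It is a sketch of the algorithm itself and not of how Neo4j implements or exposes it. The graph, the node names and the `personalized_pagerank` function are hypothetical.

```python
def personalized_pagerank(out_links, source, damping=0.85, iterations=50):
    """Power iteration where every random restart jumps back to `source`."""
    nodes = list(out_links)
    rank = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(iterations):
        new_rank = {v: 0.0 for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: send its mass back to the source.
                new_rank[source] += damping * rank[v]
        new_rank[source] += 1.0 - damping  # restart (teleport) mass
        rank = new_rank
    return rank


# Hypothetical user-product graph with links in both directions.
out_links = {
    "alice": ["book", "laptop"],
    "bob": ["book", "phone"],
    "book": ["alice", "bob"],
    "laptop": ["alice"],
    "phone": ["bob"],
}
scores = personalized_pagerank(out_links, source="alice")
unseen = [p for p in ("book", "laptop", "phone") if p not in out_links["alice"]]
print(sorted(unseen, key=scores.get, reverse=True))  # products to suggest to alice
```

Ranking the products that the source user has not yet interacted with by their score gives a simple recommendation list personalized to that user.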

Get the Graph Neural Networks pack here https://cellstrathub.com/packs