
RESCAL

Preface

These are my notes from reading the paper A Three-Way Model for Collective Learning on Multi-Relational Data.

Introduction Notes

In this paper, the model learns relations via a tensor factorization that is related to the DEDICOM model.

Modelling and Notation Notes

The authors model the triples as a three-way tensor $\chi$, where the first two modes are formed by the concatenated entities and the third mode holds the relations (a toy construction in numpy follows the list below).

  • $\chi_{ijk} = 1$ indicates that the relation (i-th entity, k-th predicate, j-th entity) exists.
  • $\chi_{ijk} = 0$ indicates a non-existing or unknown relation.
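
As a concrete illustration, here is a minimal numpy sketch of this encoding. The entities, relations, and triples are made-up toy data, not taken from the paper:

```python
import numpy as np

# Toy knowledge base (hypothetical names, for illustration only).
entities = ["Bill", "Al", "Party_X"]        # indices 0, 1, 2
relations = ["vicePresidentOf", "partyOf"]  # indices 0, 1

# Observed triples (i-th entity, k-th predicate, j-th entity).
triples = [(1, 0, 0),   # (Al, vicePresidentOf, Bill)
           (0, 1, 2)]   # (Bill, partyOf, Party_X)

n, m = len(entities), len(relations)
X = np.zeros((n, n, m))      # the tensor chi: n x n x m
for i, k, j in triples:
    X[i, j, k] = 1.0         # 1 = existing relation; 0 = non-existing/unknown

print(X[:, :, 1])            # frontal slice chi_1, i.e. the "partyOf" relation
```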

$\text{Some notation}$ (a short numpy illustration follows the list):

  • $\chi_k$ is the $k$-th frontal slice of the tensor $\chi$.
  • $X_{(n)}$ denotes the mode-$n$ unfolding of the tensor $\chi$.
  • $A \otimes B$ refers to the Kronecker product of the matrices $A$ and $B$.
  • $vec(X)$ is the vectorization of the matrix $X$.
  • The data is given as an $n \times n \times m$ tensor $\chi$, where $n$ is the number of entities and $m$ is the number of relations.
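
To make the notation concrete, a small numpy sketch; the unfolding convention shown is one common choice, and column orderings differ across references:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)   # a toy 2 x 3 x 4 tensor

def unfold(T, n):
    """Mode-n unfolding X_(n): mode-n fibers become the columns.
    (One common convention; papers differ in how columns are ordered.)"""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

X1 = unfold(X, 1)                    # shape 3 x 8

A = np.array([[1, 2], [3, 4]])
B = np.eye(2)
K = np.kron(A, B)                    # Kronecker product A ⊗ B, shape 4 x 4

M = np.array([[1, 2], [3, 4]])
v = M.reshape(-1, order="F")         # vec(M): stack the columns of M
```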

Methods and Theoretical Aspects Notes

Besides the labelled relations between entities, there are also inner correlations. For instance, if we want to know which party the American president belongs to, without additional information we could check the relation between the vice president and the party: most of the time, the president and the vice president come from the same party. This kind of information can be learned by $\text{collective learning}$.

$\text{Collective learning}$: it exploits information provided by related entities rather than only the features of the particular learning task. Thus, it can learn attributes, classes, or relations of connected entities, and it applies to a wide range of tasks such as classification, entity resolution, and link prediction.

A Model for Multi-Relational Data

The RESCAL model uses a rank-$r$ factorization, where each frontal slice $\chi_k$ is factorized as $\chi_k \approx A R_k A^T$, for $k = 1, \ldots, m$:

  • $A$ is an $n \times r$ matrix which contains the latent-component representation of the entities.
  • $R_k$ is an asymmetric $r \times r$ matrix which models the interactions of the latent components in the $k$-th predicate.
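
A minimal numerical sketch of this factorization; the sizes $n$, $m$, $r$ and the random factors are toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 2, 3                    # entities, relations, latent rank (toy sizes)

A = rng.standard_normal((n, r))      # shared latent-component matrix
R = rng.standard_normal((m, r, r))   # one asymmetric r x r matrix per predicate

k = 0
X_k_hat = A @ R[k] @ A.T             # reconstruction of the k-th frontal slice

# Entry (i, j) of the slice is the bilinear score a_i^T R_k a_j.
i, j = 1, 4
assert np.isclose(A[i] @ R[k] @ A[j], X_k_hat[i, j])
```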

Solving the regularized minimization problem $\min_{A, R_k} f(A, R_k) + g(A, R_k)$, with $f(A, R_k) = \frac{1}{2} \sum_k \lVert \chi_k - A R_k A^T \rVert_F^2$, yields $A$ and $R_k$:

  • $g(A, R_k)$ is the regularization term, $g(A, R_k) = \frac{1}{2} \lambda \left( \lVert A \rVert_F^2 + \sum_k \lVert R_k \rVert_F^2 \right)$.
  • Formula 3 can be written entrywise as formula 5: $\chi_{ijk} \approx a_i^T R_k a_j$, where $a_i$ denotes the $i$-th row of $A$ and is thus the embedding of the $i$-th entity; likewise for $a_j$.
  • The asymmetry of $R_k$ takes into account whether a latent component occurs as a subject or an object.
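
As a sketch, the objective can be written as a Python function; the function name and the tensor layout ($n \times n \times m$, as in the snippets above) are my assumptions:

```python
import numpy as np

def rescal_loss(X, A, R, lam):
    """f + g: squared reconstruction error over all frontal slices
    plus the regularization term."""
    f = 0.5 * sum(np.linalg.norm(X[:, :, k] - A @ R[k] @ A.T, "fro") ** 2
                  for k in range(X.shape[2]))
    g = 0.5 * lam * (np.linalg.norm(A, "fro") ** 2
                     + sum(np.linalg.norm(Rk, "fro") ** 2 for Rk in R))
    return f + g
```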

Computing the Factorization Notes

  • Use alternating least-squares (ALS), following the ASALSAN algorithm: alternately update $A$ and the $R_k$ in closed form (a sketch follows).
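
A compact, non-optimized sketch of the alternating updates as I understand them; this is not the authors' implementation, and the naive Kronecker solve below is only sensible for small $r$ and $n$:

```python
import numpy as np

def rescal_als(X, r, lam=0.1, iters=50, seed=0):
    """ALS sketch: alternately update A and each R_k in closed form."""
    n, _, m = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, r))
    R = [rng.standard_normal((r, r)) for _ in range(m)]

    for _ in range(iters):
        # Update A by regularized least squares, holding the R_k fixed.
        AtA = A.T @ A
        num = sum(X[:, :, k] @ A @ R[k].T + X[:, :, k].T @ A @ R[k]
                  for k in range(m))
        den = sum(R[k] @ AtA @ R[k].T + R[k].T @ AtA @ R[k]
                  for k in range(m))
        A = num @ np.linalg.inv(den + lam * np.eye(r))

        # Update each R_k: vec(R_k) = (Z^T Z + lam I)^{-1} Z^T vec(X_k),
        # with Z = A kron A, using vec(A R_k A^T) = (A kron A) vec(R_k).
        Z = np.kron(A, A)
        G = np.linalg.inv(Z.T @ Z + lam * np.eye(r * r)) @ Z.T
        for k in range(m):
            vec_Rk = G @ X[:, :, k].reshape(-1, order="F")
            R[k] = vec_Rk.reshape(r, r, order="F")
    return A, R
```

For example, `A, R = rescal_als(X, r=2)` could be run on the toy tensor built in the first snippet.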

Update A

  • To update $A$, the frontal slices of the data are stacked side by side as in formula 6, turning the problem into an ordinary matrix factorization (written out below).
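
Written out, the side-by-side stacking reads (reconstructed from the factorization above, not copied verbatim from the paper):

$$[\,\chi_1 \;\; \chi_2 \;\cdots\; \chi_m\,] \approx A\,[\,R_1 A^T \;\; R_2 A^T \;\cdots\; R_m A^T\,]$$

Holding the right factor fixed, $A$ is then obtained as the solution of a regularized least-squares problem, which matches the $A$ update in the ALS sketch above.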

Reference

  • Nickel, M., Tresp, V., Kriegel, H.-P. A Three-Way Model for Collective Learning on Multi-Relational Data. ICML 2011.
