
Notes on Representing Text for Joint Embedding of Text and Knowledge Bases

Preface

These are my notes from reading the paper Representing Text for Joint Embedding of Text and Knowledge Bases.

Introduction Notes

This paper builds on the work of Riedel et al. [1] on jointly learning continuous representations for knowledge base and textual relations. The textual relation between two entities is extracted as the lexicalized dependency path connecting their mentions. In that previous work, each textual relation learns its continuous latent representation from the pattern of its co-occurrences in the knowledge graph.

The authors observe that synonymous textual relations share a large amount of common sub-structure (similar words and dependency paths). Therefore, they propose sharing parameters among related dependency paths. Their contribution is using a CNN to derive continuous representations for textual relations, which boosts link-prediction performance for entity pairs that have textual mentions.

Related Work Notes

The authors introduce three types of related work for link prediction: using the KB, using text, and using both. Their work builds on the approach of Riedel et al. [1], extends the DISTMULT model [2] (a simple variant of a bilinear model), and jointly trains the representations of KB and textual relations.

Models for knowledge base completion Notes

$\textbf{Terminology}$:

  • RDF triples: $(e_s, r, e_o)$, where $e_s$ is the subject entity, $e_o$ is the object entity, and $r$ is the relation.
  • Task: predict relations between entities; concretely, rank candidate entities for queries like $(e_s, r, ?)$ and $(?, r, e_o)$.

They follow previous work [2] in how both textual and knowledge base relations are represented.
$\textbf{Textual relations}$: Use lexicalized dependency paths; for example, for the entities $\text{BARACK OBAMA}$ and $\text{UNITED STATES}$, the textual relation is the lexicalized dependency path connecting the two entity mentions in a sentence where they co-occur.

The model assigns each triple a score in $(0, 1)$, interpreted as the probability that the triple holds, and ranks candidates by it.

Basic Models

$\textbf{Model E and F}$: both were previously used for a combined KB+text graph [1].

$\textbf{Model F}$:

  • A K-dimensional vector for each relation $r$ and for each entity pair $(e_s, e_o)$
  • Scoring function: $f(e_s, r, e_o) = v(r)^T v(e_s, e_o)$
  • Parameters are not shared between different entity pairs that contain the same entity.
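
A minimal numpy sketch of the Model F score above; the vectors are random placeholders for learned parameters:

```python
import numpy as np

K = 4                              # embedding dimension (illustrative)
rng = np.random.default_rng(0)
v_r = rng.normal(size=K)           # learned vector for relation r (placeholder)
v_pair = rng.normal(size=K)        # learned vector for the entity pair (e_s, e_o)

# Model F: f(e_s, r, e_o) = v(r)^T v(e_s, e_o)
score = v_r @ v_pair
```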

$\textbf{Model E}$:

  • Two K-dimensional vectors $v(r_s)$ and $v(r_o)$ for each relation $r$
  • Each entity has its own K-dimensional vector (no vectors for entity pairs)
  • Scoring function: $f(e_s, r, e_o) = v(r_s)^Tv(e_s) + v(r_o)^Tv(e_o)$
  • For a query $(e_s, r, ?)$, the ranking of candidate objects depends only on $r$, not on $e_s$
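
The same kind of sketch for Model E; note that for a fixed query $(e_s, r, ?)$ the first term is constant, so it does not affect the ranking:

```python
import numpy as np

K = 4
rng = np.random.default_rng(1)
v_rs, v_ro = rng.normal(size=K), rng.normal(size=K)   # subject- and object-side vectors of r
v_es, v_eo = rng.normal(size=K), rng.normal(size=K)   # entity vectors (placeholders)

# Model E: f(e_s, r, e_o) = v(r_s)^T v(e_s) + v(r_o)^T v(e_o)
score = v_rs @ v_es + v_ro @ v_eo
```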

$\textbf{Model DISTMULT}$:

  • A special form of a bilinear model such as RESCAL [3]
  • A K-dimensional vector for each entity $e_i$ and each relation $r$
  • Scoring function: $f(e_s, r, e_o) = v(r)^T(v(e_s) \circ v(e_o))$, where $\circ$ denotes the element-wise vector product
  • Entity pairs sharing an entity also share parameters, so the ranking for $(e_s, r, ?)$ depends on $e_s$
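
And a sketch of the DISTMULT score, again with placeholder vectors:

```python
import numpy as np

K = 4
rng = np.random.default_rng(2)
v_r = rng.normal(size=K)                              # relation vector
v_es, v_eo = rng.normal(size=K), rng.normal(size=K)   # entity vectors

# DISTMULT: f(e_s, r, e_o) = v(r)^T (v(e_s) ∘ v(e_o)), ∘ = element-wise product
score = v_r @ (v_es * v_eo)
```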

CONV: Compositional Representations of Textual Relations

Because different dependency paths may share similar structure, the authors encode each word and dependency arc in the path as a K-dimensional vector and feed the resulting sequence into a CNN.

  • Let each word or dependency arc in the path be a one-hot vector $e^t$
  • First, map it to an embedding $v^t = Ve^t$
  • The hidden layer is $h^t = \tanh(W^{-1}v^{t-1} + W^{0}v^{t} + W^{+1}v^{t+1} + b)$
  • Use element-wise max-pooling over positions to represent the final dependency path: $v(r) = \max_t h^t$
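
A minimal sketch of this encoder, assuming the path is already given as a list of token ids (words and dependency arcs) and all parameter matrices are random placeholders:

```python
import numpy as np

def conv_relation_vector(tokens, V, W_prev, W_curr, W_next, b):
    """Encode a dependency path (list of token ids) into a relation vector:
    width-3 convolution with tanh, then element-wise max-pooling."""
    K = V.shape[1]
    v = V[tokens]                        # (T, K) embeddings v^t = V e^t
    pad = np.zeros((1, K))
    v_prev = np.vstack([pad, v[:-1]])    # v^{t-1}, zero-padded at the start
    v_next = np.vstack([v[1:], pad])     # v^{t+1}, zero-padded at the end
    h = np.tanh(v_prev @ W_prev + v @ W_curr + v_next @ W_next + b)  # (T, K)
    return h.max(axis=0)                 # element-wise max over positions -> v(r)

# Illustrative usage: vocabulary of 10 tokens, K = 4, random parameters
rng = np.random.default_rng(3)
V = rng.normal(scale=0.1, size=(10, 4))
W_prev, W_curr, W_next = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]
b = np.zeros(4)
v_r = conv_relation_vector([2, 5, 7, 1], V, W_prev, W_curr, W_next, b)
```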

Loss function

For entity prediction, the authors use a negative log-likelihood loss, where the probability of the correct object entity is normalized over negative samples rather than all entities: $p(e_o|e_s, r; \theta) = \frac{\exp f(e_s, r, e_o; \theta)}{\sum_{e' \in Neg(e_s, r, ?) \cup \{e_o\}} \exp f(e_s, r, e'; \theta)}$

  • $p(e_s|e_o, r; \theta)$ is defined analogously
  • $\theta$ denotes the model parameters
  • $Neg(e_s, r, ?)$ is the set of negative samples for the query $(e_s, r, ?)$ (normalizing over all entities would be too expensive)
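
A sketch of one term of this loss, assuming `score_pos` is $f(e_s, r, e_o; \theta)$ for the observed triple and `scores_neg` are the scores of the sampled negative entities (the names are illustrative):

```python
import numpy as np

def neg_log_likelihood(score_pos, scores_neg):
    """-log p(e_o | e_s, r; theta), with the softmax normalized over the
    observed entity plus the negative samples instead of all entities."""
    scores = np.concatenate([[score_pos], scores_neg])
    return -(score_pos - np.log(np.exp(scores).sum()))

# Illustrative: the true triple scores higher than its sampled corruptions
loss_term = neg_log_likelihood(2.1, np.array([0.3, -1.2, 0.8]))
```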

Because the textual relations account for only part of the task (the main objective uses the KB relations), their loss term is weighted by a factor $\tau$.

So the final loss function is $\mathcal{L}(\theta) = \mathcal{L}_{KB}(\theta) + \tau \, \mathcal{L}_{text}(\theta)$, i.e., the KB-relation loss plus the textual-relation loss weighted by $\tau$.

References
