Reading The Web with Learned Syntactic-Semantic Inference Rules

Preface

These are my notes from reading the paper Reading The Web with Learned Syntactic-Semantic Inference Rules.

Terminology and Notation Notes

  • $C$ is a set of entities, $R$ is a set of relations, $T$ is a set of triples, $T \subseteq C \times R \times C$.
  • $(c, r, c')$ is a triple. Each triple represents an instance $r(c, c')$ of the relation $r \in R$.
  • $r^{-1}$ is the inverse relation: $r(c, c') \Longleftrightarrow r^{-1}(c', c)$, e.g. $\text{Parent}^{-1} = \text{Children}$.
  • $\pi = \langle r_1, \ldots, r_\ell \rangle$ is a path type, i.e. a sequence of relations (a toy sketch follows this list). For example (the relation names here are my shorthand for the paper's examples):
  • “the persons who were born in the same town as the query person” $\rightarrow$ $\pi_{1} : \langle \text{BornIn}, \text{BornIn}^{-1} \rangle$
  • “the nationalities of persons who were born in the same town as the query person” $\rightarrow$ $\pi_{2} : \langle \text{BornIn}, \text{BornIn}^{-1}, \text{Nationality} \rangle$
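
As a toy illustration of how a path type composes relations, here is a minimal sketch; the KB below is invented for illustration and is not data from the paper:

```python
# Toy KB: relation -> set of (head, tail) edges. All entities and relations
# here are invented to illustrate path composition.
KB = {
    "BornIn":      {("CharlotteBronte", "Thornton"), ("EmilyBronte", "Thornton")},
    "Nationality": {("EmilyBronte", "UK")},
}

def follow(nodes, relation):
    """Follow one relation step; a trailing '^-1' denotes the inverse relation."""
    inverse = relation.endswith("^-1")
    edges = KB[relation[:-3]] if inverse else KB[relation]
    return {h if inverse else t for (h, t) in edges if (t if inverse else h) in nodes}

def follow_path(start, path):
    """Return all nodes reachable from `start` via the relation sequence `path`."""
    nodes = {start}
    for r in path:
        nodes = follow(nodes, r)
    return nodes

# pi_2 = <BornIn, BornIn^-1, Nationality>: nationalities of persons born in
# the same town as the query person.
print(follow_path("CharlotteBronte", ["BornIn", "BornIn^-1", "Nationality"]))
# -> {'UK'}
```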

Learning Syntactic-Semantic Rules with Path-Constrained Random Walks Notes

  • First, PRA lists a large set of bounded-length path types to act as ranking “experts”.
  • Each path type, starting from a query node $s$, induces a random-walk distribution over end nodes $t$ and ranks them by their weight in that distribution.
  • Finally, PRA uses logistic regression over the weighted scores of the “experts” to predict a probability (see the formula after this list).
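
Putting the three steps together, the score PRA assigns to a candidate answer $t$ for a query node $s$ is a weighted sum of the per-path random-walk probabilities ($B$ and $\theta_{\pi}$ are defined formally in the Path Ranking Algorithm Notes below):

$$\mathrm{score}(s, t) = \sum_{\pi \in B} \theta_{\pi} \, P(s \rightarrow t; \pi)$$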

As shown in Figure 1, they also combine syntactic and semantic patterns.

  • Entity resolution is used to link entities in the KB and in the text.
  • Text edges come from sentences that have been syntactically analyzed with a dependency parser.

For instance, given the query $\text{Profession(CharlotteBronte, ?)}$,

  • PRA lists several answers that may be relevant to the query.
  • The answers are ranked by random-walk probabilities from the node $\text{CharlotteBronte}$ to each candidate profession node.
  • PRA learns path types that combine KB and text information.

Path Ranking Algorithm Notes

  • $\pi = \langle r_1, \ldots, r_\ell \rangle$ is a path of relations. $P(s \rightarrow t; \pi)$ is the probability that a random walk following $\pi$ from node $s$ ends at node $t$.
  • $B = \langle \perp, \pi_1, \ldots, \pi_n \rangle$ is the set of all path types no longer than $l$. The dummy path $\perp$ acts as a bias feature: $P(s \rightarrow t; \perp)$ is always 1.
  • $\theta_{\pi}$ is the weight of $\pi$.
  • The Path Discovery section below introduces how $B$ is determined (a sketch of these definitions follows this list).
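
A minimal sketch of these definitions, assuming a toy graph and made-up weights (the real system runs over a large KB + text graph, and the weights come from the logistic regression training described below):

```python
from collections import defaultdict

# Toy edge-labeled graph: graph[node][relation] -> list of neighbor nodes.
# All nodes, relations, and weights below are invented for illustration.
graph = {
    "CharlotteBronte": {"BornIn": ["Thornton"]},
    "Thornton":        {"BornIn^-1": ["CharlotteBronte", "EmilyBronte"]},
    "EmilyBronte":     {"Profession": ["Novelist"]},
}

def walk_prob(s, pi):
    """P(s -> t; pi): distribution over end nodes t of a random walk that
    starts at s and follows the relation sequence pi, choosing uniformly
    among the available edges at each step."""
    dist = {s: 1.0}
    for r in pi:
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbors = graph.get(node, {}).get(r, [])
            for n in neighbors:
                nxt[n] += p / len(neighbors)
        dist = dict(nxt)
    return dist

# score(s, t) = sum_pi theta_pi * P(s -> t; pi), including the bias
# feature \perp whose "probability" is always 1.
theta = {("BornIn", "BornIn^-1", "Profession"): 1.4}  # made-up weight
theta_bias = -0.5                                     # weight of \perp

def score(s, t):
    total = theta_bias  # \perp contributes its weight times 1
    for pi, w in theta.items():
        total += w * walk_prob(s, list(pi)).get(t, 0.0)
    return total

print(score("CharlotteBronte", "Novelist"))  # -0.5 + 1.4 * 0.5 = 0.2
```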

Path Discovery

Because considering all possible paths is computationally too expensive, they add two constraints:

  • The path type must be active for more than $K$ training query nodes.
  • The path type's average score over those query nodes must exceed a threshold; paths below it are filtered out (a sketch follows this list).
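
A sketch of the two filters, assuming per-query path scores have already been computed; the particular $K$ and threshold values, and averaging over only the active queries, are my assumptions:

```python
def filter_paths(path_query_scores, K=5, min_avg_score=0.05):
    """path_query_scores: {path_type: [score for each training query]}.
    A score > 0 means the path is 'active' for that query. Keep a path
    only if it is active for more than K queries and its average score
    over the active queries exceeds the threshold."""
    kept = []
    for pi, scores in path_query_scores.items():
        active = [x for x in scores if x > 0]
        if len(active) > K and sum(active) / len(active) >= min_avg_score:
            kept.append(pi)
    return kept

# Toy usage with invented numbers: the first path is active for 6 queries,
# the second for only 2, so only the first survives.
scores = {
    ("BornIn", "BornIn^-1", "Profession"): [0.2, 0.1, 0.3, 0.0, 0.2, 0.1, 0.4],
    ("Profession", "Profession^-1"):       [0.5, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0],
}
print(filter_paths(scores))  # [('BornIn', 'BornIn^-1', 'Profession')]
```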

Training Examples

  • Node pairs in the KB are positive examples (the authors subsample the positive examples).
  • For negative training examples, considering all possible pairs whose types are compatible with the query wastes computing resources: for a relation like Parent, almost any pair of persons could serve as a negative example. Therefore they use a simple biased sampling method: retrieve and rank the candidate answers for each query node, then select the entities at positions $k(k + 1)/2$, where $k = 0, 1, 2, \ldots$, as negative samples (a sketch follows this list).
  • Train the weights with logistic regression.
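
A sketch of the biased negative sampling; skipping known positives and the cap on $k$ are my assumptions:

```python
def sample_negatives(ranked_answers, positives, max_k=10):
    """From the ranked candidate answers of a query node, take the entities
    at positions k*(k+1)/2 = 0, 1, 3, 6, 10, ... as negative examples,
    skipping any known positive answers (an assumption of this sketch)."""
    negatives = []
    for k in range(max_k):
        pos = k * (k + 1) // 2
        if pos >= len(ranked_answers):
            break
        if ranked_answers[pos] not in positives:
            negatives.append(ranked_answers[pos])
    return negatives

# Toy usage with invented entities.
ranked = [f"person_{i}" for i in range(20)]
print(sample_negatives(ranked, positives={"person_0", "person_3"}))
# -> ['person_1', 'person_6', 'person_10', 'person_15']
```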

Extending PRA

They propose three methods to extend PRA. Two of them are about sampling; these notes do not cover them in detail. The third is text graph construction, which is the key to learning over KB + text.

Text Graph Construction Notes

  • They collect a large Web corpus that is POS-tagged and dependency-parsed. For each sentence, they produce a dependency tree with each edge labeled with a standard dependency tag.

  • They use POS tags and dependency edges to identify potential referring noun phrases (NPs).

  • Mentions are grouped into clusters by a within-document coreference resolver, and the clusters are matched to Freebase concepts (a sketch of the resulting text edges follows).
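
A minimal sketch of turning one dependency-parsed sentence into labeled text-graph edges; the sentence and its hand-written parse are assumptions, since the real pipeline runs an automatic parser over the whole corpus:

```python
# Hand-written dependency parse of a toy sentence, as (dependent, tag, head)
# triples. In the real system these come from an automatic dependency parser.
parse = [
    ("Bronte", "nsubj", "worked"),
    ("as",     "prep",  "worked"),
    ("writer", "pobj",  "as"),
    ("a",      "det",   "writer"),
]

# Build labeled edges in both directions, mirroring how KB relations
# also get inverses r^-1.
graph = {}
for dep, tag, head in parse:
    graph.setdefault(head, []).append((tag, dep))
    graph.setdefault(dep, []).append((tag + "^-1", head))

# Random walks can now traverse text edges such as
# <nsubj^-1, prep, pobj> from "Bronte" to "writer", alongside KB edges.
print(graph["Bronte"])  # [('nsubj^-1', 'worked')]
```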

Reference
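
  • Ni Lao, Amarnag Subramanya, Fernando Pereira, and William W. Cohen. Reading The Web with Learned Syntactic-Semantic Inference Rules. EMNLP-CoNLL 2012.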
