Hello, my name is Felix Hamann.
+49 611 9495 1295
The construction and completion of knowledge graphs in industrial settings has gained traction over the past years. However, modelling a specific domain often entails significant cost. This can be alleviated by including other knowledge sources such as text, a challenge known as open-world knowledge graph completion. Although knowledge graph completion has drawn significant attention from the research community over the past years, we argue that academic benchmarks fall short of two key characteristics of industrial conditions: (1) benchmarks draw open-world entities at random, although in practice they are more volatile than closed-world entities, and (2) in practice, the textual descriptions of entities are not concise.
This paper's mission is to bring academia and industry closer by proposing Inductive Reasoning with Text (IRT), an approach for creating open-world evaluation benchmarks from given knowledge graphs. Two graphs, one based on Freebase and another derived from Wikidata, are created, analysed, and enriched with textual descriptions according to the above assumptions. We evaluate a modular system that can couple any vector-space knowledge graph completion model with a transformer-based text encoder to align sentence and entity representations. We show the difficulty of learning from such scattered text in contrast to the texts provided by other benchmarks, and provide a solid baseline study for future model benchmarking.
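The alignment idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual method: random vectors stand in for the pretrained knowledge graph completion embeddings and transformer sentence encodings, and a simple least-squares projection stands in for the learned alignment between the two spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in the real setting these would come from a
# trained KGC model (entity embeddings) and a transformer text encoder
# (embeddings of entity descriptions). Random data illustrates mechanics only.
n_entities, d_graph, d_text = 100, 32, 64
entity_emb = rng.normal(size=(n_entities, d_graph))  # closed-world entity vectors
text_emb = rng.normal(size=(n_entities, d_text))     # encoded descriptions

# Learn a linear projection W from text space into graph space by least
# squares -- a toy proxy for the alignment objective.
W, *_ = np.linalg.lstsq(text_emb, entity_emb, rcond=None)

# An open-world entity, known only through text, is placed in the graph
# space by projecting its description embedding; the KGC model can then
# score triples involving it as usual.
open_world_text = rng.normal(size=(1, d_text))
projected = open_world_text @ W
print(projected.shape)  # (1, 32)
```

The modularity claimed in the abstract follows from this split: any model that represents entities as vectors can sit on the graph side, and any text encoder on the other.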
Entity linking, the task of mapping textual mentions to known entities, has recently been tackled using contextualised neural networks. We address the question of whether these results, reported for large, high-quality datasets such as Wikipedia, transfer to practical business use cases, where labels are scarce, text is low-quality, and terminology is highly domain-specific. Using an entity linking model based on BERT, a popular transformer network in natural language processing, we show that a neural approach outperforms and complements hand-coded heuristics, improving top-1 accuracy by about 20%. We also demonstrate the benefits of transfer learning on a large corpus, while fine-tuning proves difficult. Finally, we compare different BERT-based architectures and show that a simple sentence-wise encoding (Bi-Encoder) offers fast yet effective search in practice.
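The Bi-Encoder pattern mentioned above can be sketched as follows. This is a simplified illustration with invented entity names: a deterministic hashed bag-of-words function stands in for BERT, but the structural point carries over, since entity descriptions are encoded once offline and linking reduces to a fast nearest-neighbour search.

```python
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy deterministic stand-in for a sentence encoder such as BERT:
    a hashed, L2-normalised bag-of-words vector."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Entity descriptions (hypothetical examples) are encoded once, offline.
entities = {
    "acme_pump_x1": "industrial pump model x1 by acme",
    "acme_valve_v2": "pressure valve v2 by acme",
}
index = {name: embed(desc) for name, desc in entities.items()}

# Each mention is encoded independently at query time; because the two
# sides never attend to each other, the entity index can be precomputed.
mention = embed("acme pump model x1 is broken")
best = max(index, key=lambda name: float(mention @ index[name]))
print(best)  # acme_pump_x1
```

By contrast, a Cross-Encoder scores every mention-entity pair jointly through the network, which tends to be more accurate but precludes precomputing the index.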
In retrieval applications, binary hashes are known to offer significant improvements in both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture trained by minimising reconstruction error. Instead of employing the original real-valued embeddings, we use the latent representations in Hamming space produced by the encoder for similarity calculations. In quantitative experiments on several semantic similarity benchmarks, we show that our compressed Hamming embeddings perform comparably to uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at compression ratios of up to 256:1. We further demonstrate that our model strongly decorrelates input features, and that the compressor generalises well when pre-trained on Wikipedia sentences. We publish the source code and all experimental results on GitHub.
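The mechanics of Hamming-space retrieval can be sketched briefly. This is not the paper's trained encoder: random vectors stand in for real sentence embeddings, and sign-thresholding stands in for the learned binarisation, but the storage arithmetic and the XOR-and-popcount search are the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for real sentence embeddings (Sent2Vec, InferSent, GloVe-BoW
# in the paper); random data illustrates the mechanics only.
embeddings = rng.normal(size=(1000, 256)).astype(np.float32)

# A trained encoder would emit the binary codes; thresholding at zero is
# the simplest possible binarisation baseline.
bits = embeddings > 0
codes = np.packbits(bits, axis=1)   # 256 bits -> 32 bytes per sentence

# 256 float32 dims occupy 1024 bytes, so 32-byte codes give 32:1 here;
# shorter codes push the ratio toward the 256:1 reported in the paper.
print(codes.shape)                  # (1000, 32)

# Similarity search in Hamming space: XOR the packed codes, then count
# the differing bits.
query = codes[0]
hamming = np.unpackbits(codes ^ query, axis=1).sum(axis=1)
nearest = np.argsort(hamming)[:5]
print(hamming[0])                   # 0 (the query matches itself)
```

Because the distance computation is bitwise, it maps directly onto hardware popcount instructions, which is where the speed advantage over float dot products comes from.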
A lightweight C implementation of a distributed, federated RabbitMQ setup, built with ZeroMQ.