GraphGen, Retrospective Loss and Large Scale Pretraining – INDIAai

These are among the most exciting AI research papers published in the past year. They combine advances in artificial intelligence (AI) with data science. The list is ordered chronologically, and each summary includes a link to a longer article.

Deep neural networks (DNNs) have facilitated advancements in various fields. To better utilise the prior knowledge available in past model states during training, the authors introduce a new retrospective loss in this study. Minimised together with the task-specific loss, the retrospective loss pulls the current parameter state towards the optimal state while pushing it away from the parameter state at an earlier point in training. To verify that the proposed loss improves performance across input domains, tasks, and architectures, the researchers analyse the approach and conduct comprehensive experiments in several domains, including images, speech, text, and graphs.
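The idea lends itself to a compact sketch. Below is a minimal PyTorch rendering, assuming a margin-style formulation in which the current prediction is pulled towards the ground truth and pushed away from an earlier snapshot's prediction; the scaling factor `kappa`, the L2 norm, and the snapshot schedule are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def retrospective_loss(current_out, past_out, target, kappa=2.0):
    # Pull the current prediction towards the ground truth while pushing
    # it away from the prediction made by an earlier snapshot of the same
    # model on the same input. `kappa` and the L2 norm are assumptions
    # here; the paper's exact formulation and schedule may differ.
    pull = torch.norm(current_out - target, p=2, dim=-1)
    push = torch.norm(current_out - past_out, p=2, dim=-1)
    return (kappa * pull - push).mean()

# Typical use inside a training step (model, x, y, task_loss assumed):
#   snapshot = copy.deepcopy(model).eval()          # frozen past state
#   loss = task_loss + retrospective_loss(model(x), snapshot(x).detach(), y)
```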

Model-based dialogue evaluation metrics, such as ADEM, RUBER, and the more recent BERT-based metrics, are gaining attention. These models aim to assign high scores to relevant responses and low scores to irrelevant ones. Ideally, they would be trained with a variety of relevant and irrelevant responses. In practice, however, owing to the lack of such publicly available data, existing models are typically trained using a single relevant response and multiple randomly sampled responses from unrelated contexts (random negatives).

The researchers present the DailyDialog++ dataset to facilitate better training and rigorous evaluation of model-based metrics. Using this dataset, they first demonstrate that even when multiple correct references are available, n-gram-based and embedding-based metrics fail to separate relevant responses from random negatives. Model-based metrics perform better on random negatives but degrade substantially on adversarial examples. To determine whether large-scale pretraining is beneficial, the study proposes a new BERT-based evaluation metric called DEB, pre-trained on 727M Reddit conversations and then fine-tuned on their dataset. Compared to other models, DEB performs substantially better on random negatives (88.27% accuracy) and correlates more strongly with human judgements. However, its performance again drops significantly when tested on adversarial responses, demonstrating that even a large-scale pre-trained evaluation model is not robust to the adversarial examples in their dataset.
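To make the setup concrete, here is a hedged sketch of how a BERT-based relevance scorer in the spirit of DEB could be queried: the context and candidate response are encoded as a sentence pair, and the classifier's probability for the "relevant" class serves as the score. The checkpoint name and two-class head are stand-ins, not the released DEB model.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# "bert-base-uncased" is a placeholder checkpoint, not the actual DEB model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.eval()

def relevance_score(context: str, response: str) -> float:
    # Encode (context, response) as a BERT sentence pair and return the
    # probability assigned to the "relevant" class.
    inputs = tokenizer(context, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(relevance_score("How was your day?", "Pretty good, I went hiking."))
```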

There is a wealth of research on generative graph models in the data mining literature. Older methods generate structures that comply with a pre-decided distribution, whereas newer methods have shifted towards learning this distribution directly from the data. Learning-based approaches have improved quality, but several difficulties remain.

To address these shortcomings, the authors of this study develop a generic method they name "GraphGen." GraphGen converts graphs into sequences using minimum DFS codes. Minimum DFS codes are canonical labels that capture the graph structure together with the label information. A novel LSTM architecture is used to learn the complex joint distributions of structural and semantic labels. Extensive experiments on million-sized, real-world graph datasets reveal that GraphGen is, on average, four times faster than state-of-the-art approaches and of superior quality across a comprehensive set of eleven different metrics.
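To make the sequence view concrete, here is a toy sketch of turning a small labelled graph into a DFS-code-like sequence of edge tuples. It is deliberately simplified: real minimum DFS codes also encode edge labels and backward edges, and canonicity comes from minimising over all DFS traversals, not just start vertices as done here.

```python
def dfs_code(adj, vlabel, start):
    # One DFS traversal of an undirected labelled graph, rendered as a
    # sequence of edge tuples (t_u, t_v, label_u, label_v), where t_* are
    # vertex discovery timestamps. Simplified: edge labels and the
    # backward edges of a full DFS code are omitted.
    stamp = {start: 0}
    code = []
    def visit(u):
        for v in sorted(adj[u]):
            if v not in stamp:
                stamp[v] = len(stamp)
                code.append((stamp[u], stamp[v], vlabel[u], vlabel[v]))
                visit(v)
    visit(start)
    return code

def minimum_dfs_code(adj, vlabel):
    # The true minimum DFS code is the lexicographically smallest code
    # over all DFS traversals; minimising over start vertices only is a
    # crude approximation for this toy example.
    return min(dfs_code(adj, vlabel, s) for s in adj)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # a labelled triangle
vlabel = {0: "C", 1: "O", 2: "C"}
print(minimum_dfs_code(adj, vlabel))      # [(0, 1, 'C', 'C'), (1, 2, 'C', 'O')]
```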

See more here: GraphGen, Retrospective Loss and Large Scale Pretraining – INDIAai
