Paraphrase Detection Using Manhattan's Recurrent Neural Networks and Long Short-Term Memory

Achmad Aziz, Esmeralda Contessa Djamal, Ridwan Ilyas

Abstract


Natural Language Processing (NLP) is a part of artificial intelligence that can extract sentence structures from natural language. Some discussions about NLP are widely used, such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) to summarize papers with many sentences in them. Siamese Similarity is a term that applies repetitive twin network architecture to machine learning for sentence similarity. This architecture is also called Manhattan LSTM, which can be applied to the case of detecting paraphrase sentences. The paraphrase sentence must be recognized by machine learning first. Word2vec is used to convert sentences to vectors so they can be recognized in machine learning. This research has developed paraphrase sentence detection using Siamese Similarity with word2vec embedding. The experimental results showed that the amount of training data is dominant to the new data compared to the number of times and the variation in training data. Obtained data accuracy, 800,000 pairs provide accuracy reaching 99% of training data and 82.4% of new data. These results are better than the accuracy of the new data, with half of the training data only yielding 64%. While the amount of training data did not effect on training data.


Keywords


language processing; natural paraphrase; RNN; siamese similarity;

Full Text: PDF

Refbacks

  • There are currently no refbacks.