Summarizing Text for Indonesian Language by Using Latent Dirichlet Allocation and Genetic Algorithm

Silvia, Pitri Rukmana, Vivi Regina Aprilia, Derwin Suhartono, Rini Wongso, Meiliana

Abstract


The number of documents, especially electronic ones, keeps growing, which makes managing them less effective and less efficient. Automatic text summarization addresses this problem by producing summaries of text documents. The goal of this research is to build a tool that summarizes documents written in Bahasa (the Indonesian language) and satisfies users' need for relevant and consistent summaries. The algorithm is based on sentence feature scoring using Latent Dirichlet Allocation, with a Genetic Algorithm determining the sentence feature weights. It is evaluated by summarization speed, precision, recall, F-measure, and several subjective evaluations. The resulting extractive summaries represent the important information in a single Bahasa document and are produced faster than manual summarization. The best F-measure is 0.556926 (precision 0.53448, recall 0.58134), obtained at a summary ratio of 30%.
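The reported scores can be checked against each other. The sketch below assumes the F-measure is the standard balanced F1 (the harmonic mean of precision and recall); the abstract does not state the formula, but the reported numbers agree with that definition. The function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the paper uses this standard definition; the abstract
    does not spell it out.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
best_f = f_measure(0.53448, 0.58134)
print(f"{best_f:.6f}")
```

Under this assumption, the computed value rounds to the reported best F-measure of 0.556926.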

Keywords


Automatic Text Summarization; Sentence Features; Genetic Algorithm; Extractive Summaries; Latent Dirichlet Allocation
