CLOSE AND OPEN TASK AUTHORSHIP ATTRIBUTION: A COMPUTATIONAL AUTHORSHIP ANALYSIS

Nur Inda Jazila

Abstract


Authorship analysis is an area of forensic linguistics whose main task is to investigate the characteristics of a text in terms of its authorship. Specifically, authorship attribution examines the likelihood that an author wrote a given text by analyzing the author's other works. This experimental research addresses two problems: which author wrote which text (closed-task authorship attribution) and who wrote each text (open-task authorship attribution). To do so, it uses R for statistical computing, employing both the stylo() and classify() functions. Based on experiments with 1-grams as a fixed variable, it concludes that the SVM algorithm may be best suited to closed-task authorship attribution because of its 100% consistency, whereas the k-NN algorithm may be best suited to the open task since it reaches 94% consistency. In addition, for the open task, the stylo() function may perform better than the classify() function since it provides results closer to the actual answer. As the legal system often challenges authorship analysis for lacking a valid methodology, analyzing styles using stylometry and measuring them computationally may help forensic linguists provide an adequate analysis for the legal system. Scientifically, this research provides a framework for doing authorship analysis computationally; practically, it is expected to be usable as a tool for detecting plagiarism.
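To illustrate the general idea behind attribution with 1-gram features and a k-NN classifier, the following is a minimal sketch in Python. It is not the stylo() or classify() function from R and does not reproduce the experiments reported here. The function names, the Manhattan distance, and the toy texts are all assumptions made for illustration only.

```python
from collections import Counter
import re


def unigram_profile(text):
    """Return relative frequencies of word 1-grams in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def manhattan(p, q):
    """Manhattan distance between two frequency profiles (an assumed choice)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def knn_attribute(training, disputed_text, k=1):
    """Attribute a disputed text to the majority author among its k nearest
    training texts. `training` is a list of (author, text) pairs."""
    target = unigram_profile(disputed_text)
    ranked = sorted(
        training,
        key=lambda pair: manhattan(unigram_profile(pair[1]), target),
    )
    votes = Counter(author for author, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; real studies use much longer texts per author.
training = [
    ("author_a", "the cat sat on the mat and the cat slept on the mat"),
    ("author_a", "the dog sat on the rug and the dog slept"),
    ("author_b", "whilst I pondered, I wrote; whilst I wrote, I pondered"),
    ("author_b", "I wrote whilst I walked and I pondered whilst I rested"),
]
disputed = "the cat and the dog sat on the mat"

print(knn_attribute(training, disputed, k=1))
```

In a closed task the true author is assumed to be among the candidate authors in `training`; an open task would additionally need a rule, such as a distance threshold, for answering "none of these", which this sketch does not implement.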

 


Keywords


authorship analysis; classify function; computational approach; forensic linguistics; stylo function






DOI: https://doi.org/10.18860/prdg.v2i1.6704









Creative Commons License
PARADIGM: Journal of Language and Literary Studies by Department of English Literature is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://ejournal.uin-malang.ac.id/index.php/paradigm.
