
17 Int'l J. Educ. Tech. Higher Educ. 1 (2020)

Al-Thwaib et al. International Journal of Educational Technology in Higher Education
(2020) 17:1
https://doi.org/10.1186/s41239-019-0174-x
RESEARCH ARTICLE Open Access
An academic Arabic corpus for plagiarism
detection: design, construction and
experimentation
Eman Al-Thwaib1, Bassam H. Hammo2* and Sane Yagi3

* Correspondence: b.hammo@ju.edu.jo
2Computer Information Systems Department, King Abdullah II School of Information Technology, University of Jordan, Amman, Jordan
Full list of author information is available at the end of the article

SpringerOpen

Introduction
Plagiarism is simply defined as appropriating others' words, thoughts, or intellectual
property without providing proper citation or giving credit to them as the original
source. The Oxford Dictionary¹ defines plagiarism as "the practice of taking someone
else's work or ideas and passing them off as one's own." With the exceptionally large
¹ https://en.oxforddictionaries.com/definition/plagiarism
© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.

Abstract
Advancement in information technology has resulted in massive textual material that
is open to appropriation. In response to research misconduct, a plethora of plagiarism
detection (PD) systems have been developed. However, most PD systems on the
market do not support the Arabic language. In this paper, we discuss the design and
construction of an Arabic PD reference corpus that is dedicated to academic
language. It consists of 2312 dissertations that were defended by postgraduate
students at the University of Jordan (JU) between 2001 and 2016. This
Academic Jordan University Plagiarism Detection corpus (henceforth JUPlag) follows
the Dewey Decimal Classification (DDC) in the way it is structured. The goal of the
corpus is twofold: Firstly, it is a database for the detection of plagiarism in student
assignments, reports, and dissertations. Secondly, the n-gram structure of the corpus
provides a knowledgebase for linguistic analysis, language teaching, and the learning
of plagiarism-free writing. The PD system is guided by JU Library's metadata for
retrieval and discovery of plagiarism. To test JUPlag, we injected an unseen dissertation
with multiple instances of plagiarism-simulated paragraphs and sentences.
Experiments with the system using different verbatim n-gram segments are
promising. These preliminary results encourage us to seek permission to enrich this
corpus with all the theses in the Thesis Repository of the Union of Arab Universities.
The JUPlag corpus is intended to function as an indispensable source for testing and
evaluating plagiarism detection techniques. Since the University of Jordan is a
non-profit organization that seeks to become a center for plagiarism detection in
Arabic content, it will charge a nominal fee for the use of JUPlag to finance the
maintenance and development of the corpus.
Keywords: Corpus tools, Natural language processing, Plagiarism detection, Text
plagiarism, Verbatim plagiarism
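
As a rough illustration of the verbatim n-gram matching mentioned in the abstract, the following Python sketch indexes the word n-grams of a small reference corpus and reports which of them reappear in a suspect text. It is not the JUPlag system: the function names (`word_ngrams`, `build_index`, `find_verbatim_matches`), the whitespace tokenization, and the toy English documents are all assumptions made for illustration. A real Arabic pipeline would likely need normalization and tokenization steps that the abstract does not describe.

```python
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> List[NGram]:
    """Return all contiguous word n-grams of `text` (whitespace tokenization, an assumption)."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(corpus: Dict[str, str], n: int) -> Dict[NGram, Set[str]]:
    """Map each n-gram to the set of reference document ids that contain it."""
    index: Dict[NGram, Set[str]] = {}
    for doc_id, text in corpus.items():
        for gram in word_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def find_verbatim_matches(
    suspect: str, index: Dict[NGram, Set[str]], n: int
) -> Dict[str, List[str]]:
    """For each reference document, list the suspect n-grams it shares verbatim."""
    matches: Dict[str, List[str]] = {}
    for gram in word_ngrams(suspect, n):
        for doc_id in index.get(gram, ()):
            matches.setdefault(doc_id, []).append(" ".join(gram))
    return matches


if __name__ == "__main__":
    # Hypothetical toy corpus; the ids and texts are placeholders.
    corpus = {
        "d1": "the quick brown fox jumps over the lazy dog",
        "d2": "a completely different sentence about corpora",
    }
    index = build_index(corpus, n=3)
    print(find_verbatim_matches("we saw the quick brown fox yesterday", index, n=3))
```

Larger values of `n` make matches more specific but miss shorter copied fragments, which is one reason a system might be tested with several n-gram sizes.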
