Library and Information Science

Library and Information Science ISSN: 2435-8495
Mita Society for Library and Information Science
c/o School of Library and Information Science, Faculty of Letters, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
Library and Information Science 54: 1-18 (2005)

Original Article

Authorship attribution by data compression program

Asia University, Sakai 5-24-10, Musashino, Tokyo 180-8629, Japan

Received: June 6, 2005
Accepted: October 30, 2005
Published: March 10, 2006

Benedetto et al. recently confirmed the validity of a method for measuring similarity using data compression software. Despite its potential, this method has not yet been applied to the field of information science. The present study proposes the use of CIR, a modified method that uses an improved ratio of compression, and describes two experiments on authorship attribution using data from modern Japanese literature. The first experiment compares the results of applying CIR and Benedetto’s method to test collections of modified data (fixed length) using a procedure similar to that described by Matsuura et al. The second experiment is based on original data (variable length).

In the first experiment, CIR achieved an average precision rate of 97.7%, compared with 90.5% for Benedetto’s method. CIR also improved on the best method described by Matsuura et al. The second experiment confirmed the effectiveness of the CIR method, giving an average precision rate of 95.7%.
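
As a rough illustration of the compression-based similarity idea attributed to Benedetto et al. above, the following minimal sketch appends an unknown fragment to each candidate author's reference text and measures how many extra compressed bytes the fragment costs; the candidate with the smallest increase is taken as the likely author. Several things here are assumptions rather than details from this abstract: Python's zlib (DEFLATE) stands in for whatever compression program the study used, the function names compressed_size, compression_delta, and attribute are hypothetical, and the toy strings are invented for the example. The abstract does not give the formula for CIR, so CIR is not reproduced here.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_delta(reference: str, fragment: str) -> int:
    """Extra compressed bytes needed when fragment is appended to reference.

    A small value suggests the fragment reuses patterns that the
    compressor already found in the reference text.
    """
    return compressed_size(reference + fragment) - compressed_size(reference)


def attribute(fragment: str, references: Mapping[str, str]) -> str:
    """Return the candidate whose reference text gives the smallest delta."""
    return min(
        references,
        key=lambda name: compression_delta(references[name], fragment),
    )


if __name__ == "__main__":
    # Toy reference corpora (hypothetical data, one per candidate author).
    refs = {
        "A": "the quick brown fox jumps over the lazy dog " * 20,
        "B": "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
    }
    unknown = "the quick brown fox jumps over the lazy dog"
    print(attribute(unknown, refs))
```

With real literary texts the reference for each author would be a long sample of that author's writing, and the fragment should be short relative to the reference so that the compressor's model is dominated by the reference text.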
