India is home to a vast treasure of knowledge, inscribed, stored and passed on from one generation to the next through a unique medium: palm leaves. The country has been revered the world over since ancient times for its knowledge and progress in fields including Mathematics, Science, Spirituality and Medicine. There is now an urgent need to preserve this rich heritage and to provide an easy interface to it using modern technology and techniques, thereby enhancing the accessibility, applicability and appreciation of this repository of knowledge. Palm-leaf manuscripts (PLMs) need to be preserved in their original form and also processed, especially through digitization. Many conservation centers across the country store and preserve huge collections of manuscripts. A well-built catalogue is a primary requirement for effective and efficient information retrieval, and the main aim of this research is to give users a standard means of intellectual access to the digitized materials. The key objective is to propose an enhanced schema for automatic metadata extraction that eases the metadata creation process and so facilitates an efficient search and retrieval mechanism for manuscripts. The research focuses mainly on reducing the data deficiencies that arise in catalogues from manual metadata entry. Hence, there is a strong need for an automatic metadata extraction schema that can categorize digital PLM images according to their metadata schema. The key benefits of this research are the scalability and usability of the schema across various digital collections. The metadata for the PLM images is customized to improve retrieval accuracy, producing effective search results for many digital collections of PLMs.
Hence the outcome of this research can be useful in two ways: first, to prioritize manuscripts for restoration according to how lightly or heavily damaged they are; and second, to obtain accurate search results from the two proposed methods, one using TF-IDF and the other a crowdsourcing approach. These methods can be widely utilized in digital libraries across the globe. The metadata schema can also be incorporated into an enhanced search engine to obtain better precision and recall.
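The abstract names TF-IDF as the basis of one of the two search methods but gives no implementation details. As an illustration only, the following is a minimal sketch of TF-IDF ranking over catalogue metadata, assuming each manuscript record is reduced to a short string of descriptive terms. The record identifiers, the sample terms, and the function names `build_index` and `search` are all hypothetical and are not taken from the research itself.

```python
import math
from collections import Counter

# Hypothetical catalogue records; identifiers and terms are illustrative only.
RECORDS = {
    "plm-001": "ayurveda medicine herbal treatment sanskrit",
    "plm-002": "astronomy mathematics planetary calculation sanskrit",
    "plm-003": "medicine surgery treatment telugu",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_index(records):
    """Return per-record TF-IDF weights and the IDF table."""
    tokens = {doc_id: tokenize(text) for doc_id, text in records.items()}
    n_docs = len(tokens)
    doc_freq = Counter()
    for terms in tokens.values():
        doc_freq.update(set(terms))  # count each term once per record
    # Smoothed IDF: a term present in every record still gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    weights = {}
    for doc_id, terms in tokens.items():
        tf = Counter(terms)
        weights[doc_id] = {t: (c / len(terms)) * idf[t] for t, c in tf.items()}
    return weights, idf


def search(query, weights, idf):
    """Rank records by cosine similarity between query and record vectors."""
    q_tf = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_tf.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    for doc_id, vec in weights.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        if dot > 0 and q_norm > 0 and d_norm > 0:
            scores.append((doc_id, dot / (q_norm * d_norm)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    weights, idf = build_index(RECORDS)
    for doc_id, score in search("medicine treatment", weights, idf):
        print(doc_id, round(score, 3))
```

In this sketch a record that shares no term with the query, such as `plm-002` for the query "medicine treatment", is left out of the results entirely. A real catalogue would likely need language-aware tokenization for the scripts involved, which is not attempted here.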
Nagendra Panini Challa will be speaking at the International Congress on Grid, Distributed & Parallel Computing 2022, which is scheduled for 11 and 12 August 2022 in Hong Kong, HKSAR.