Travel Diary

To: [email protected]
From: "naakannan"
Date: Wed, 06 Feb 2002 13:45:18 -0000
Subject: [e-suvadi] Travel Diary 05.02.02
Reply-to: [email protected]
Dear Suvadians:

Walking through the parks of Bangalore in the morning hours proves that Swami Vivekananda's dream of a dynamic, brisk India is fulfilled. Several thousand people sweat it out in exercise and walking every morning. Losing calories means a newfound richness, and B'lore is rich. The Silicon Valley of India!

My visit to the Super Computer Education and Research Center to meet Dr. N. Balakrishnan, on the recommendation of 'Bharat Ratna' Abdul Kalam, confirmed my conviction that India is in no mood to lag behind in IT developments. Dr. Balakrishnan has a mission to digitize a million books within the next three years and place them on the web!! Surprise, surprise!! He has at his command a team and 150 super-fast scanners that are capable of digitizing at 6 pages per minute! Each one of them costs US$ 18,000!! I felt belittled with my small grant of US$ 10,000, my laptop and a Mustek scanner :-) However, Balaki, as he is known, is very modest and friendly. We discussed the possibility of a collaboration and he generously agreed. He did not have much time for an elaborate discussion, so I left with the hope of meeting him in cyberspace.

The original plan was to meet Dr. A. G. Ramakrishnan at IISc to discuss OCR developments, and to meet Dr. Meera Chakravorthy at the Univ. of B'lore, who is considering my Tamil poems for possible translation into Bengali. However, Mr. Ramachandra Budihal, with his enthusiasm and IT expertise, insisted that I meet his group 'Jaana Doota' before I venture on my digitization! So my trip to B'lore was hurriedly organised. Mr. Ramachandra's team has some very impressive developments. They have the same ambitions and ideas as we have at THF, and are focusing on Karnataka. They have developed filters that separate the background from the text of a digitized manuscript! This enables photo-offset printing of ancient books and manuscripts, as well as later OCR reading.
Mr. Budihal, a fellow Suvadian, is full of energy and works with us in close collaboration.

Back to OCR (at last!)..... I took with me several scanned books from the early 20th century for OCR testing. One of the ideas was to test how the two OCRs developed so far tackle the problem of reading old books. The OCR of IISc is not font-specific. Training the OCR for a particular font is a possibility that has been developed but not implemented in the OCR module. On the other hand, Inforeed's OCR has this feature integrated. Our initial testing with both modules did not give encouraging results. In old printing the text is smudged, which interferes with machine reading. Broken characters in the old prints interfere as well. Krishnamurti of Inforeed told me that through specific training on a particular font we could improve the machine reading. I planned to leave a book with him for that purpose. Dr. Ramakrishnan has taken three of my scanned books for further testing.

Both OCRs have shown remarkable reading when the prints are clear and neat. Ponvizhi of Inforeed has several additional features such as 'la' - 'La' checking, spell check and others. IISc has not yet come out with a commercial module (the model under test is called TamilGnani). However, their OCR is equally good. Both need further testing. OCR is a crucial link in making any textual database useful and searchable! We shall wait for its full development...soon.

anpuTan
Kannan