Travel Diary
To: [email protected]
From: "naakannan"
Date: Wed, 06 Feb 2002 13:45:18 -0000
Subject: [e-suvadi] Travel Diary 05.02.02
Reply-to: [email protected]
Dear Suvadians:
Walking through the parks of Bangalore in the morning hours shows that
Swami Vivekananda's dream of a dynamic, brisk India is
fulfilled. Several thousand people sweat it out exercising and walking
every morning. Losing calories means a newfound richness;
B'lore is rich. The Silicon Valley of India!
My visit to the Super Computer Education and Research Center to meet
Dr.N.Balakrishnan, on the recommendation of 'Bharat Ratna' Abdul Kalam,
confirmed my conviction that India is in no mood to lag behind in IT
developments. Dr.Balakrishnan has a mission to digitize a million
books within the next three years and place them on the web!! Surprise,
Surprise!!
He has at his command a team and 150 super-fast scanners that are
capable of digitizing 6 pages per minute! Each one of them costs
US$ 18,000!! I felt belittled with my small grant of US$ 10,000, with
my laptop and a Mustek scanner :-)
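
For fellow Suvadians who like numbers, here is a rough back-of-envelope check of that target. Only the 150 scanners, the 6 pages per minute, the US$ 18,000 price, the million books and the three years come from my visit. Everything else is my own assumption for illustration: that the 6 pages per minute is per scanner, that an average book has 300 pages, and that the scanners run 8 hours a day for 250 working days a year.

```python
# Back-of-envelope check of the million-book scanning target.
# Figures from the diary:
SCANNERS = 150
PAGES_PER_MINUTE = 6          # ASSUMED to be the rate of each scanner
COST_PER_SCANNER_USD = 18_000
BOOKS = 1_000_000
YEARS = 3

# Hypothetical planning figures (NOT from the diary):
PAGES_PER_BOOK = 300
HOURS_PER_DAY = 8
WORKING_DAYS_PER_YEAR = 250


def working_days_needed(books, pages_per_book, scanners,
                        pages_per_minute, hours_per_day):
    """Working days needed to scan every page with all scanners running."""
    total_pages = books * pages_per_book
    pages_per_day = scanners * pages_per_minute * 60 * hours_per_day
    return total_pages / pages_per_day


days = working_days_needed(BOOKS, PAGES_PER_BOOK, SCANNERS,
                           PAGES_PER_MINUTE, HOURS_PER_DAY)
available_days = YEARS * WORKING_DAYS_PER_YEAR
fleet_cost_usd = SCANNERS * COST_PER_SCANNER_USD

print(f"Working days needed:    {days:.0f}")
print(f"Working days available: {available_days}")
print(f"Scanner fleet cost:     US$ {fleet_cost_usd:,}")
```

Under these assumptions the job needs roughly 694 working days out of 750 available, so the target looks tight but not unreasonable. Change the pages per book or the hours per day and the picture shifts quickly.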
However, Balaki, as he is known, is very modest and friendly. We
discussed the possibility of a collaboration and he generously agreed. He did not
have much time for an elaborate discussion, so I left with the hope of
meeting him in cyberspace.
The original plan was to meet Dr.A.G.Ramakrishnan at IISc to discuss
OCR developments with him, and to meet Dr.Meera Chakravorthy at the Univ.of
B'lore, who is considering my Tamil poems for possible translation into Bengali.
However, Mr.Ramachandra Budihal, with his enthusiasm and IT expertise,
insisted that I meet his group 'Jaana Doota' before I venture into my
digitization!
So, my trip to B'lore was hurriedly organised. Mr.Ramachandra's team
has some very impressive developments. They have the same ambitions and
ideas as we have at THF, but focus on Karnataka. They have developed
filters that separate the background from the text of a digitized manuscript! This
enables photo-offset printing of ancient books and manuscripts as well as
later OCR reading. Mr.Budihal, a fellow Suvadian, is full of energy and works
with us in close collaboration.
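
I do not know how the Jaana Doota filters work internally, so the following is only a minimal sketch of one common way to separate text from background: Otsu's global thresholding, which picks the grey level that best splits a page into dark "ink" and light "paper". The tiny sample page and the function names are made up for illustration. Real manuscripts with stains and uneven lighting would probably need something more adaptive.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises the between-class
    variance; pixels <= t are treated as ink, the rest as background."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their grey levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break               # no light class left
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(page):
    """Map a greyscale page (list of rows) to 0 for ink, 255 for background."""
    flat = [p for row in page for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in page]


# Hypothetical 3x4 greyscale patch: dark strokes on yellowed paper,
# with one stain (150) that should end up as background.
sample_page = [
    [200, 190, 40, 195],
    [185, 35, 30, 205],
    [150, 198, 45, 192],
]

for row in binarize(sample_page):
    print(row)
```

A clean black-and-white page of this kind is what both the offset printer and the OCR would ideally receive.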
Back to OCR (at last!).....I took with me several scanned books of
the early 20th century for OCR testing. One of the ideas was to test how the two
OCRs developed so far tackle the problem of reading old books. The OCR
of IISc is not font specific. Training the OCR for a particular font
is a possibility that has been developed but not implemented in the OCR module. On the
other hand, Inforeed's OCR has this feature integrated. Our initial
testing with both modules did not give encouraging results. The text in
old printing is smudged, which interferes with the machine
reading.
Broken characters in the old prints interfere as well.
Krishnamurti of Inforeed told me that by training the OCR specifically on a
particular font we could improve the machine reading. I planned to leave a
book with him for that purpose. Dr.Ramakrishnan has taken three
of my scanned books for further testing.
Both OCRs read remarkably well when the prints are clear
and neat. Ponvizhi of Inforeed has several additional features such as
'la' - 'La' checking, spell check and others. IISc has not yet come out
with a commercial module (the model under test is called TamilGnani).
However, their OCR is equally good.
Both need further testing. OCR is a crucial link in making any textual
database useful and searchable! We shall wait for its full development...soon.
anpuTan
Kannan