This is just a blog to try and spread some of the knowledge that has been freely given to me by the wider community, without which I'd get absolutely nothing accomplished. I hope this benefits some of you out there.

Tuesday, August 25, 2009

Command Line English -> Foreign Language Dictionary



I have been studying Japanese for years. I love learning the language, but it takes constant work. Lately I've been listening to NHK radio (Japan's national public broadcaster) online. This has been great, but since I'm nowhere near fluent, there are a lot of words I want to look up. And since I'm listening at work, I need to look something up quickly and get back to what I'm doing, so resubmitting a search on an online dictionary's web page every time isn't going to cut it.

I found a great online dictionary at freedict.com, but as I mentioned above, using it through the browser is a bit cumbersome. So I wrote a small Python script that does the lookup on that site for me. Put it on your PATH and you've got a command-line dictionary. The site translates between many languages, so I plan to modify the script so you can choose the language pair and go beyond Japanese to English. But for now, enjoy:


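```python
#!/usr/local/bin/python2.5
# Look up a word on freedict.com (Japanese -> English) and print the results.
# Python 2 only: relies on urllib.urlencode, urllib.urlopen and print statements.

import urllib
import sys
import re

if len(sys.argv) < 2:
    sys.exit("usage: %s <word>" % sys.argv[0])

# NOTE: as published this pattern is empty, which makes sub() insert '**'
# between every character. It was presumably an HTML tag that didn't survive
# publishing; put the tag that marks entries on the results page here.
pattern = re.compile('')

# Passing data to urlopen sends the search as a POST request.
data = urllib.urlencode({"search": sys.argv[1], "exact": "true", "max": "10",
                         "to": "English", "from": "Japanese",
                         "fname": "eng2jap2", "back": "jap.html"})

response = urllib.urlopen("http://www.freedict.com/onldict/onldict.php", data)
# Replace each match with '**'; the marker is used below to pick lines to join.
results = pattern.sub('**', response.read())

# Strip all HTML tags (each match stops at the first '>').
results = re.sub(r'<[^>]*>', '', results)

# Collapse trailing whitespace and runs of blank lines into a single newline.
results = re.sub(r'\s*\n+', '\n', results)

# Remove page boilerplate.
results = results.replace('New search', '')
results = results.replace('Online Dictionary Search Results', '')

# Drop the "Japanese" column label; turn the "English" label into a line break.
results = re.sub(r'\s\s\sJapanese', '', results)
results = re.sub(r'\s\s\sEnglish', '\n', results)

# As published, this removes every space character.
results = results.replace(' ', '')

english = re.compile(r'\s*[A-Za-z]+')
iterator = iter(results.split('\n'))

while True:
    try:
        line = iterator.next()
        if '**' in line:
            # If a marked line starts with Latin letters, append the
            # following line to it before printing.
            if english.match(line) is not None:
                line = line + iterator.next()
        line = line.replace('**', ' ')
        print line
    except StopIteration:
        break
```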




It is not perfect yet, but it'll do the job 95% of the time.
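Since I'd like to be able to pick the language pair, here is a rough Python 3 sketch of that idea (in Python 3, `urlencode` and `urlopen` moved to `urllib.parse` and `urllib.request`, and the `print` statement is gone). Treat it as a sketch under assumptions, not a finished tool: the form field names are copied from the script above, values other than Japanese/English for `--from`/`--to` are a guess at what the site accepts, the hidden `fname`/`back` fields may need to change for other pairs, and the page is assumed to decode as UTF-8. It only strips tags, entities and the two boilerplate lines; it doesn't reproduce the column-label or `**` line-joining steps.

```python
#!/usr/bin/env python3
"""Hypothetical Python 3 sketch of the freedict.com lookup with a selectable
language pair. Field names come from the original Python 2 script; everything
else (other language names, UTF-8 decoding) is an assumption."""
import argparse
import html
import re
import urllib.parse
import urllib.request

URL = "http://www.freedict.com/onldict/onldict.php"
BOILERPLATE = ("New search", "Online Dictionary Search Results")


def build_form_data(word, src="Japanese", dst="English", max_results=10):
    """Return the URL-encoded POST body as bytes (urlopen requires bytes)."""
    fields = {
        "search": word,
        "exact": "true",
        "max": str(max_results),
        "from": src,
        "to": dst,
        # Copied verbatim from the original script; these may need to change
        # for language pairs other than Japanese/English (assumption).
        "fname": "eng2jap2",
        "back": "jap.html",
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def clean_results(page):
    """Strip tags and entities, returning non-empty, non-boilerplate lines."""
    text = re.sub(r"<[^>]*>", "", page)  # drop every HTML tag
    text = html.unescape(text)           # e.g. "&nbsp;" becomes "\xa0"
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and line not in BOILERPLATE]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Look up a word on freedict.com")
    parser.add_argument("word", nargs="?", help="word to look up")
    parser.add_argument("--from", dest="src", default="Japanese")
    parser.add_argument("--to", dest="dst", default="English")
    args = parser.parse_args(argv)
    if args.word is None:
        parser.print_usage()
        return
    data = build_form_data(args.word, args.src, args.dst)
    # Passing data makes urlopen send a POST; UTF-8 decoding is an assumption.
    with urllib.request.urlopen(URL, data) as response:
        page = response.read().decode("utf-8", errors="replace")
    for line in clean_results(page):
        print(line)


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing (say, a hypothetical `lookup.py`), you'd run it as `python3 lookup.py --from Japanese --to English neko`. Like the original, it depends on the results page keeping the same layout.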
