
Google Hates Computer Code

Recently, I picked up a nice technical book on Fast Fourier Transforms.  The good thing about the book is that it isn't deep into mathematical proofs and it contains small, self-contained pieces of code.  The bad thing is that all the code is in BASIC, and some of the listings are quite long to type in by hand.

It dawned on me that if you load images into Google Docs, it will automatically OCR the file for you.  I also have a 10-year-old Xerox scanner with some old software called PaperPort that handles scanning, OCR and document management.  I figured that modern Google cloud software would be way better than desktop software.  I was really wrong.

I took some photographs of the pages in my book that contained BASIC code with my iPhone 4.  They're not great images, but a company such as Google that can build self-driving cars should be able to adjust images automatically, right?

 
Some DFT code written in BASIC.


On my desktop computer, I uploaded the images to Google Drive.  Then I imported each file into Google Docs.  The import process automatically creates a new Google Doc containing both the original image and the OCR'd text.  For comparison, I used my ancient Windows XP-era PaperPort to OCR the same images.  Here is the output from each, side by side.

It's absolutely mind-blowing how poorly Google Docs OCR performed.  It's really, really bad: I see bits of Chinese characters, and the formatting is all over the place.  The output is completely unusable.  On the other hand, the PaperPort OCR output was quite good.  The main issues were the insertion of spaces that didn't exist and confusion between the letter O and the number 0.
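The O-versus-0 confusion is the kind of error that can be partly cleaned up after the fact. Here is a minimal Python sketch of the idea; it is not something PaperPort or Google Docs provides. The function name `fix_letter_o` and the rule it applies are my own assumptions: any token that starts with a digit and otherwise contains only digits and capital O's is treated as a number. It deliberately leaves alone tokens like `O` or `O1`, which could be valid BASIC variable names, and it does nothing about the stray spaces or about decimals such as `1.O`.

```python
import re

# Assumption: a token that starts with a digit and contains only digits
# and capital O's (e.g. "1O", "2OO") was meant to be a number.
_NUMBER_LIKE = re.compile(r"\b[0-9][0-9O]*\b")


def fix_letter_o(line: str) -> str:
    """Replace capital O with 0 inside number-like tokens of one line."""
    return _NUMBER_LIKE.sub(lambda m: m.group(0).replace("O", "0"), line)


if __name__ == "__main__":
    # Made-up sample line, not taken from the book.
    print(fix_letter_o("1O FOR I = O TO 1OO"))
```

A lone `O` is left unchanged because there is no way to tell from the token alone whether it is a variable or a misread zero, so the output still needs a read-through by hand.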

Frankly, the Google OCR output is so bad that I might as well have dropped my book onto my keyboard and used whatever it typed as the code.  Unacceptable, Google.