Video: My "Open Data for Developers" session from #CapitolCamp

Last Friday, the NY Senate CIO hosted the first ever Capitol Camp in Albany.

Along with Remy DeCausemaker, I led a session on “Open Data for Developers”, discussing the why, what, and how of releasing government data in open formats with open tools. Remy demonstrated the scrubbers and scrapers that his organization, Civx.us, has developed.

Here’s the video…

You can view the complete session notes here along with notes and information on the rest of the event. There are also uncut videos of the rest of the sessions on the NY Senate Uncut YouTube Channel.

links for 2009-06-04

  • When you create a PersistenceManagerFactory / EntityManagerFactory you define the connection URL, driver name, and the username/password to use. This works perfectly well but does not "pool" the connections so that they are efficiently opened/closed when needed to utilise datastore resources in an optimum way. DataNucleus allows you to utilise a connection pool using Apache DBCP to efficiently manage the connections to the datastore.
Published
Categorized as Awareness

links for 2009-06-02

  • The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text. A tiff reader is built in that will read uncompressed TIFF images, or libtiff can be added to read compressed images.
  • OCR Terminal is a web-based Optical Character Recognition (OCR) service, allowing you to convert your images into searchable text.
    (tags: ocr)
  • OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.

links for 2009-05-25
