First off, thanks for wanting to contribute!
As long as the data is public domain (e.g. released under CC0), we can accept it in any format. CSV or JSON would be preferred, though, since they're simple to read in Python, which is what we'd most likely use to process it at this point.
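As a rough illustration of why CSV and JSON are convenient, here's a minimal sketch that reads both using only Python's standard library (`csv` and `json`). The sample data and the helper names `load_csv` and `load_json` are hypothetical, made up for this example. They aren't part of any existing tooling of ours.

```python
import csv
import io
import json

# Hypothetical helpers for illustration only.
def load_csv(text):
    """Parse CSV text that has a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def load_json(text):
    """Parse JSON text, assumed here to be a list of objects."""
    return json.loads(text)

# Made-up sample data in both formats.
csv_sample = "name,value\nalpha,1\nbeta,2\n"
json_sample = '[{"name": "alpha", "value": 1}, {"name": "beta", "value": 2}]'

csv_rows = load_csv(csv_sample)
json_rows = load_json(json_sample)

print(csv_rows)   # CSV values come back as strings, e.g. '1'
print(json_rows)  # JSON numbers keep their type, e.g. 1
```

One difference worth knowing: CSV gives you every field as a string, so numbers need converting, while JSON preserves numbers and nesting. For real files you'd open them with `open(path, newline="")` (for CSV) and pass the file object to `csv.DictReader` or `json.load` instead of using `io.StringIO`.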
If it's not explicitly public domain, we'd have to check with Rob whether we can use it. Although individual facts can't be copyrighted, some places have odd laws covering whole databases (such as the EU's database right), and we need to avoid being stung by those.