Feb 20

XML vs CSV


I’m currently working on a new project in the evenings and weekends that involves playing with merchant datafeeds.

The big lesson I’ve learnt in the last 48 hours is that XML is a pig to work with.

The data from merchant X weighs in at 84 megs as a CSV. If you grab the same data as XML you end up with a massive 261 megs. Sure, hard drives are cheap, but the server load goes through the roof when it has to process the larger XML file…
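To see where the extra weight comes from, here is a rough Python sketch that writes the same rows out both ways and compares the byte counts. The field names and sample rows are made up for illustration and are not taken from any merchant's feed, so the exact ratio on real data will differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical product rows; real merchant feeds carry many more fields.
FIELDS = ["sku", "name", "price"]
ROWS = [
    {"sku": "A100", "name": "Blue Widget", "price": "9.99"},
    {"sku": "A101", "name": "Red Widget", "price": "12.50"},
]


def to_csv_bytes(rows):
    """Serialise the rows as CSV: one header line, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_xml_bytes(rows):
    """Serialise the rows as XML: every value gets its own open and close tag."""
    root = ET.Element("products")
    for row in rows:
        product = ET.SubElement(root, "product")
        for field in FIELDS:
            ET.SubElement(product, field).text = row[field]
    return ET.tostring(root, encoding="utf-8")


csv_size = len(to_csv_bytes(ROWS))
xml_size = len(to_xml_bytes(ROWS))
print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes")
```

CSV names each field once in the header, while XML repeats the field name twice for every value in every row, so the gap grows with the number of rows.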

Moral of the story – stick to CSV

9 Responses to “XML vs CSV”

  1. From hostyle:

    Not sure of original author, but to quote someone on the internet: “XML is like violence: if a little doesn’t solve the problem, use more.”

    Posted on February 20, 2008 at 9:25 am #
  2. From Dominykas:

    Um. I have to completely disagree with you. The problem is that CSV is not a worldwide standard as of yet – Germans use “,” (comma) as a decimal separator, thus their Excel exports (and imports) stuff using “;” (semicolon) as a “value separator” – rather than the English/American “.” (dot) for decimals and “,” (comma) for values. Consider the fact that other countries have even more decimal separator symbols – I haven’t done my research fully there, but I’d suppose there might be a problem or two elsewhere. Sure – CSV is quick’n’easy’n’dirty, but XML gives you the real thing. That of course does not apply to “local market” products.

    Posted on February 20, 2008 at 7:26 pm #
  3. From Michele Neylon:

    @Dominykas – the software I’m using can handle multiple formats, but the HUGE XML files are not making it happy

    Posted on February 20, 2008 at 8:13 pm #
  4. From hostyle:

    Dominykas: how is that a problem for CSV?
    US/UK: "12,345.678","whatever","blah"
    European: "12.345,678","whatever","blah"

    Posted on February 21, 2008 at 8:55 am #
  5. From Hugh:

    Michele,
    Some prefer XML for constantly updated items like news feeds etc, as it’s easier to figure out and parse small chunks of it.
    I’m working on a price comparison site at the moment, and we’ll be pulling in 100s of datafeeds from loads of merchants. For this it makes sense for us to use CSV – we download each feed once a day using cron, and run a shell script once a day to unzip and import each feed into the database. Currently it takes about 2 hours for a full update, but based on what I’ve seen testing XML feeds, it’d take at least double that using XML.
    CSV – simple and effective.

    Posted on February 21, 2008 at 7:19 pm #
  6. From Michele Neylon:

    Hugh
    You’re in the same boat as me so :)
    Michele

    Posted on February 21, 2008 at 8:27 pm #
  7. From Ken Stanley:

    CSV is great for keeping file sizes down if there’s a linear pattern to the content, like a SQL dump. As Dominykas said, it’s not a standard and if the data needs to be portable, this can cause problems. XML is great for storing scalable pattern, non-linear data and has its uses too – but the nature of it, where each piece of data is marked-up/tagged means that it’s seriously bloated. XML and CSV are very different in my opinion. I’ll rarely use XML where CSV will suffice.

    Posted on February 24, 2008 at 6:09 pm #
  8. From Tom Gleeson:

    I once wrote a post about the great data lingua franca debate (http://blog.gobansaor.com/2007/03/03/tables-vs-xml-the-data-lingua-franca-debate/),
    but of course there was no debate, at least then. Good to see others appreciating the “power” of the humble CSV table ;-)
    Tom

    Posted on February 24, 2008 at 7:26 pm #
  9. From Michele Neylon:

    @Ken – the data I’m working with is provided by various merchants. Using the XML version simply adds bloated files with an expensive processing overhead. The CSV files are relatively light by comparison.
    @Tom – The right tool for the job :)

    Posted on February 24, 2008 at 9:47 pm #
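
The exchange between Dominykas and hostyle (comments 2 and 4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample lines are modelled on hostyle's examples, the semicolon line is a guess at what a German Excel export might look like, and `read_row` and `parse_number` are hypothetical helpers, not part of any feed software mentioned here.

```python
import csv
import io

# Quoted fields, comma-delimited, as in hostyle's comment.
US_LINE = '"12,345.678","whatever","blah"\n'
EU_QUOTED_LINE = '"12.345,678","whatever","blah"\n'
# Assumed shape of a semicolon-delimited export, as Dominykas describes.
EU_SEMICOLON_LINE = "12.345,678;whatever;blah\n"


def read_row(text, delimiter):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_number(text, decimal_sep):
    """Convert a formatted number to float, given its decimal separator."""
    thousands_sep = "," if decimal_sep == "." else "."
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))


us_row = read_row(US_LINE, ",")
eu_quoted_row = read_row(EU_QUOTED_LINE, ",")
eu_semicolon_row = read_row(EU_SEMICOLON_LINE, ";")

# Quoting keeps each row at three fields, but the reader still has to be
# told which decimal convention applies before the text becomes a number.
us_value = parse_number(us_row[0], ".")
eu_value = parse_number(eu_quoted_row[0], ",")
eu_semicolon_value = parse_number(eu_semicolon_row[0], ",")
print(us_value, eu_value, eu_semicolon_value)
```

Both commenters have a point: quoting solves the splitting problem, but the delimiter and the decimal convention still have to be agreed per feed, because nothing in the file itself declares them.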
