Handle comma (,) when converting XML string from string to csv using python

I'm having a requirement to convert xml from string to csv and print it in the console. My record separator is comma (,) for the csv. If comma is coming inside the csv then I had to handle it by putting double quotation mark for every xml field while retrieving. Code is as following.

from xml.etree.ElementTree import XML

input_xml = """<val><policies>
    <policy>
        <policyId>995932</policyId>
        <statecode>FL,FT</statecode>
        <eq_site_limit>CLAY COUNTY</eq_site_limit>
        <hu_site_limit>0</hu_site_limit>
        <fl_site_limit>19260000</fl_site_limit>
        <fr_site_limit>0</fr_site_limit>
        <tiv_2011>0</tiv_2011>
        <tiv_2012>19260000</tiv_2012>
        <eq_site_deductible>20610000</eq_site_deductible>
        <hu_site_deductible>0</hu_site_deductible>
        <fl_site_deductible>0</fl_site_deductible>
        <fr_site_deductible>0</fr_site_deductible>
        <point_latitude>0</point_latitude>
        <point_longitude>30.102226</point_longitude>
        <line>-81.713882</line>
        <construction>Commercial</construction>
        <point_granularity>Reinforced Concrete</point_granularity>
    </policy>
    <policy>
        <policyId>223488</policyId>
        <statecode>FL</statecode>
        <eq_site_limit>CLAY COUNTY</eq_site_limit>
        <hu_site_limit>328500</hu_site_limit>
        <fl_site_limit>328500</fl_site_limit>
        <fr_site_limit>328500</fr_site_limit>
        <tiv_2011>328500</tiv_2011>
        <tiv_2012>328500</tiv_2012>
        <eq_site_deductible>348374.25</eq_site_deductible>
        <hu_site_deductible>0</hu_site_deductible>
        <fl_site_deductible>16425</fl_site_deductible>
        <fr_site_deductible>0</fr_site_deductible>
        <point_latitude>0</point_latitude>
        <point_longitude>30.102217</point_longitude>
        <line>-81.707146</line>
        <construction>Residential</construction>
        <point_granularity>Wood</point_granularity>
    </policy>
</policies></val>"""

root = XML(input_xml)
parsed = root.find('policies')
# print(parsed)

data = []

for policy in parsed:
    policyId = policy.find('policyId').text
    statecode = policy.find('statecode').text
    eq_site_limit = policy.find('eq_site_limit').text
    hu_site_limit = policy.find('hu_site_limit').text
    fl_site_limit = policy.find('fl_site_limit').text
    fr_site_limit = policy.find('fr_site_limit').text
    tiv_2011 = policy.find('tiv_2011').text
    tiv_2012 = policy.find('tiv_2012').text
    eq_site_deductible = policy.find('eq_site_deductible').text
    hu_site_deductible = policy.find('hu_site_deductible').text
    fl_site_deductible = policy.find('fl_site_deductible').text
    fr_site_deductible = policy.find('fr_site_deductible').text
    point_latitude = policy.find('point_latitude').text
    point_longitude = policy.find('point_longitude').text
    line = policy.find('line').text
    construction = policy.find('construction').text
    point_granularity = policy.find('point_granularity').text

    data.append(
        '"{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}", "{}"'.format(policyId, statecode, eq_site_limit,hu_site_limit, fl_site_limit,fr_site_limit, tiv_2011, tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible, point_latitude,point_longitude, line, construction,point_granularity))

print('\n'.join([row for row in data[0:]]))

Output is following:

"995932", "FL,FT", "CLAY COUNTY", "0", "19260000", "0", "0", "19260000", "20610000", "0", "0", "0", "0", "30.102226", "-81.713882", "Commercial", "Reinforced Concrete"
"223488", "FL", "CLAY COUNTY", "328500", "328500", "328500", "328500", "328500", "348374.25", "0", "16425", "0", "0", "30.102217", "-81.707146", "Residential", "Wood"

Except for "FL,FT" value I want to remove double quotation from others if they don't have comma value. The xml may or may not contain comma value as a field value. So if the comma value exists I want to put double quote("") only for that record.

Expected output:

 995932, "FL,FT", CLAY COUNTY, 0, 19260000, 0, 0, 19260000, 20610000, 0, 0, 0, 0, 30.102226, -81.713882, Commercial, Reinforced Concrete

Thank you

Answers 1

  • A good CSV library should handle that for you. You don't need to build the CSV yourself. Just make sure you have the data correct in Python. Here's an example:

    import csv
        
    data = ["a", "c,d", "e,f,g"]   # Just make sure you have data like this
    with open('file.csv', mode='w') as file:
        writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(data)
    

    If you can't write to a file but want a string instead, use StringIO:

    import csv
    import io
    output = io.StringIO()
    
    data = ["a", "c,d", "e,f,g"]   # Just make sure you have data like this
    writer = csv.writer(output, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(data)
    string = output.getvalue()
    

Related Articles