Forum Thread: Does Anyone Have Any Experience Parsing XML with Python??

I don't want to waste anybody's time here, but I will upload a sample of my XML file and the code I am trying to use to parse it. If anybody can tell me what I'm doing wrong I would be insanely grateful.

xml sample:
<LineItem count="1">
  <PurchaseOrderUnit>
    <PurchaseOrderUnitId identType="VIN" ident="021360"/>
    <PurchaseOrderUnitQty UOMBasis="pack">10</PurchaseOrderUnitQty>
    <BuyersCost UOMBasis="pack">12.1200</BuyersCost>
    <Taxes taxable="No"/>
  </PurchaseOrderUnit>
  <RetailUnitPricing>
    <RetailUnitId identType="GTIN" ident="00005200003940">SINGLE</RetailUnitId>
    <RetailUnitQty identType="GTIN" ident="00005200003940" UOMBasis="each">1.000</RetailUnitQty>
    <RetailPrice>1.69</RetailPrice>
  </RetailUnitPricing>
</LineItem>

I am trying to extract the ident attribute of PurchaseOrderUnitId and the value of PurchaseOrderUnitQty. They eventually need to be formatted like this: 6 digit id number/+Qty. But right now I'm just trying to pull the numbers out of the XML file.

With this code:
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file="c:\\users\\design\\desktop\\scripttest\\newsample.xml")
root = tree.getroot()
for PurchaseOrderUnit in root.findall('PurchaseOrderUnit'):
    qty = PurchaseOrderUnit.findall('PurchaseOrderUnitQty').text
    id = PurchaseOrderUnit.get('PurchaseOrderUnitId')
    print id, qty

When I run it from the command prompt it doesn't output anything, no errors or anything.
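A likely explanation, assuming the real file wraps the LineItem elements in some outer element: findall('PurchaseOrderUnit') only searches the direct children of the root, so nothing matches and the loop body never runs, which is why there is no output and no error. Two more problems would show up once it did run: findall() returns a list (which has no .text), and get() reads an attribute of the current element rather than a child element. Below is a minimal sketch of one way around all three. The inline sample, the <PurchaseOrder> wrapper, and the extract_pairs name are assumptions for illustration, not the real file.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the real file; the <PurchaseOrder> wrapper is hypothetical.
SAMPLE = """<PurchaseOrder>
  <LineItem count="1">
    <PurchaseOrderUnit>
      <PurchaseOrderUnitId identType="VIN" ident="021360"/>
      <PurchaseOrderUnitQty UOMBasis="pack">10</PurchaseOrderUnitQty>
    </PurchaseOrderUnit>
  </LineItem>
  <LineItem count="2">
    <PurchaseOrderUnit>
      <PurchaseOrderUnitId identType="VIN" ident="023408"/>
      <PurchaseOrderUnitQty UOMBasis="pack">2</PurchaseOrderUnitQty>
    </PurchaseOrderUnit>
  </LineItem>
</PurchaseOrder>"""


def extract_pairs(root):
    """Return (ident, quantity) tuples, one per PurchaseOrderUnit."""
    pairs = []
    # iter() walks the whole tree, so the nesting depth does not matter.
    for unit in root.iter('PurchaseOrderUnit'):
        # find() returns a single child element; get() reads its attribute.
        unit_id = unit.find('PurchaseOrderUnitId').get('ident')
        qty = unit.find('PurchaseOrderUnitQty').text
        pairs.append((unit_id, qty))
    return pairs


root = ET.fromstring(SAMPLE)
for unit_id, qty in extract_pairs(root):
    print('%s/+%s' % (unit_id, qty))
```

With a file on disk, ET.parse("newsample.xml").getroot() would take the place of ET.fromstring(SAMPLE). Because the id and quantity are both looked up inside the same PurchaseOrderUnit, they cannot get mismatched.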

10 Responses

OK, firstly, a good practice as a beginner with XML is to put the XML file in the same folder as the .py file, to avoid errors when moving your project to other PCs. I also think minidom would be the easiest way to parse your XML.

so use this
file = minidom.parse("sample.xml")

and use a for statement for each attribute you want to get. If I'm not wrong, the values should be printed using parentheses, i.e. print(id) and print(qty).

So your final .py file could be something like this:

from xml.dom import minidom
import itertools
file = minidom.parse("sample.xml")
id = file.getElementsByTagName("PurchaseOrderUnitId")
qty = file.getElementsByTagName("PurchaseOrderUnitQty")
for i in id:
    pass

for j in qty:
    pass
id = (i.attributes["ident"].value)
qty = j.firstChild.nodeValue

print (id)
print (qty)

Hacked by Mr_Nakup3nda

Omg, thank you for responding, MR! Funny thing is I did have that sample in with the script file; I'm not sure why I was using the full path. I'm trying your code and it keeps throwing an error: AttributeError: Element instance has no attribute. Any ideas?

Before you proceed, make sure to rename your XML file to "sample.xml", or change the following line to use your XML file's name:

file = minidom.parse("sample.xml")
Then make sure it's in the same folder as your Python file.

I fixed that and got around the error, and it works, but it doesn't iterate through and pull all of the UnitId and Qty values. It just pulls one.

Again, thank you.

Could you be more explicit about what you are trying to do? Like the output you want?

In case you want the output as you said above (they eventually need to be formatted like this: 6 digit id number/+Qty), the code is this:

from xml.dom import minidom
import itertools
file = minidom.parse("sample.xml")
id = file.getElementsByTagName("PurchaseOrderUnitId")
qty = file.getElementsByTagName("PurchaseOrderUnitQty")
for i in id:
    pass

for j in qty:
    pass
id = (i.attributes["ident"].value)
qty = j.firstChild.nodeValue

print (id)
print (qty)
print (id,"/+",qty)


Yeah, as far as formatting goes, that is the output I needed. The XML file that I will be working with will have multiple unit id numbers and quantities that I need to extract. Right now that code only extracts one number and its qty, if that makes sense.

Glad that I could help you. Try adding more units and quantities to the XML file; if you find any problem just let me know, but try to mess around by yourself. Take care of your code; it's good practice to comment it.


The sample file that I am actually using has about 7 of each; I just didn't put the entire file on here. One thing I noticed is if I do:

for i in id:
    print (i.attributes["ident"].value)

it iterates through all the items in the sample. It's when I go on to the next for statement and try to print them both together that it doesn't print every iteration. Please excuse my ignorance; I have taken just minor courses in Python and programming in general.
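That matches how Python loops behave: once a loop like "for i in id: pass" has finished, i is still bound to the last item only, so any code placed after the loop sees just one element. A tiny sketch with placeholder values (the list contents and variable names here are made up for illustration):

```python
ids = ['021360', '023408', '064014']  # placeholder values

# A loop whose body does nothing still walks every item...
for i in ids:
    pass

# ...but afterwards, i refers to the last item only.
last_seen = i

# Doing the work inside the loop body touches every item.
seen_inside = []
for i in ids:
    seen_inside.append(i)

print(last_seen)
print(seen_inside)
```

So the lines that read the attribute and the quantity need to sit inside a loop body (indented under the for), not after it.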

Ok this code is almost there:
from xml.dom import minidom
from itertools import imap
xmldoc = minidom.parse("newsample.xml")
itemlist = xmldoc.getElementsByTagName("PurchaseOrderUnitId")
quantity = xmldoc.getElementsByTagName("PurchaseOrderUnitQty")
for i in itemlist:
    for s in quantity:
        print(i.attributes["ident"].value),
        print s.firstChild.nodeValue

which outputs:
021360 10
021360 2
021360 10
021360 5
021360 10
021360 15
021360 6
023408 10
023408 2
023408 10
023408 5
023408 10
023408 15
023408 6
064014 10
064014 2
064014 10
064014 5
064014 10
064014 15
064014 6

The problem is that it repeats each item number once for every quantity and just recycles the quantities, so they don't match the item they belong to. Essentially what I need is:

021360 10
023408 2
064014 10
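The nested loops print every id against every quantity. One way to get one line per item instead is to walk both node lists in step with zip(). This is a minimal sketch, assuming every PurchaseOrderUnit has exactly one id and one quantity and that they appear in the same order. The inline sample, the <PurchaseOrder> wrapper, and the paired_lines name are made up for illustration.

```python
from xml.dom import minidom

# Inline stand-in for newsample.xml; the <PurchaseOrder> wrapper is hypothetical.
SAMPLE = """<PurchaseOrder>
  <LineItem count="1">
    <PurchaseOrderUnit>
      <PurchaseOrderUnitId identType="VIN" ident="021360"/>
      <PurchaseOrderUnitQty UOMBasis="pack">10</PurchaseOrderUnitQty>
    </PurchaseOrderUnit>
  </LineItem>
  <LineItem count="2">
    <PurchaseOrderUnit>
      <PurchaseOrderUnitId identType="VIN" ident="023408"/>
      <PurchaseOrderUnitQty UOMBasis="pack">2</PurchaseOrderUnitQty>
    </PurchaseOrderUnit>
  </LineItem>
  <LineItem count="3">
    <PurchaseOrderUnit>
      <PurchaseOrderUnitId identType="VIN" ident="064014"/>
      <PurchaseOrderUnitQty UOMBasis="pack">10</PurchaseOrderUnitQty>
    </PurchaseOrderUnit>
  </LineItem>
</PurchaseOrder>"""


def paired_lines(xml_text):
    """Return one 'ident qty' string per item, pairing by position."""
    doc = minidom.parseString(xml_text)
    ids = doc.getElementsByTagName("PurchaseOrderUnitId")
    qtys = doc.getElementsByTagName("PurchaseOrderUnitQty")
    lines = []
    # zip() yields (first id, first qty), (second id, second qty), ...
    for id_node, qty_node in zip(ids, qtys):
        ident = id_node.getAttribute("ident")
        qty = qty_node.firstChild.nodeValue
        lines.append("%s %s" % (ident, qty))
    return lines


for line in paired_lines(SAMPLE):
    print(line)
```

For a file on disk, minidom.parse("newsample.xml") would replace minidom.parseString(). If a unit could ever be missing its id or quantity, pairing by position would drift; looping over each PurchaseOrderUnit and looking up both children inside it would be the safer variant.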
