Developers often need to parse an XML document to read its nodes, their child nodes and so on. But we usually have no idea in advance how deeply the child nodes are nested.
So we need some form of recursive technique to walk the nested nodes and grab the values we want.
Here, I describe two methods of doing this.
- The first method uses the traditional Document Object Model (DOM).
- The second method uses LINQ to XML.
Although LINQ to XML is newer and more powerful, I find that in some scenarios, like the one described here, the traditional DOM might actually do the job “better” (or at least more easily).
You will notice that I don’t emit the attribute names and values with the LINQ to XML method.
Sample XML Document
In both of the techniques described, we shall be working with the sample XML document below. I grabbed this document from Microsoft. I then “sanitized” it by replacing all double quotes with single quotes, so that it can be pasted into the C# string literals in my code examples.
<?xml version='1.0'?>
<catalog>
  <book id='bk101'>
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications with XML.</description>
  </book>
  <book id='bk102'>
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
  </book>
  <book id='bk103'>
    <author>Corets, Eva</author>
    <title>Maeve Ascendant</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-11-17</publish_date>
    <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
  </book>
  <book id='bk104'>
    <author>Corets, Eva</author>
    <title>Oberon's Legacy</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2001-03-10</publish_date>
    <description>In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.</description>
  </book>
  <book id='bk105'>
    <author>Corets, Eva</author>
    <title>The Sundered Grail</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2001-09-10</publish_date>
    <description>The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.</description>
  </book>
  <book id='bk106'>
    <author>Randall, Cynthia</author>
    <title>Lover Birds</title>
    <genre>Romance</genre>
    <price>4.95</price>
    <publish_date>2000-09-02</publish_date>
    <description>When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled.</description>
  </book>
  <book id='bk107'>
    <author>Thurman, Paula</author>
    <title>Splish Splash</title>
    <genre>Romance</genre>
    <price>4.95</price>
    <publish_date>2000-11-02</publish_date>
    <description>A deep sea diver finds true love twenty thousand leagues beneath the sea.</description>
  </book>
  <book id='bk108'>
    <author>Knorr, Stefan</author>
    <title>Creepy Crawlies</title>
    <genre>Horror</genre>
    <price>4.95</price>
    <publish_date>2000-12-06</publish_date>
    <description>An anthology of horror stories about roaches, centipedes, scorpions and other insects.</description>
  </book>
  <book id='bk109'>
    <author>Kress, Peter</author>
    <title>Paradox Lost</title>
    <genre>Science Fiction</genre>
    <price>6.95</price>
    <publish_date>2000-11-02</publish_date>
    <description>After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum.</description>
  </book>
  <book id='bk110'>
    <author>O'Brien, Tim</author>
    <title>Microsoft .NET: The Programming Bible</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2000-12-09</publish_date>
    <description>Microsoft's .NET initiative is explored in detail in this deep programmer's reference.</description>
  </book>
  <book id='bk111'>
    <author>O'Brien, Tim</author>
    <title>MSXML3: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>36.95</price>
    <publish_date>2000-12-01</publish_date>
    <description>The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.</description>
  </book>
  <book id='bk112'>
    <author>Galos, Mike</author>
    <title>Visual Studio 7: A Comprehensive Guide</title>
    <genre>Computer</genre>
    <price>49.95</price>
    <publish_date>2001-04-16</publish_date>
    <description>Microsoft Visual Studio 7 is explored in depth, looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment.</description>
  </book>
</catalog>
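If you would rather keep the double quotes from the original document, one option is to double them inside a C# verbatim string, where "" stands for a single literal double quote. The snippet below is only a minimal sketch of that idea: it uses a shortened one-book version of the sample, and the QuoteDemo class and FirstBookId helper are names I made up for illustration.

```csharp
using System;
using System.Xml.Linq;

static class QuoteDemo
{
    // Hypothetical helper: parses a shortened sample and returns the id of the first book.
    public static string FirstBookId()
    {
        // Inside a verbatim string (@"..."), "" is read as one literal double quote,
        // so the XML can keep its original double-quoted attributes.
        string str = @"<?xml version=""1.0""?>
<catalog>
  <book id=""bk101"">
    <title>XML Developer's Guide</title>
  </book>
</catalog>";

        XDocument doc = XDocument.Parse(str);
        return doc.Root.Element("book").Attribute("id").Value;
    }

    static void Main()
    {
        Console.WriteLine(FirstBookId()); // prints "bk101"
    }
}
```

Either form of quoting parses to the same document, so the choice only affects how the string literal looks in your source code.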
Get XML Document Nodes Using DOM
Here’s how I loop through the entire XML document. I print out the element and attribute names and values with the loop. The “XmlDocument” object belongs to the traditional/classic Document Object Model (DOM).
Note that the “DisplayNodes” method is recursive.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;

namespace GetXMLNodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = @"copy and paste the sample xml here";
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(str);
            XmlNode rootNode = doc.DocumentElement;
            DisplayNodes(rootNode);
            Console.ReadLine();
        }

        private static void DisplayNodes(XmlNode node)
        {
            //Print the node type, node name and node value of the node
            if (node.NodeType == XmlNodeType.Text)
            {
                Console.WriteLine("Type = [" + node.NodeType + "] Value = " + node.Value);
            }
            else
            {
                Console.WriteLine("Type = [" + node.NodeType + "] Name = " + node.Name);
            }

            //Print attributes of the node
            if (node.Attributes != null)
            {
                XmlAttributeCollection attrs = node.Attributes;
                foreach (XmlAttribute attr in attrs)
                {
                    Console.WriteLine("Attribute Name = " + attr.Name + "; Attribute Value = " + attr.Value);
                }
            }

            //Print individual children of the node, gets only direct children of the node
            XmlNodeList children = node.ChildNodes;
            foreach (XmlNode child in children)
            {
                DisplayNodes(child);
            }
        }
    }
}
The output of the above code looks like this:
Get XML Document Nodes With LINQ To XML
LINQ to XML provides loads of features for manipulating XML documents. The tiny example here is just the tip of the iceberg. LINQ to XML works with “XDocument” instead of “XmlDocument”.
I’m not using a recursive function here (just a foreach loop instead) because the “DescendantNodes()” method gives us the data we want recursively. However, it does not include attributes as nodes.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

namespace GetXMLNodes
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = @"copy and paste the sample xml here";
            XDocument doc = XDocument.Parse(str);

            //DescendantNodes() returns every nested node in document order
            var col = from descendant in doc.DescendantNodes() select descendant;

            foreach (XNode node in col)
            {
                if (node.NodeType == XmlNodeType.Text)
                {
                    Console.WriteLine("Type = [" + node.NodeType + "] Value = " + node.ToString());
                }
                else if (node is XElement)
                {
                    //Only elements have a Name; comments and other node types are skipped
                    XElement element = (XElement)node;
                    Console.WriteLine("Type = [" + element.NodeType + "] Name = " + element.Name);
                }
            }
            Console.ReadLine();
        }
    }
}
Here’s what the output of the above code looks like:
Notice that attribute names and values are missing. If you want to filter by attribute, you would need to specifically add the attribute filter to the query and then return the matching elements.
The relevant part of your code will look something like:
var matchingElements = doc.Descendants() .Where(x => x.Attribute("foo") != null);
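If you also want the LINQ to XML version to print attribute names and values, one possible approach is to loop over the elements with Descendants() and read each element's Attributes(). The following is a rough sketch rather than a drop-in replacement: it uses a shortened one-book version of the sample, and the AttributeDemo class and Describe helper are hypothetical names chosen for illustration. It filters on the “id” attribute from the sample instead of “foo”.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class AttributeDemo
{
    // Hypothetical helper: one line per element, followed by one line per attribute.
    public static List<string> Describe(XDocument doc)
    {
        var lines = new List<string>();
        // Descendants() returns elements only (root included), in document order.
        foreach (XElement element in doc.Descendants())
        {
            lines.Add("Type = [" + element.NodeType + "] Name = " + element.Name);
            // Attributes are not nodes in the tree walk, so read them per element.
            foreach (XAttribute attr in element.Attributes())
            {
                lines.Add("Attribute Name = " + attr.Name + "; Attribute Value = " + attr.Value);
            }
        }
        return lines;
    }

    static void Main()
    {
        string str = "<catalog><book id='bk101'><title>XML Developer's Guide</title></book></catalog>";
        XDocument doc = XDocument.Parse(str);

        foreach (string line in Describe(doc))
        {
            Console.WriteLine(line);
        }

        // Same filter shape as above, using the sample's "id" attribute.
        var matchingElements = doc.Descendants().Where(x => x.Attribute("id") != null);
        Console.WriteLine("Elements with an id attribute: " + matchingElements.Count());
    }
}
```

Text nodes are left out of this sketch on purpose, so that it stays focused on the attributes.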
Check here for more examples of C# working with XML
http://csharp.net-informations.com/xml/csharp-xmltutorial.htm
C# xml tutorial
evan