HtmlAgilityPack: getting data from an HTML table
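My program uses HtmlAgilityPack to fetch an HTML page and store it in a variable, and I am trying to get the two tables that sit inside a specific div with the class boardcontainer. My current code searches the whole page for every table and displays them, but when a row has no cells it throws an exception:
"NullReferenceException was unhandled: Object reference not set to an instance of an object."
A small portion of the HTML (in this case I searched the site for 'Microsoft'):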
Main Database
Company Name | 0870 / 0871 | 0844 / 0845 | 01 / 02 / 03 | Freephone | Other Information
Microsoft | 0870 601 0100 | 0844 800 2400 | 01954 713950 | | Customer Support; Straight to agent (no menu); Also for 0870 6010200
Microsoft | 0870 601 0100 | 0844 800 2400 | 0118 909 7800 | | Main UK Switchboard; Ask to be put through to required department; Also for 0870 6010200
Here is my current code. It just grabs the tables and prints the rows and cells, then throws the exception when it hits a null.
string html = myRequest.GetResponse();
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//table"))
{
    Console.WriteLine("Found: " + table.Id);
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        Console.WriteLine("row");
        foreach (HtmlNode cell in row.SelectNodes("th|td")) // Exception is thrown here
        {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}
How can I change this so that it searches for the specific div class and extracts only the tables inside it?
Thanks for reading.
Full HTML:
Main Database
Company Name | 0870 / 0871 | 0844 / 0845 | 01 / 02 / 03 | Freephone | Other Information
Microsoft | 0870 601 0100 | 0844 800 2400 | 01954 713950 | | Customer Support; Straight to agent (no menu); Also for 0870 6010200
Microsoft | 0870 601 0100 | 0844 800 2400 | 0118 909 7800 | | Main UK Switchboard; Ask to be put through to required department; Also for 0870 6010200
Microsoft | 0870 601 0100 | 0844 800 2400 | +35314502113 | | Customer Support; Answers as Microsoft Ireland with same options as UK 08 numbers; Reduce cost using 1899 (or similar); Also for 0870 6010200
Microsoft | 0870 241 1963 | 0844 800 2400 | 020 3147 4930 | 0800 0188354 | Product Activation; Home & Business (Volume Licensing); Also: 0800 018 8364 & +800 2284 8283; Also for 0870 6010100 & 0870 6010200
Microsoft | 0870 241 1963 | | | 0800 9179016 | Volume Licensing
Microsoft | | | 020 3027 6039 | 0800 7318457 | Online Services Support; MSN, Hotmail, Live, Messenger etc; Also: 0800 587 2920
Microsoft | 0870 607 0700 | 0844 800 6006 | +35317065353 | | Ask Partner Hotline; Answers with same options; Reduce cost using 1899 (or similar)
Microsoft | 0870 607 0700 | 0844 800 6006 | | 0800 9173128 | Partner Network Regional Service Centre; Help with membership questions and tools, benefits and resource queries
Microsoft | 0870 601 0100 | 0844 800 2400 | | 0800 0324479 | Direct Services; Also for 0870 6010200
Microsoft | 0870 601 0100 | 0844 800 2400 | +35318831002 | 0800 0517215 | MSDN (Microsoft Developers Network); When calling +353 reduce cost using 1899 (or similar); Also for 0870 6010200
Microsoft | 0870 601 0100 | 0844 800 2400 | +35318831002 | 0800 281221 | Microsoft Technet; When calling +353 reduce cost using 1899 (or similar); Also for 0870 6010200
Microsoft XBOX | | | 020 7365 9792 | 0800 5871102 | Customer Support
Website and Content © 1999-2011 SAYNOTO0870.COM. All Rights Reserved.
Written permission is required to duplicate any of the content within this site.
The following XPath lets you search the HTML document for the tables inside a specific DIV (the one with the class "boardcontainer"):
//div[@class='boardcontainer']/table
To handle empty rows, simply check whether the returned HtmlNodeCollection is null.
Here is a complete example:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table"))
{
    Console.WriteLine("Found: " + table.Id);
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        Console.WriteLine("row");
        HtmlNodeCollection cells = row.SelectNodes("th|td");
        if (cells == null)
        {
            continue; // skip rows that contain no th/td cells
        }
        foreach (HtmlNode cell in cells)
        {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}
You should also check whether any table was found at all, and whether each table that was found contains rows.
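Those extra checks could look like the sketch below. It assumes HtmlAgilityPack is referenced and that SelectNodes returns null when nothing matches, as described above. The class name BoardTableReader and the method ReadRows are hypothetical names chosen for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class BoardTableReader
{
    // Hypothetical helper: returns the trimmed text of every cell, grouped by row,
    // for each table directly inside a div whose class is exactly "boardcontainer".
    public static List<List<string>> ReadRows(string html)
    {
        var result = new List<List<string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table");
        if (tables == null)
        {
            return result; // no matching table on the page
        }

        foreach (HtmlNode table in tables)
        {
            HtmlNodeCollection rows = table.SelectNodes("tr");
            if (rows == null)
            {
                continue; // table has no direct tr children
            }

            foreach (HtmlNode row in rows)
            {
                HtmlNodeCollection cells = row.SelectNodes("th|td");
                if (cells == null)
                {
                    continue; // row has no th/td cells
                }

                var values = new List<string>();
                foreach (HtmlNode cell in cells)
                {
                    values.Add(cell.InnerText.Trim());
                }
                result.Add(values);
            }
        }

        return result;
    }
}
```

Returning an empty list instead of throwing keeps the calling code simple. Note that "tr" only matches rows that are direct children of the table; if the page wraps its rows in tbody, an expression such as ".//tr" would probably be needed instead.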
Try:
foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table"))
It is an XPath expression that matches on an attribute value. For details, see:
https://www.exampledepot.com/egs/org.w3c.dom/xpath_getelembyattr.html
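One caveat with attribute matching: @class='boardcontainer' compares the entire attribute value, so a div declared with several classes (for example class="boardcontainer wide", a made-up value) would not match. A common XPath 1.0 idiom for matching a single class token is sketched below. The ClassXPath helper and its method names are hypothetical, and the sketch assumes the class name contains no quotes or spaces.

```csharp
using HtmlAgilityPack;

public static class ClassXPath
{
    // Builds an XPath selecting tables directly under any div whose class
    // attribute contains className as a whole space-separated token.
    public static string TablesInDivWithClass(string className)
    {
        return "//div[contains(concat(' ', normalize-space(@class), ' '), ' "
               + className + " ')]/table";
    }

    // Counts the matching tables; SelectNodes returns null when nothing matches.
    public static int CountTables(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes(TablesInDivWithClass(className));
        return tables == null ? 0 : tables.Count;
    }
}
```

Padding the attribute value with spaces before calling contains() keeps a class such as "boardcontainerx" from matching by accident.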