C# tutorial: Parsing dl with HtmlAgilityPack

This is the sample HTML I am trying to parse with Html Agility Pack in ASP.NET (C#).

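(The tags did not survive in this copy; the structure below is inferred from the XPath queries and the output that follow.)

    <div class="content-div">
      <dl>
        <dt><b><a href="https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html">1</a></b></dt>
        <dd>First Entry</dd>
        <dt><b><a href="https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html">2</a></b></dt>
        <dd>Second Entry</dd>
        <dt><b><a href="https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html">3</a></b></dt>
        <dd>Third Entry</dd>
      </dl>
    </div>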

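The values I want from each entry are the link URL, the anchor text, and the text of the matching dd.

(I have used the first entry as an example here, but I want these values for every entry in the list.)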

This is the code I am currently using:

    var webGet = new HtmlWeb();
    var document = webGet.Load(url2);
    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from content in info.SelectNodes("dl//dd")
        from link in info.SelectNodes("dl//dt/b/a")
            .Where(x => x.Attributes.Contains("href"))
        select new
        {
            Text = content.InnerText,
            Url = link.Attributes["href"].Value,
            AnchorText = link.InnerText,
        };
    GridView1.DataSource = parsedValues;
    GridView1.DataBind();

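The problem is that the link and anchor text come out correctly, but the inner text does not: the first entry's text is repeated once for every link on the page, and only then does the output move on to the second entry's text. My explanation may not be very clear, so here is sample output from this code: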

    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3

whereas what I want is:

    First Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/1.html   1
    Second Entry  https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/2.html   2
    Third Entry   https://stackoverflow.com/questions/8942595/parsing-dl-with-htmlagilitypack/3.html   3

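I am very new to HAP and know little about XPath, so I am sure I am doing something wrong here, but even after several hours I cannot get it to work. Any help would be much appreciated.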

Solution 1

I have defined a function that, given a dt node, returns the next dd node after it:

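    private static HtmlNode GetNextDDSibling(HtmlNode dtElement)
    {
        // Walk the following siblings, skipping text and comment nodes,
        // until the first dd element is found. Checking for null before
        // touching the node avoids a NullReferenceException on the last sibling.
        for (var node = dtElement.NextSibling; node != null; node = node.NextSibling)
        {
            if (node.NodeType == HtmlNodeType.Element && node.Name == "dd")
                return node;
        }
        return null; // no dd follows this dt
    }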

The LINQ query can now be rewritten as:

    var parsedValues =
        from info in document.DocumentNode.SelectNodes("//div[@class='content-div']")
        from dtElement in info.SelectNodes("dl/dt")
        let link = dtElement.SelectSingleNode("b/a[@href]")
        let ddElement = GetNextDDSibling(dtElement)
        where link != null && ddElement != null
        select new
        {
            Text = ddElement.InnerHtml,
            Url = link.GetAttributeValue("href", ""),
            AnchorText = link.InnerText
        };
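The query above works because each dt is paired with its own dd. The query in the question instead uses two `from` clauses over independent collections, which enumerates every combination, so three dd nodes and three links give nine rows. A minimal sketch of that effect with plain LINQ follows; the arrays are stand-ins for the node collections, and `CrossJoinDemo` is a hypothetical name, not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossJoinDemo
{
    // Stand-ins for the dd texts and the link hrefs of the sample page.
    static readonly string[] Texts = { "First Entry", "Second Entry", "Third Entry" };
    static readonly string[] Links = { "1.html", "2.html", "3.html" };

    // Two independent from clauses: every text is combined with every link.
    public static List<string> AllCombinations() =>
        (from text in Texts
         from link in Links
         select text + " " + link).ToList();

    public static void Main()
    {
        var rows = AllCombinations();
        Console.WriteLine(rows.Count); // 3 x 3 = 9 rows
        foreach (var row in rows)
            Console.WriteLine(row);
    }
}
```

The rows come out in the same order as the unwanted output in the question: the first text with each link in turn, then the second text, and so on.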

Solution 2

Without an extra function:

    var infoNode = document.DocumentNode.SelectSingleNode("//div[@class='content-div']");
    var dts = infoNode.SelectNodes("dl/dt");
    var dds = infoNode.SelectNodes("dl/dd");
    var parsedValues = dts.Zip(dds, (dt, dd) => new
    {
        Text = dd.InnerHtml,
        Url = dt.SelectSingleNode("b/a[@href]").GetAttributeValue("href", ""),
        AnchorText = dt.SelectSingleNode("b/a[@href]").InnerText
    });
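`Zip` pairs elements purely by position and stops at the end of the shorter sequence, so this approach relies on every dt having exactly one dd, in the same order. The sketch below uses plain LINQ, with string arrays standing in for the node collections and `ZipDemo` as a hypothetical name, to show both the balanced case and what happens when one dd is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipDemo
{
    // Pairs each link with the text at the same index.
    public static List<string> Pair(IEnumerable<string> links, IEnumerable<string> texts) =>
        links.Zip(texts, (link, text) => link + " -> " + text).ToList();

    public static void Main()
    {
        var links = new[] { "1.html", "2.html", "3.html" };

        // Balanced input: three pairs, in document order.
        foreach (var row in Pair(links, new[] { "First Entry", "Second Entry", "Third Entry" }))
            Console.WriteLine(row);

        // If the second dt had no dd, "Third Entry" would shift up one slot
        // and the last link would be dropped without any error.
        foreach (var row in Pair(links, new[] { "First Entry", "Third Entry" }))
            Console.WriteLine(row);
    }
}
```

When the markup may be uneven, walking from each dt to its own dd, as in Solution 1, avoids this silent misalignment.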

For example, here is how you could parse some elements with Html Agility Pack:


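    // Requires: using System; using System.Text; using HtmlAgilityPack;
    public string ParseHtml(string html)
    {
        var output = new StringBuilder();
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);
        HtmlNode node = htmlDocument.DocumentNode;
        HtmlNodeCollection dds = node.SelectNodes("//dd");             // all dd elements
        HtmlNodeCollection anchors = node.SelectNodes("//b/a[@href]"); // all a elements that have an href attribute
        // SelectNodes returns null when nothing matches.
        if (dds == null || anchors == null) return string.Empty;
        // Stop at the shorter collection so the indexes stay in range.
        for (int i = 0; i < Math.Min(dds.Count, anchors.Count); i++)
        {
            string text = dds[i].InnerText;
            string url = anchors[i].GetAttributeValue("href", "");
            string anchorText = anchors[i].InnerText;
            // Your code...
            output.AppendLine(text + " " + url + " " + anchorText);
        }
        return output.ToString();
    }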
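To try the pairing end to end, the following is a rough sketch rather than a definitive implementation. It assumes the HtmlAgilityPack NuGet package is referenced and that the page looks like `SampleHtml`, which is a guess at the markup based on the XPath expressions in the question. Instead of a helper function it uses the XPath expression `following-sibling::dd[1]`, which is a swapped-in alternative and does not come from any of the answers above. `DlParserDemo` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DlParserDemo
{
    // Assumed markup, inferred from the XPath expressions in the question.
    const string SampleHtml = @"
<div class='content-div'>
  <dl>
    <dt><b><a href='1.html'>1</a></b></dt><dd>First Entry</dd>
    <dt><b><a href='2.html'>2</a></b></dt><dd>Second Entry</dd>
    <dt><b><a href='3.html'>3</a></b></dt><dd>Third Entry</dd>
  </dl>
</div>";

    public static List<(string Text, string Url, string AnchorText)> Parse(string html)
    {
        var rows = new List<(string Text, string Url, string AnchorText)>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var dts = doc.DocumentNode.SelectNodes("//div[@class='content-div']/dl/dt");
        if (dts == null) return rows; // SelectNodes returns null when nothing matches

        foreach (var dt in dts)
        {
            var link = dt.SelectSingleNode("b/a[@href]");
            // First dd element that follows this dt among its siblings.
            var dd = dt.SelectSingleNode("following-sibling::dd[1]");
            if (link == null || dd == null) continue;

            rows.Add((dd.InnerText.Trim(),
                      link.GetAttributeValue("href", ""),
                      link.InnerText.Trim()));
        }
        return rows;
    }

    public static void Main()
    {
        foreach (var row in Parse(SampleHtml))
            Console.WriteLine(row.Text + " | " + row.Url + " | " + row.AnchorText);
    }
}
```

Like the helper in Solution 1, this picks the next dd regardless of what lies in between, so a dt with no dd of its own would be paired with the dd of the following entry.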
