C# Tutorial: Why is this HtmlAgilityPack operation invalid when there are, indeed, matching elements?

I get "InvalidOperationException > Message=Sequence contains no matching element" with the following code:

    private void buttonLoadHTML_Click(object sender, EventArgs e)
    {
        GetParagraphsListFromHtml(@"C:\PlatypiRUs\fitt.html");
    }

    // This code adapted from Kirk Woll's answer at https://stackoverflow.com/questions/4752840/html-agility-pack-c-sharp-paragraph-parsing-problem
    public List<string> GetParagraphsListFromHtml(string sourceHtml)
    {
        var pars = new List<string>();
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(sourceHtml);
        foreach (var par in doc.DocumentNode
            .DescendantNodes()
            .Single(x => x.Id == "body")
            .DescendantNodes()
            .Where(x => x.Name == "p"))
            //.Where(x => x.Name == "h1" || x.Name == "h2" || x.Name == "h3" || x.Name == "hp" || )) <-- This is what I'd really like to do, but I don't know if this is possible or, if it is, if the syntax is correct
        {
            pars.Add(par.InnerText);
        }
        // test
        foreach (string s in pars)
        {
            MessageBox.Show(s);
        }
        return pars;
    }
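The exception message comes from LINQ's Enumerable.Single, not from HtmlAgilityPack itself: Single throws InvalidOperationException when no element satisfies the predicate (and also when more than one does), whereas FirstOrDefault returns null so the caller can check. A minimal sketch of that difference using plain strings in place of HTML nodes; the SingleDemo class and its method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper that contrasts Single with FirstOrDefault on a list of id values.
public static class SingleDemo
{
    // Returns true if Single(predicate) threw InvalidOperationException,
    // which happens for zero matches or for more than one match.
    public static bool SingleThrows(string[] ids)
    {
        try
        {
            ids.Single(id => id == "body");
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // FirstOrDefault returns null instead of throwing when nothing matches.
    public static string FindOrNull(string[] ids) =>
        ids.FirstOrDefault(id => id == "body");
}
```

With no element whose id is "body", SingleThrows returns true and FindOrNull returns null, which mirrors what happens in the question's query.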

Why isn't the code finding the paragraphs?

What I really want is to find all the text (h1..h3 or higher values), but this is a start.
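On the commented-out line in the question: a chain of `x.Name == "h1" || x.Name == "h2" || ...` is valid C# once the trailing `||` is removed. A set of tag names does the same thing and is easier to extend. HtmlAgilityPack reports element names in lower case through Name, and Descendants() walks the tree in document order, so headings and paragraphs come back in the order they appear. A sketch under those assumptions; the class name, method name and the exact tag set are hypothetical, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; requires the HtmlAgilityPack NuGet package.
public static class HeadingAndParagraphExtractor
{
    // Tag names to collect (an assumed set; add "h4", "h5", ... as needed).
    private static readonly HashSet<string> WantedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "h1", "h2", "h3", "p" };

    // Returns the trimmed, non-empty text of each wanted element, in document order.
    public static List<string> GetText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // expects markup, not a file path

        return doc.DocumentNode
                  .Descendants()
                  // Same test as: n.Name == "h1" || n.Name == "h2" || n.Name == "h3" || n.Name == "p"
                  .Where(n => WantedTags.Contains(n.Name))
                  .Select(n => n.InnerText.Trim())
                  .Where(text => text.Length > 0)
                  .ToList();
    }
}
```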

BTW: the HTML file I'm testing does have some paragraph elements.
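A likely explanation, which the question itself does not state: HtmlDocument.LoadHtml parses its argument as HTML markup, so passing a file path makes HtmlAgilityPack parse the path text itself, and the resulting document contains no elements at all. Separately, `x.Id == "body"` compares the element's id attribute, while the tag name is exposed through Name, and a plain `<body>` tag has no id. A sketch under those assumptions that reads the file's contents first and matches the body by name; the ParagraphExtractor class and its method names are hypothetical, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; requires the HtmlAgilityPack NuGet package.
public static class ParagraphExtractor
{
    // Reads the file and parses its contents, rather than parsing the path string.
    public static List<string> GetParagraphsFromFile(string path) =>
        GetParagraphsFromHtml(File.ReadAllText(path));

    // Parses HTML markup and returns the InnerText of every <p> under <body>.
    public static List<string> GetParagraphsFromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match by tag name; FirstOrDefault returns null instead of throwing
        // the way Single does when nothing matches.
        HtmlNode body = doc.DocumentNode
            .Descendants()
            .FirstOrDefault(n => n.Name == "body");
        if (body == null)
        {
            return new List<string>();
        }

        return body.Descendants("p")
                   .Select(p => p.InnerText)
                   .ToList();
    }
}
```

In a WinForms code file, System.Windows.Forms also defines an HtmlDocument type, so it may be necessary to fully qualify it as HtmlAgilityPack.HtmlDocument, as the question's code does.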

UPDATE

In response to Amy's implied request, and in the interest of full disclosure/ultimate illumination, here is the entire test HTML file:

    body {
        background-color: orange;
        font-family: Verdana, sans-serif;
    }
    h1 {
        color: Blue;
        font-family: 'Segoe UI', Verdana, sans-serif;
    }
    h2 {
        color: white;
        font-family: 'Palatino Linotype', 'Palatino', sans-serif;
    }
    h3 {
        display: inline-block;
    }

Found in the Translation

Bilingual Editions of Classic Literature

Around the World in 80 Days by Jules Verne (French & English Side by Side)

Paperback

Kindle

Gulliver's Travels by Jonathan Swift (English & French Side by Side)

Paperback

Kindle

Journey to the Center of the Earth by Jules Verne (French & English Side by Side)

Paperback

Kindle

Treasure Island by Robert Louis Stevenson (English & Finnish Side by Side)

Paperback

Kindle

Robinson Crusoe by Daniel Defoe (English & French Side by Side)

Paperback

Kindle

Don Quixote by Miguel de Cervantes Saavedra (Spanish & English Side by Side)

Paperback


Volume I

Volume II

Volume III

Kindle


Volume I

Volume II

Volume III

Alice's Adventures in Wonderland by Lewis Carroll (English & German Side by Side)

Coming soon; for now, see:


Paperback

Kindle

Alice's Adventures in Wonderland by Lewis Carroll (English & Italian Side by Side)

Coming soon; for now, see:


Paperback

Kindle

Other Sites:

USA Map-O-Rama

Award-winning Movies, Books, and Music

Garrapata State Park in Big Sur Throughout the Seasons

UPDATE 2

This works (although it's a "live" web page rather than an HTML file saved to disk):

    public List<string> GetParagraphsListFromHtml(string sourceHtml)
    {
        var pars = new List<string>();
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(sourceHtml);
        var getHtmlWeb = new HtmlWeb();
        var document = getHtmlWeb.Load("https://www.montereycountyweekly.com/opinion/letters/article_e333a222-942d-11e3-ba9c-001a4bcf6878.html");
        //https://www.bigsurgarrapata.com/ only returned one paragraph
        // https://usamaporama.azurewebsites.net/ <-- none
        // https://www.awardwinnersonly.com/ <- same as bigsurgarrapata
        var pTags = document.DocumentNode.SelectNodes("//p");
        int counter = 1;
        if (pTags != null)
        {
            foreach (var pTag in pTags)
            {
                pars.Add(pTag.InnerText);
                MessageBox.Show(pTag.InnerText);
                counter++;
            }
        }
        MessageBox.Show("done!");
        return pars;
    }
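The live-page version succeeds because HtmlWeb.Load downloads the page and parses its markup. The same `//p` query should also work against the saved file, as long as the file's contents are parsed rather than its path. Note that SelectNodes returns null, not an empty collection, when nothing matches (with default options), which is why the code above checks `pTags != null`. A sketch of the local-file variant; the XPathParagraphs class and its method names are hypothetical, and the HtmlAgilityPack NuGet package is assumed.

```csharp
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper; requires the HtmlAgilityPack NuGet package.
public static class XPathParagraphs
{
    // Same XPath query as the HtmlWeb version, but fed from local markup.
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // markup, not a file path

        var pars = new List<string>();
        HtmlNodeCollection pTags = doc.DocumentNode.SelectNodes("//p");
        if (pTags != null) // null when there are no matches
        {
            foreach (HtmlNode pTag in pTags)
            {
                pars.Add(pTag.InnerText);
            }
        }
        return pars;
    }

    // For a file saved to disk: read its contents, then parse them.
    public static List<string> FromFile(string path) => FromHtml(File.ReadAllText(path));
}
```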

It turns out this is pretty simple. This isn't complete, but this code, inspired by this answer, is enough to get started:

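    using System.Collections.Generic;
    using System.Windows.Forms; // MessageBox
    using HtmlAgilityPack;

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    // There are various options, set as needed
    htmlDoc.OptionFixNestedTags = true;
    // Load(path) reads and parses the file (unlike LoadHtml, which expects markup)
    htmlDoc.Load(@"C:\Platypus\dplatypus.htm");
    if (htmlDoc.DocumentNode != null)
    {
        IEnumerable<HtmlNode> textNodes = htmlDoc.DocumentNode.SelectNodes("//text()");
        // SelectNodes returns null (not an empty collection) when nothing matches
        if (textNodes != null)
        {
            foreach (HtmlNode node in textNodes)
            {
                if (!string.IsNullOrWhiteSpace(node.InnerText))
                {
                    MessageBox.Show(node.InnerText);
                }
            }
        }
    }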
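One caveat with `//text()`: HtmlAgilityPack keeps the contents of `<style>` and `<script>` elements as text nodes, so with a test file like the one above the CSS rules should come back along with the visible text. A sketch that skips text under those elements and drops whitespace-only nodes; it uses LINQ over Descendants() instead of XPath, and the VisibleText class, its method name and the skipped-element set are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; requires the HtmlAgilityPack NuGet package.
public static class VisibleText
{
    // Elements whose text content is not rendered as page text (an assumed set).
    private static readonly HashSet<string> SkippedParents =
        new HashSet<string> { "style", "script" };

    // Returns the trimmed text of every non-empty text node outside <style>/<script>.
    public static List<string> GetTextNodes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()
                  .Where(n => n.NodeType == HtmlNodeType.Text)
                  // Drop text that lives inside <style> or <script>.
                  .Where(n => !n.Ancestors().Any(a => SkippedParents.Contains(a.Name)))
                  .Select(n => n.InnerText.Trim())
                  .Where(t => !string.IsNullOrWhiteSpace(t))
                  .ToList();
    }
}
```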