Tuesday, August 8, 2017

[ Python FAQ ] BeautifulSoup - How to remove all tags from an element?

Source From Here 
Question 
How can I simply strip all tags from an element I find in BeautifulSoup? 

How-To 
Use get_text(). It returns all the text in a document, or beneath a tag, as a single Unicode string:
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# get_text() concatenates every text node in the tree, dropping all tags
print(soup.get_text())
Execution output: 
The Dormouse's story 

The Dormouse's story 
Once upon a time there were three little sisters; and their names were 
Elsie, 
Lacie and 
Tillie; 
and they lived at the bottom of a well. 
...
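The example above calls get_text() on the whole soup, but the question asks about a single element. The same method works on any tag returned by find(). The sketch below is a minimal illustration, assuming a trimmed copy of the same HTML; the variable names story, plain and compact are illustrative only. It also shows get_text()'s optional separator string and strip flag.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document: only the first "story" paragraph.
html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Locate one element, then strip every tag beneath it.
story = soup.find("p", class_="story")
plain = story.get_text()

# The first argument is a separator placed between text fragments;
# strip=True trims whitespace from each fragment and skips empty ones.
compact = story.get_text(" ", strip=True)

print(plain)
print(compact)
```

Note that strip=True trims each text fragment on its own, so the comma that follows a link becomes a separate fragment and is joined with the separator (giving "Elsie ,"). Plain get_text() preserves the original spacing exactly.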

This message was edited 3 times. Last update was at 09/08/2017 09:36:14
