XML

Basic

  • XML: eXtensible Markup Language
  • XML is a syntax (serialization format) for data sharing and exchange on the Web
    • Can translate any data to XML
    • Can ship XML over the Web (HTTP)
    • Can input XML into any application
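
To make "can translate any data to XML" concrete, here is a minimal sketch that serializes one record as XML. It uses Python's standard-library xml.etree.ElementTree; the record, its field names, and the helper record_to_xml are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative, not from any real schema.
record = {
    "title": "Foundations of Databases",
    "year": 1995,
    "author": ["Serge Abiteboul", "Victor Vianu"],
}

def record_to_xml(rec, tag="book"):
    """Serialize a flat dict as one element; list values become repeated children."""
    elem = ET.Element(tag)
    for key, value in rec.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(elem, key)
            child.text = str(v)  # XML text content is always a string
    return elem

# encoding="unicode" returns a str instead of bytes
xml_text = ET.tostring(record_to_xml(record), encoding="unicode")
print(xml_text)
```

The resulting string can be sent over HTTP or read back by any XML parser, which is the exchange scenario the list above describes.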

XML Structure

  • XML is self-describing
  • Schema elements become part of the data (in a relational record the schema is not part of the content, whereas in XML each node carries descriptive information itself)
  • XML is semi-structured
    • Missing attributes (could be represented as null in a table)
    • Repeated attributes (impossible in a table)
    • Attributes with different types in different objects
    • Nested structures
    • Heterogeneous contents
  • An XML document has a single root element
Example:
<bib>
  ...
  <book price="35">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author>
      <first-name>Rick</first-name>
      <last-name>Hull</last-name>
    </author>
    <author age="20">Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
    <price>38.8</price>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
  ...
</bib>
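
The semi-structured properties listed earlier (missing, repeated, and nested elements) can be checked programmatically. The following is a rough sketch, assuming a trimmed copy of the example with the "..." placeholders left out; it uses the standard-library xml.etree.ElementTree, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (the "..." placeholders are omitted).
BIB = """<bib>
  <book price="35">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author age="20">Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
    <price>38.8</price>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

root = ET.fromstring(BIB)  # <bib> is the single root element

summary = []
for book in root.findall("book"):
    authors = book.findall("author")
    summary.append({
        "title": book.findtext("title"),
        # Repeated elements: a book may have any number of <author> children
        "n_authors": len(authors),
        # Nested structure: an <author> with child elements instead of plain text
        "nested_authors": sum(1 for a in authors if len(a) > 0),
        # Missing element: only some books carry a <price> child
        "has_price_element": book.find("price") is not None,
    })

for row in summary:
    print(row)
```

The two books report different author counts, and only the first has a nested author and a price element, which a fixed relational schema could not express without nulls or extra tables.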

Querying XML Data

XPath: simple navigation through the tree

lxml

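$ pip install lxml

from lxml import etree

# Parse the XML file into an element tree
tree = etree.parse('bibs.xml')

# tostring() returns bytes by default; encoding='unicode' yields a str
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))

# Select every <author> element anywhere in the document
for element in tree.xpath("//author"):
    print(etree.tostring(element, encoding='unicode'))
    # For an author with nested children, element.text is only the leading whitespace (or None)
    print(element.tag, element.text)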
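
Beyond selecting every element with a given tag, path expressions can filter with predicates. The sketch below is an illustration under two assumptions: it runs on a shortened inline copy of the bibliography rather than bibs.xml, and it uses the standard-library xml.etree.ElementTree, which implements only a subset of XPath (hence the leading "." in each path and string-only comparisons). With lxml, tree.xpath() accepts full XPath 1.0 expressions.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the bibliography, so the sketch is self-contained.
BIB = """<bib>
  <book price="35">
    <author>Serge Abiteboul</author>
    <author age="20">Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

root = ET.fromstring(BIB)

# Attribute-value predicate: titles of books whose price attribute is "55"
titles_at_55 = [t.text for t in root.findall(".//book[@price='55']/title")]

# Attribute-existence predicate: authors that carry an age attribute
authors_with_age = [a.text for a in root.findall(".//author[@age]")]

# Child-text predicate: titles of books whose <year> child is "1995"
titles_1995 = [t.text for t in root.findall(".//book[year='1995']/title")]

print(titles_at_55)
print(authors_with_age)
print(titles_1995)
```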


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.