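You may have spotted the problem already: hold on, loading the whole XML into memory in one go, are you out of your mind? To stay sane, use StAX; it is a streaming API, after all. Here is an example that reads an XML file shaped like this fragment: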
<OutXMLData>
<SubXMLItemType>
....
</SubXMLItemType>
<SubXMLItemType>
....
</SubXMLItemType>
<SubXMLItemType>
....
</SubXMLItemType>
....
</OutXMLData>
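// Imports assume JAXB 2.x (javax.xml.bind); with JAXB 3+ use jakarta.xml.bind instead.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

private static void readWriteWithStAXAndJAXB() throws IOException, XMLStreamException, JAXBException {
    // Set up a StAX reader over the large input file.
    XMLInputFactory xmlif = XMLInputFactory.newInstance();
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream("inputLarge.xml"));
    XMLStreamReader xmlr = xmlif.createXMLStreamReader(bis);
    // Set up a StAX writer; the output directory must already exist.
    File outfile = new File("output/outfile.xml");
    OutputStreamWriter bos = new OutputStreamWriter(new FileOutputStream(outfile), "UTF-8");
    XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
    XMLStreamWriter xmlw = xmlof.createXMLStreamWriter(bos);
    xmlw.writeStartDocument("UTF-8", "1.0");
    xmlw.writeStartElement("OutXMLData");
    // JAXB only needs to know the item type, not the whole document.
    JAXBContext ucontext = JAXBContext.newInstance(SubXMLItemType.class);
    Unmarshaller unmarshaller = ucontext.createUnmarshaller();
    Marshaller marshaller = ucontext.createMarshaller();
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE); // no XML declaration per item
    xmlr.nextTag(); // move to the root element
    xmlr.require(XMLStreamConstants.START_ELEMENT, null, "OutXMLData");
    xmlr.nextTag(); // move to the first item
    int iCount = 0;
    while (xmlr.getEventType() == XMLStreamConstants.START_ELEMENT) { // stream through the items tag by tag
        iCount++;
        JAXBElement<SubXMLItemType> pt = unmarshaller.unmarshal(xmlr, SubXMLItemType.class); // reads only the current SubXMLItemType element
        marshaller.marshal(pt, xmlw); // writes the item out immediately, one at a time
        xmlw.writeCharacters(" ");
        if (xmlr.getEventType() == XMLStreamConstants.CHARACTERS) {
            xmlr.next(); // skip the whitespace between items
        }
    }
    xmlw.writeEndElement();
    xmlw.writeEndDocument();
    xmlw.flush();
    System.out.println("Entity Count is :" + iCount);
    xmlr.close();
    xmlw.close();
    // The StAX close() calls do not close the underlying streams.
    bis.close();
    bos.close();
}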
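  With all that covered, handling XML in Java is basically no longer hard. Still, sometimes you feel like someone handed you a full set of crab-eating tools and you still have no idea where to dig in. For example, the parsing side is yours to control and you can use whatever you like, but the downstream interface you call wants the data in a different form. Say you parse with StAX and downstream expects a DOM interface: maddening, isn't it? You curse under your breath: useless things, have you no ambition at all?! I ran into exactly this recently: parse one large XML file, while downstream wants sub-XML, call it XML fragments or a split XML file. Great, so I break all the data into Java objects and then stitch them back into small XML files one by one to send along? Insane, right?! If that is really what you did, stop reading here; you will cry!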
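  Java's XML API has a transform subpackage (javax.xml.transform); take a look inside and you will find a pleasant surprise. This toolkit can do this kind of conversion for you, for example between StAX, SAX, and DOM in either direction, or into a stream.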
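
To make that concrete, here is a minimal sketch of splitting a large document into per-item DOM fragments with nothing but the JDK. The class name `XmlFragmentSplitter` and its two methods are made up for illustration; `StAXSource`, `DOMResult`, `DOMSource`, and `StreamResult` are the real classes from javax.xml.transform. It also leans on one assumption worth checking on your own runtime: the JDK's built-in identity transformer leaves the StAX reader positioned just after the end tag of the element it copied.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper; the class and method names are illustrative only.
class XmlFragmentSplitter {

    /** Streams over the root's children and hands each one downstream as its own DOM Document. */
    static int splitToDom(Reader input, Consumer<Document> downstream)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        int count = 0;

        reader.nextTag(); // move to the root element, e.g. <OutXMLData>
        reader.nextTag(); // move to the first child (or the root's end tag if it is empty)
        while (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
            DOMResult result = new DOMResult(); // the transformer creates a fresh Document
            identity.transform(new StAXSource(reader), result); // copies just this element
            downstream.accept((Document) result.getNode());
            count++;
            // Assumption: the reader now sits just past the copied element's end tag.
            // Skip whitespace or comments until the next sibling or the root's end tag.
            while (reader.hasNext() && !reader.isStartElement() && !reader.isEndElement()) {
                reader.next();
            }
        }
        reader.close();
        return count;
    }

    /** Serializes a DOM Document to a string, i.e. DOM to stream. */
    static String toXml(Document doc) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling `XmlFragmentSplitter.splitToDom(new FileReader("inputLarge.xml"), doc -> sendDownstream(doc))` (where `sendDownstream` stands in for whatever your downstream API is) only ever keeps one item in memory at a time. The loop checks `getEventType()` rather than calling `nextTag()` after each transform, so that items with no whitespace between them should not get skipped.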