A Cocoon 3 pipeline expects one or more component(s). These components get linked with each other in the order they were added. There is no restriction on the content that flows through the pipeline.
A pipeline works based on two fundamental concepts:
- The first component of a pipeline is of type org.apache.cocoon.pipeline.component.Starter. The last component is of type org.apache.cocoon.pipeline.component.Finisher.
- In order to link two components with each other, the first has to be an org.apache.cocoon.pipeline.component.Producer and the second an org.apache.cocoon.pipeline.component.Consumer.
When the pipeline links the components, it merely checks whether the above-mentioned interfaces are present. So the pipeline does not know about the specific capabilities or the compatibility of the components. It is the responsibility of the Producer to decide whether a specific Consumer can be linked to it or not (that is, whether it can produce output in the format the Consumer expects). It is also conceivable that a Producer is capable of accepting different types of Consumer and adjusts its output format accordingly.
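The division of labour described above (the pipeline checks only the structural interfaces, the Producer checks content compatibility) can be sketched in plain Java. The interfaces below are simplified, hypothetical stand-ins named after the Cocoon 3 types, not the real org.apache.cocoon.pipeline API; TextConsumer, TextGenerator, TextSerializer, BinarySerializer and LinearPipeline are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the Cocoon 3 structural interfaces.
interface PipelineComponent {}
interface Starter extends PipelineComponent {}
interface Finisher extends PipelineComponent {}
interface Consumer extends PipelineComponent {}
interface Producer extends PipelineComponent {
    // The producer, not the pipeline, decides whether it can feed this consumer.
    void setConsumer(Consumer consumer);
}

// Hypothetical content-specific consumer type.
interface TextConsumer extends Consumer {}

class TextGenerator implements Starter, Producer {
    Consumer consumer;

    public void setConsumer(Consumer consumer) {
        if (!(consumer instanceof TextConsumer)) {
            throw new IllegalArgumentException(
                    "TextGenerator cannot feed " + consumer.getClass().getSimpleName());
        }
        this.consumer = consumer;
    }
}

class TextSerializer implements TextConsumer, Finisher {}

class BinarySerializer implements Consumer, Finisher {}

class LinearPipeline {
    private final List<PipelineComponent> components = new ArrayList<PipelineComponent>();

    void addComponent(PipelineComponent component) {
        components.add(component);
    }

    // Links the components in insertion order, checking only the structural interfaces.
    void link() {
        if (components.isEmpty() || !(components.get(0) instanceof Starter)) {
            throw new IllegalStateException("The first component must be a Starter");
        }
        if (!(components.get(components.size() - 1) instanceof Finisher)) {
            throw new IllegalStateException("The last component must be a Finisher");
        }
        for (int i = 0; i + 1 < components.size(); i++) {
            PipelineComponent current = components.get(i);
            PipelineComponent next = components.get(i + 1);
            if (!(current instanceof Producer) || !(next instanceof Consumer)) {
                throw new IllegalStateException(
                        "Component " + i + " cannot be linked to component " + (i + 1));
            }
            // Content compatibility is delegated to the producer.
            ((Producer) current).setConsumer((Consumer) next);
        }
    }
}
```

BinarySerializer passes the structural check (it is a Consumer and a Finisher) and is only rejected inside TextGenerator.setConsumer, which mirrors the responsibility split described in the text.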
A Cocoon 3 pipeline always goes through the same sequence of components to produce its output. There is no support for conditionals, loops, tees or alternative flows in the case of errors. The reasons for this restriction are simplicity and the fact that non-linear pipelines are more difficult (or even impossible) to cache. In practice this means that a pipeline has to be constructed completely at build time.
If non-linear XML pipes with runtime-support for conditionals, loops, tees and error-flows are a requirement for you, see the XProc standard of the W3C. There are several available implementations for it.
But let's get more specific by giving an example: Cocoon has become famous for its SAX pipelines that consist of exactly one SAX-based XML generator, zero, one or more SAX-based XML transformers and exactly one SAX-based XML serializer. Of course, these specific SAX-based XML pipelines can be built by using general Cocoon 3 pipelines: generators, transformers and serializers are pipeline components. A generator is a Starter and a Producer; a transformer can be neither a Starter nor a Finisher but is always a Producer and a Consumer; and a serializer is a Consumer and a Finisher.
Here is some Java code that demonstrates how a pipeline can be utilized with SAX-based XML components:
Pipeline<SAXPipelineComponent> pipeline = new NonCachingPipeline<SAXPipelineComponent>();
pipeline.addComponent(new XMLGenerator("<x></x>"));
pipeline.addComponent(new XSLTTransformer(this.getClass().getResource("/test1.xslt")));
pipeline.addComponent(new XSLTTransformer(this.getClass().getResource("/test2.xslt")));
pipeline.addComponent(new XMLSerializer());
pipeline.setup(System.out);
pipeline.execute();
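1. Create a NonCachingPipeline.
2. Add a generator that implements the PipelineComponent interface. The XMLGenerator takes its input as an XML string (here "<x></x>") and produces SAX events, so it implements the Producer and SAXProducer interfaces. Since a generator is the first component of a pipeline, it also has to implement the Starter interface.
3. Add a transformer that implements the PipelineComponent interface. This XSLTTransformer is configured with the URL of an XSLT stylesheet (/test1.xslt). Since it implements the SAXConsumer interface, it has to follow a component that produces SAX events. This transformer also implements the SAXProducer interface, so the component after it has to consume SAX events.
4. Add another transformer to the pipeline. A pipeline can contain any number of components that implement both the Producer and the Consumer interface.
5. Add a serializer that implements the PipelineComponent interface. The XML serializer receives SAX events and serializes them into an output stream. A serializer component is the last component of a pipeline and hence it has to implement the Finisher interface. Since it receives SAX events, it implements the SAXConsumer interface.
6. A pipeline has to be initialized first by calling its setup method; here it is given System.out as the output stream for the result.
7. After the pipeline has been initialized, it can be executed by invoking its execute() method. Once the pipeline has been started, it either succeeds or fails. There is no way to react to any (error) conditions.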
Table 2.1. SAX components and their interfaces
Component type | Structural interfaces | Content-specific interfaces
---|---|---
SAX generator | Starter, Producer, PipelineComponent | SAXProducer
SAX transformer | Producer, Consumer, PipelineComponent | SAXProducer, SAXConsumer
SAX serializer | Finisher, Consumer, PipelineComponent | SAXConsumer
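To experiment with the same generator, transformer, transformer, serializer chain without Cocoon on the classpath, it can be approximated with plain JAXP (javax.xml.transform.sax). This is a swapped-in technique, not Cocoon's implementation: a SAX parser plays the generator, chained TransformerHandlers play the XSLT transformers, and an identity TransformerHandler plays the serializer. The class name JaxpSaxChain is hypothetical, and the stylesheets are passed as strings instead of the /test1.xslt and /test2.xslt resources.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class JaxpSaxChain {

    // Parses xml and pushes its SAX events through two XSLT steps into a serializer.
    static String run(String xml, String xslt1, String xslt2) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            Templates first = factory.newTemplates(new StreamSource(new StringReader(xslt1)));
            Templates second = factory.newTemplates(new StreamSource(new StringReader(xslt2)));

            // "Serializer": an identity handler that writes the SAX events it receives as text.
            StringWriter out = new StringWriter();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setResult(new StreamResult(out));

            // "Transformers": each consumes SAX events and produces SAX events for the next one.
            TransformerHandler step2 = factory.newTransformerHandler(second);
            step2.setResult(new SAXResult(serializer));
            TransformerHandler step1 = factory.newTransformerHandler(first);
            step1.setResult(new SAXResult(step2));

            // "Generator": a namespace-aware SAX parser feeding the first transformer.
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();
            reader.setContentHandler(step1);
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("SAX chain failed", e);
        }
    }
}
```

In the Cocoon example the linking is left to the pipeline; in this sketch every setResult call wires one component to the next by hand.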
TBW: non-caching, caching, async-caching, expires caching, own implementations
TBW: passing parameters to the pipeline and its components, finish() method
TBW: concept, writing custom SAX components, link to Javadocs
TBW: explain from a user's point of view what she needs to do to implement one (available abstract classes)
TBW: explain from a user's point of view what she needs to do to implement one
TBW: buffering
StAX pipelines provide an alternative API for writing pipeline components. Although they are not as fast as SAX, they provide easier state handling, as the component can control when to pull the next events. This allows implicit state instead of having to manage the state in the various content handler methods of SAX.
The most visible difference between StAX and SAX components is that a StAX component itself controls the parsing of its input, whereas in SAX the parser drives the pipeline by calling the component. Our implementation of StAX pipelines uses StAX interfaces only for retrieving events; the writing interface is proprietary in order to avoid multithreading or continuations. So it is really a hybrid process: the StAX component is called to generate the next events, but it is also allowed to read as much data from the previous pipeline component as it wants. However, as the produced events are kept in memory until a later component pulls them, components should not emit large numbers of events during one invocation.
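The "implicit state" advantage of pulling can be illustrated with the JDK's own StAX API (javax.xml.stream.XMLEventReader). This is a minimal sketch, not Cocoon code: the helper readText only runs while the reader is inside an item element, so that fact never has to be stored in a field. The class name PullStateDemo and the list/item vocabulary are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PullStateDemo {

    // Collects the text of every <item> element by pulling events.
    static List<String> itemTexts(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<String> result = new ArrayList<String>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals("item")) {
                    // "Inside an item" is implicit: it holds exactly while readText runs.
                    result.add(readText(reader));
                }
            }
            return result;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pulls events up to the next end element, concatenating the character data
    // (items are assumed to contain text only).
    private static String readText(XMLEventReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEndElement()) {
                break;
            }
            if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            }
        }
        return text.toString();
    }
}
```

A SAX ContentHandler doing the same job would need an explicit "in item" flag and a text buffer shared between startElement, characters and endElement.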
- StAXGenerator is a Starter and normally parses XML from an InputStream.
- StAXSerializer is a Finisher and writes the StAX events to an OutputStream.
- AbstractStAXTransformer is the abstract base class for new transformers. It simplifies the task by providing a template method for generating the new events.
- StAXCleaningTransformer is a transformer that removes whitespace and comments from the document.
- IncludeTransformer includes the contents of another document.
For further information refer to the Javadoc.
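What StAXCleaningTransformer is described as doing (dropping comments and whitespace) can be approximated with the JDK's EventFilter and XMLInputFactory.createFilteredReader. This is a sketch of the idea under that assumption, not the Cocoon implementation; CleaningSketch is a hypothetical name, and the sketch also skips the StartDocument event so that no XML declaration is written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class CleaningSketch {

    // Accepts every event except comments and whitespace-only character data.
    static final EventFilter NO_COMMENTS_OR_WHITESPACE = new EventFilter() {
        public boolean accept(XMLEvent event) {
            if (event.getEventType() == XMLStreamConstants.COMMENT) {
                return false;
            }
            return !(event.isCharacters() && event.asCharacters().isWhiteSpace());
        }
    };

    static String clean(String xml) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLEventReader reader = inputFactory.createFilteredReader(
                    inputFactory.createXMLEventReader(new StringReader(xml)), NO_COMMENTS_OR_WHITESPACE);
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartDocument()) { // skip the XML declaration in this sketch
                    writer.add(event);
                }
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```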
The StAXGenerator is a Starter component and produces XMLEvents.
import java.io.InputStream;
import javax.xml.stream.FactoryConfigurationError;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

import org.apache.cocoon.pipeline.SetupException;
import org.apache.cocoon.pipeline.component.Starter;

public class MyStAXGenerator extends AbstractStAXProducer implements Starter {

    private XMLEventReader reader;

    public MyStAXGenerator(InputStream inputStream) {
        try {
            this.reader = XMLInputFactory.newInstance().createXMLEventReader(inputStream);
        } catch (XMLStreamException e) {
            throw new SetupException("Error during setup an XMLEventReader on the inputStream", e);
        } catch (FactoryConfigurationError e) {
            throw new SetupException("Error during setup the XMLInputFactory for creating an XMLEventReader", e);
        }
    }

    public void execute() {
        this.getConsumer().initiatePullProcessing();
    }

    public boolean hasNext() {
        return this.reader.hasNext();
    }

    public XMLEvent nextEvent() throws XMLStreamException {
        return this.reader.nextEvent();
    }

    public XMLEvent peek() throws XMLStreamException {
        return this.reader.peek();
    }
}
1. To implement your own StAX generator, extend AbstractStAXProducer and implement the Starter interface.
2. The constructor creates a new XMLEventReader for reading from the input stream.
3. The pipeline is started using the execute() method, which asks the consumer to initiate pull processing.
4. hasNext() should return true if the generator has a next event.
5. nextEvent() returns the next event from the generator.
6. peek() returns the next event from the generator without actually advancing to it.
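The pull contract above (execute() hands control to the consumer, which then calls hasNext(), nextEvent() and peek() on its parent) can be played through without Cocoon. PullProducer and PullConsumer are hypothetical, simplified stand-ins for the Cocoon StAX interfaces, and ListEventGenerator and StartElementCounter are illustrative names; the generator serves a fixed list of events built with javax.xml.stream.XMLEventFactory instead of parsing an InputStream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical, simplified stand-ins for the StAX producer/consumer contract.
interface PullProducer {
    boolean hasNext();
    XMLEvent nextEvent();
    XMLEvent peek();
}

interface PullConsumer {
    void setParent(PullProducer parent);
    void initiatePullProcessing();
}

// Generator-like producer that serves a fixed list of events.
class ListEventGenerator implements PullProducer {
    private final List<XMLEvent> events;
    private int position = 0;
    private PullConsumer consumer;

    ListEventGenerator(List<XMLEvent> events) {
        this.events = events;
    }

    void setConsumer(PullConsumer consumer) {
        this.consumer = consumer;
        consumer.setParent(this);
    }

    // Starting the pipeline just hands control to the consumer, which then pulls.
    void execute() {
        consumer.initiatePullProcessing();
    }

    public boolean hasNext() {
        return position < events.size();
    }

    public XMLEvent nextEvent() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return events.get(position++);
    }

    // Returns the next event without advancing, or null at the end (like XMLEventReader.peek()).
    public XMLEvent peek() {
        return hasNext() ? events.get(position) : null;
    }

    static ListEventGenerator sample() {
        XMLEventFactory f = XMLEventFactory.newInstance();
        return new ListEventGenerator(Arrays.<XMLEvent>asList(
                f.createStartDocument(),
                f.createStartElement("", "", "root"),
                f.createStartElement("", "", "child"),
                f.createCharacters("hello"),
                f.createEndElement("", "", "child"),
                f.createEndElement("", "", "root"),
                f.createEndDocument()));
    }
}

// Finisher-like consumer that pulls every event and counts the start elements.
class StartElementCounter implements PullConsumer {
    private PullProducer parent;
    int count = 0;

    public void setParent(PullProducer parent) {
        this.parent = parent;
    }

    public void initiatePullProcessing() {
        while (parent.hasNext()) {
            if (parent.nextEvent().isStartElement()) {
                count++;
            }
        }
    }
}
```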
Implementing a StAX transformer should be the most common use case. The AbstractStAXTransformer provides a foundation for new transformers. But to make writing new transformers even simpler, let's describe another feature first:
Navigators allow an easier navigation in the XML document. They also simplify transformers, as usually transformers need only process some parts of the input document and the navigator helps to identify the interesting parts. There are several implementations already included:
- FindStartElementNavigator finds the start tag with certain properties (name, attribute).
- FindEndElementNavigator finds the end tag with certain properties (name, attribute).
- FindCorrespondingStartEndElementPairNavigator finds both the start and the corresponding end tag.
- InSubtreeNavigator finds whole subtrees, by specifying the properties of the "root" element.
For further information refer to the navigator Javadoc.
Using a navigator is a simple task. The transformer peeks at or gets the next event and calls Navigator.fulfillsCriteria; if true is returned, the transformer should process that event.
Creating a new navigator is just as simple and means implementing two methods:
import javax.xml.stream.events.XMLEvent;

public class MyNavigator implements Navigator {

    public boolean fulfillsCriteria(XMLEvent event) {
        return false;
    }

    public boolean isActive() {
        return false;
    }
}
1. fulfillsCriteria returns true if the event matches the criteria of the navigator.
2. isActive returns the result of the last invocation of fulfillsCriteria.
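As a sketch of both sides (implementing a navigator and using it in a pull loop), here is a subtree navigator in the spirit of InSubtreeNavigator. The Navigator interface is redeclared locally as a hypothetical mirror of the two methods shown above, and SubtreeByNameNavigator and NavigatorDemo are illustrative names; the real InSubtreeNavigator may match on more properties than the local name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical local mirror of the two navigator methods.
interface Navigator {
    boolean fulfillsCriteria(XMLEvent event);
    boolean isActive();
}

// Matches every event inside (and including) elements with the given local name.
class SubtreeByNameNavigator implements Navigator {
    private final String localName;
    private int depth = 0; // > 0 while inside a matching subtree
    private boolean active = false;

    SubtreeByNameNavigator(String localName) {
        this.localName = localName;
    }

    public boolean fulfillsCriteria(XMLEvent event) {
        if (event.isStartElement()) {
            if (depth > 0 || event.asStartElement().getName().getLocalPart().equals(localName)) {
                depth++;
            }
            active = depth > 0;
        } else if (event.isEndElement()) {
            active = depth > 0; // the closing tag still belongs to the subtree
            if (depth > 0) {
                depth--;
            }
        } else {
            active = depth > 0;
        }
        return active;
    }

    public boolean isActive() {
        return active;
    }
}

class NavigatorDemo {
    // Concatenates the character data found inside the named element(s).
    static String textInside(String xml, String elementName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            Navigator navigator = new SubtreeByNameNavigator(elementName);
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (navigator.fulfillsCriteria(event) && event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```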
The next example shows a transformer featuring navigators and implicit state handling through method calls.
public class DaisyLinkRewriteTransformer extends AbstractStAXTransformer {

    @Override
    protected void produceEvents() throws XMLStreamException {
        while (this.getParent().hasNext()) {
            XMLEvent event = this.getParent().nextEvent();
            if (this.anchorNavigator.fulfillsCriteria(event)) {
                ArrayList<XMLEvent> innerContent = new ArrayList<XMLEvent>();
                LinkInfo linkInfo = this.collectLinkInfo(innerContent);
                if (linkInfo != null) {
                    linkInfo.setNavigationPath(this.getAttributeValue(event.asStartElement(), PUBLISHER_NS, "navigationPath"));
                    this.rewriteAttributesAndEmitEvent(event.asStartElement(), linkInfo);
                    if (innerContent.size() != 0) {
                        this.addAllEventsToQueue(innerContent);
                    }
                }
                /* ... */
            }
            /* ... */
        }
    }

    private LinkInfo collectLinkInfo(List<XMLEvent> events) throws XMLStreamException {
        Navigator linkInfoNavigator = new InSubtreeNavigator(LINK_INFO_EL);
        Navigator linkInfoPartNavigator = new FindStartElementNavigator(LINK_PART_INFO_EL);
        LinkInfo linkInfo = null;
        while (this.getParent().hasNext()) {
            XMLEvent event = this.getParent().peek();
            if (linkInfoNavigator.fulfillsCriteria(event)) {
                event = this.getParent().nextEvent();
                if (linkInfoPartNavigator.fulfillsCriteria(event)) {
                    /* ... */
                    String fileName = this.getAttributeValue(event.asStartElement(), "fileName");
                    if (!"".equals(fileName)) {
                        linkInfo.setFileName(fileName);
                    }
                }
                /* ... */
            } else if (event.isCharacters()) {
                events.add(this.getParent().nextEvent());
            } else {
                return linkInfo;
            }
        }
        return linkInfo;
    }

    private void rewriteAttributesAndEmitEvent(StartElement event, LinkInfo linkInfo) {
        /* ... */
    }
}
1. The transformer checks for anchors in the XML.
2. If an anchor is found, it invokes a method which parses the link info, if there is any. The additional list collects any events that were read but do not belong to the link info.
3. This method finally writes the start tag with the correct attributes taken from the parsed LinkInfo.
4. The events that were read but not parsed are finally added to the output of the transformer.
5. The parser for the LinkInfo object itself also uses navigators ...
6. ... and reads more events from the parent.
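The hybrid process behind AbstractStAXTransformer (a template method that reads from the parent as needed and queues the events it produces until the next component pulls them) can be modelled as follows. EventSource, ReaderSource, QueueingTransformer and UpperCaseTextTransformer are hypothetical names; this is a simplified model, not the Cocoon base class, and it assumes that each call of produceEvents() consumes at least one event from the parent.

```java
import java.io.StringReader;
import java.util.LinkedList;
import java.util.Locale;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical pull interface used by every stage in this sketch.
interface EventSource {
    boolean hasNext();
    XMLEvent nextEvent();
}

// Adapts a JDK XMLEventReader (the "generator") to EventSource.
class ReaderSource implements EventSource {
    private final XMLEventReader reader;

    private ReaderSource(XMLEventReader reader) {
        this.reader = reader;
    }

    static ReaderSource of(String xml) {
        try {
            return new ReaderSource(XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public boolean hasNext() {
        return reader.hasNext();
    }

    public XMLEvent nextEvent() {
        try {
            return reader.nextEvent();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Simplified model of a queue-based transformer with a produceEvents() template method.
abstract class QueueingTransformer implements EventSource {
    protected final EventSource parent;
    private final LinkedList<XMLEvent> queue = new LinkedList<XMLEvent>();

    QueueingTransformer(EventSource parent) {
        this.parent = parent;
    }

    // Reads as much from the parent as needed; must consume at least one parent event per call.
    protected abstract void produceEvents();

    protected void addEventToQueue(XMLEvent event) {
        queue.addLast(event);
    }

    public boolean hasNext() {
        while (queue.isEmpty() && parent.hasNext()) {
            produceEvents();
        }
        return !queue.isEmpty();
    }

    public XMLEvent nextEvent() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return queue.removeFirst();
    }
}

// Example transformer: upper-cases all character data and passes everything else through.
class UpperCaseTextTransformer extends QueueingTransformer {
    private final XMLEventFactory factory = XMLEventFactory.newInstance();

    UpperCaseTextTransformer(EventSource parent) {
        super(parent);
    }

    protected void produceEvents() {
        XMLEvent event = parent.nextEvent();
        if (event.isCharacters()) {
            addEventToQueue(factory.createCharacters(event.asCharacters().getData().toUpperCase(Locale.ROOT)));
        } else {
            addEventToQueue(event);
        }
    }

    // Pulls the whole document through the transformer and returns its text content.
    static String upperText(String xml) {
        EventSource pipeline = new UpperCaseTextTransformer(ReaderSource.of(xml));
        StringBuilder text = new StringBuilder();
        while (pipeline.hasNext()) {
            XMLEvent event = pipeline.nextEvent();
            if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            }
        }
        return text.toString();
    }
}
```

The queue is what bounds memory use: a transformer that queues many events per produceEvents() call keeps all of them in memory until the next component pulls them, which is why the text advises against emitting large amounts of events in one invocation.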
The StAXSerializer pulls and serializes the StAX events from the pipeline.
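import java.io.OutputStream;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

import org.apache.cocoon.pipeline.component.Finisher;

public class NullSerializer extends AbstractStAXPipelineComponent implements StAXConsumer, Finisher {

    private StAXProducer parent;

    private OutputStream outputStream;

    public void initiatePullProcessing() {
        try {
            while (this.parent.hasNext()) {
                XMLEvent event = this.parent.nextEvent();
                /* serialize Event */
            }
        } catch (XMLStreamException e) {
            throw new ProcessingException("Error during writing output elements.", e);
        }
    }

    public void setParent(StAXProducer parent) {
        this.parent = parent;
    }

    public String getContentType() {
        /* ... */
        return null; // placeholder: return the content type of the serialized output
    }

    public void setOutputStream(OutputStream outputStream) {
        this.outputStream = outputStream;
    }
}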
1. The Finisher has to pull from the previous pipeline component.
2. In the case of StAX, the last pipeline component has to start pulling for events.
3. The serializer pulls the next event from the previous component and should serialize it as the next step.
4. During pipeline construction, setParent is called to set the previous component of the pipeline.
5. These two methods are defined in the Finisher interface and allow setting the OutputStream (if the serializer needs one) and retrieving the content type of the result.
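The /* serialize Event */ placeholder above can be filled in, for instance, with the JDK's javax.xml.stream.XMLEventWriter. The sketch pulls every event from a parent XMLEventReader and writes it to an OutputStream; the class name StreamSerializerSketch is hypothetical, and the StartDocument event is skipped (so no XML declaration is emitted) to keep the sketch independent of the encoding checks some StAX writers perform on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StreamSerializerSketch {

    // Pulls every event from the parent and writes it to the output stream as UTF-8 XML.
    static void serialize(XMLEventReader parent, OutputStream outputStream) throws XMLStreamException {
        Writer out = new OutputStreamWriter(outputStream, Charset.forName("UTF-8"));
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        while (parent.hasNext()) {
            XMLEvent event = parent.nextEvent();
            if (!event.isStartDocument()) { // no XML declaration in this sketch
                writer.add(event);
            }
        }
        writer.flush();
        writer.close(); // does not close the underlying stream
        try {
            out.flush();
        } catch (java.io.IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience for experiments: parses xml and returns the serialized result.
    static String roundTrip(String xml) {
        try {
            XMLEventReader parent = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serialize(parent, bytes);
            return new String(bytes.toByteArray(), Charset.forName("UTF-8"));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```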
The StAX pipeline offers interoperability with SAX components to a certain degree. However, due to their different paradigms, only two use cases are currently implemented: wrapping a SAX component in a StAX pipeline, and a StAX-to-SAX pipeline, which starts with StAX components and finishes with SAX components.
This allows existing SAX components to be used in a StAX pipeline. Beware of the overhead of the StAX->SAX->StAX conversion, though: no performance gain can be expected from using a SAX component this way.
Pipeline<StAXPipelineComponent> pipeStAX = new NonCachingPipeline<StAXPipelineComponent>();
pipeStAX.addComponent(new StAXGenerator(input));
pipeStAX.addComponent(new SAXForStAXPipelineWrapper(new CleaningTransformer()));
pipeStAX.addComponent(new StAXSerializer());
pipeStAX.setup(System.out);
pipeStAX.execute();
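1. The pipeline uses a StAXGenerator as its first component.
2. In order to embed a single SAX component in a StAX pipeline, it is wrapped in a SAXForStAXPipelineWrapper; here the SAX CleaningTransformer is wrapped.
3. Although the …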
This adapter allows StAX and SAX components to be mixed, but it is limited to starting with StAX and then switching to SAX.
Pipeline<PipelineComponent> pipeStAX = new NonCachingPipeline<PipelineComponent>();
pipeStAX.addComponent(new StAXGenerator(input));
pipeStAX.addComponent(new StAXToSAXPipelineAdapter());
pipeStAX.addComponent(new CleaningTransformer());
pipeStAX.addComponent(new XMLSerializer());
pipeStAX.setup(System.out);
pipeStAX.execute();
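1. The pipeline starts with a StAXGenerator.
2. The adapter converts the StAX events to SAX method calls.
3. The CleaningTransformer is a SAX component and receives the SAX events produced by the adapter.
4. The XMLSerializer is a SAX serializer and finishes the pipeline.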
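What a StAX-to-SAX adapter has to do, namely turning pulled StAX events into SAX ContentHandler calls, can be sketched with JDK classes only. This is a simplified illustration, not StAXToSAXPipelineAdapter itself: namespace declarations are not reported via startPrefixMapping, comments and processing instructions are dropped, all attributes get the type CDATA, and StaxToSaxSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class StaxToSaxSketch {

    // Pulls StAX events and replays them as SAX ContentHandler calls.
    static void pump(XMLEventReader reader, ContentHandler handler) throws XMLStreamException, SAXException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            switch (event.getEventType()) {
                case XMLStreamConstants.START_DOCUMENT:
                    handler.startDocument();
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    handler.endDocument();
                    break;
                case XMLStreamConstants.START_ELEMENT: {
                    StartElement start = event.asStartElement();
                    QName name = start.getName();
                    handler.startElement(name.getNamespaceURI(), name.getLocalPart(), qualified(name), attributes(start));
                    break;
                }
                case XMLStreamConstants.END_ELEMENT: {
                    QName name = event.asEndElement().getName();
                    handler.endElement(name.getNamespaceURI(), name.getLocalPart(), qualified(name));
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE: {
                    char[] chars = event.asCharacters().getData().toCharArray();
                    handler.characters(chars, 0, chars.length);
                    break;
                }
                default:
                    break; // comments, processing instructions, DTDs etc. are ignored here
            }
        }
    }

    private static String qualified(QName name) {
        return name.getPrefix().length() == 0 ? name.getLocalPart() : name.getPrefix() + ":" + name.getLocalPart();
    }

    private static Attributes attributes(StartElement start) {
        AttributesImpl result = new AttributesImpl();
        for (Iterator<?> it = start.getAttributes(); it.hasNext();) {
            Attribute attribute = (Attribute) it.next();
            QName name = attribute.getName();
            result.addAttribute(name.getNamespaceURI(), name.getLocalPart(), qualified(name), "CDATA", attribute.getValue());
        }
        return result;
    }

    // Convenience for experiments: converts xml and returns a textual trace of the SAX calls.
    static String trace(String xml) {
        final StringBuilder trace = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                trace.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    trace.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                trace.append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                trace.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                trace.append(ch, start, length);
            }
        };
        try {
            pump(XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml)), handler);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return trace.toString();
    }
}
```

Converting in this direction is straightforward because the StAX side is pulled and the SAX side is pushed; the reverse direction would need buffering or a separate thread, which helps explain why only StAX-to-SAX switching is supported.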
In order to use StAX with Java 1.5, an additional dependency is needed in the project's pom.xml:
<dependency>
  <groupId>org.codehaus.woodstox</groupId>
  <artifactId>wstx-asl</artifactId>
  <version>3.2.7</version>
</dependency>
Using Woodstox is simpler, as the reference implementation depends on JAXP 1.4, which is not part of Java 1.5.
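To check which StAX implementation is actually picked up once the dependency is in place, the factory classes returned by XMLInputFactory.newInstance() and XMLOutputFactory.newInstance() can be printed. With Woodstox on the classpath the input factory would typically be com.ctc.wstx.stax.WstxInputFactory, although the lookup also depends on the javax.xml.stream.XMLInputFactory system property and on the service descriptors available. StaxImplCheck is a hypothetical name.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

class StaxImplCheck {

    // Reports which StAX factory implementations the runtime actually resolves.
    static String describe() {
        return "input: " + XMLInputFactory.newInstance().getClass().getName()
                + ", output: " + XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```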