Moritz Eysholdt, itemis AG
Serializing EMF models with Xtext
Xtextcon 2014
Kiel, Germany
Agenda
❖ Parsing vs. Serializing
❖ Use Cases
❖ Generating vs. Serializing
❖ Challenges
❖ The Contract
❖ Architecture
❖ Hooks
❖ Advice
The New and the Old Serializer
org.eclipse.xtext.serializer
• The New Serializer. This presentation is about it.
org.eclipse.xtext.parsetree.reconstr
• The Old Serializer. Don’t use it.
• Cryptic error messages
• Bad performance for large models
• Marked @Deprecated
Parser vs. Serializer
[Diagram: inside an XtextResource, load() runs the Parser, which turns the textual model into a model (the AST); save() runs the Serializer, which turns the model back into the textual model.]
Use Cases
❖ Quickfix
❖ Refactoring
❖ Non-Textual Editors
❖ Non-Textual Persistence: EMF Store, XML, XMI
❖ Generators: Read, Transform, Serialize (Generate, Transpile, Compile, Migrate…)
Why Serialize?
❖ object model usually easier to modify than its textual representation
❖ guaranteed syntactical correctness
❖ automatic handling of comments
❖ automatic handling of whitespace
…when I have Xtend, THE language for generators
When not to Serialize
❖ Avoid serializing models that are broken due to parse errors.
❖ Template languages are simpler when writing out large text chunks that never change.
The hard Contract
Parsing a serialized model must result in a model equal to the original.
Model original = createModel()	
String serialized = serialize(original)	
Model parsed = parse(serialized)	
assertEqual(original, parsed)
…enables textual models as persistence format
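The round trip above can be tried out on a toy language. The following is a minimal sketch, assuming a model that is just a list of item names and the "list a, b, c" syntax used later in this deck; the class HardContractSketch and its methods are hypothetical stand-ins, not Xtext API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for an Xtext language: a model is just a list of item names. */
final class HardContractSketch {

    /** Serializes the items using the "list a, b, c" syntax. */
    static String serialize(List<String> items) {
        if (items.isEmpty()) {
            // The toy syntax requires at least one item, so an empty model has no text.
            throw new IllegalArgumentException("model has no textual representation");
        }
        return "list " + String.join(", ", items);
    }

    /** Parses text in the "list a, b, c" syntax back into a list of items. */
    static List<String> parse(String text) {
        String trimmed = text.trim();
        if (!trimmed.startsWith("list ")) {
            throw new IllegalArgumentException("expected keyword 'list'");
        }
        List<String> items = new ArrayList<>();
        for (String item : trimmed.substring("list ".length()).split(",")) {
            items.add(item.trim());
        }
        return items;
    }

    /** The hard contract: parsing the serialized model yields an equal model. */
    static boolean roundTrips(List<String> original) {
        return parse(serialize(original)).equals(original);
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(Arrays.asList("a", "b", "c")));
    }
}
```

In a real Xtext project the same check would compare EMF models, for example with an EMF equality helper, rather than plain lists.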
The soft Contract
Serializing a model that has been modified after parsing should only change the smallest number of characters necessary.
…keep diffs small
String originalDoc = loadDocument()	
Model parsed = parse(originalDoc)	
Model modified = applyModifications(parsed)	
String newDoc = serialize(modified)	
int numberOfChars = diff(originalDoc, newDoc).size()
1. Keep textual diffs small.
2. Strictly comply with the semantic model.
3. Loosely comply with the node model.
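One crude way to put a number on "keep diffs small" is to measure the region that differs once the common prefix and suffix of both documents are stripped. This is only a sketch of such a measure; the class SoftContractSketch and the method changedRegionLength are hypothetical and are not how Xtext computes anything.

```java
/** Hypothetical helper that approximates the size of a textual diff. */
final class SoftContractSketch {

    /**
     * Returns the length of the changed region between two documents:
     * the larger of the removed and inserted spans that remain after
     * cutting off the common prefix and the common suffix.
     */
    static int changedRegionLength(String originalDoc, String newDoc) {
        int max = Math.min(originalDoc.length(), newDoc.length());

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && originalDoc.charAt(prefix) == newDoc.charAt(prefix)) {
            prefix++;
        }

        // Length of the common suffix, not overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && originalDoc.charAt(originalDoc.length() - 1 - suffix)
                        == newDoc.charAt(newDoc.length() - 1 - suffix)) {
            suffix++;
        }

        int removed = originalDoc.length() - prefix - suffix;
        int inserted = newDoc.length() - prefix - suffix;
        return Math.max(removed, inserted);
    }

    public static void main(String[] args) {
        // Renaming one item should touch one character only.
        System.out.println(changedRegionLength("list a, b, c", "list a, x, c"));
    }
}
```

A serializer that reformats the whole document after a one-item rename would score much higher on this measure than one that reuses the existing whitespace and comments.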
Challenges
❖ unassigned elements
❖ ambiguous mapping from EStructuralFeatures to Assignments
❖ ambiguous mapping from EClasses to ParserRules and Actions
…because the AST is actually abstract, as in “lack of information”
API Frontends #1
package org.eclipse.xtext.resource;

public class XtextResource extends ResourceImpl {

    (…)

    public void save(Map<?, ?> options)
        throws IOException {}

    public final void save(OutputStream outputStream, Map<?, ?> options)
        throws IOException {}
}
API Frontends #2
package org.eclipse.xtext.serializer;

@ImplementedBy(Serializer.class)
public interface ISerializer {
    public String serialize(EObject obj);
    public String serialize(EObject obj, SaveOptions options);
    public void serialize(EObject obj, Writer writer, SaveOptions options)
        throws IOException;
    public ReplaceRegion serializeReplacement(EObject obj, SaveOptions options);
}
Options:
- format:
  - true: format all
  - false: format only regions without whitespace information
- validate: don’t use, it’s an old algorithm.
Before Serialization
Create one State Machine per Context per EClass
Root:
    "optional"? ID children+=(List | Path);

List returns Child:
    "list" item+=ID ("," item+=ID)*;

Path returns Child:
    "path" seg=ID ({Segment.parent=current} seg=ID)+;
Context            EClass
Root               Root
List               Child
Path               Segment
Path_Segment_2_0   Child
Path_Segment_2_0   Segment
During Serialization #1
Find the right state machines and find the right path through them.
Root:
    "optional"? ID children+=(List | Path);

List returns Child:
    "list" item+=ID ("," item+=ID)*;

Context   EClass
Root      Root
List      Child

Output: optional Foo list a, b, c
Architecture #3
[Diagram: the state machines for (Context Root, EClass Root) and (Context List, EClass Child), shown once per layer.]
Semantic: assigned grammar elements
Syntactic: unassigned grammar elements
Output: optional Foo list a, b, c
Architecture #4: Observer
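[Diagram: a chain of components in which each one emits events and the next one listens to them.]
Semantic Sequencer → Syntactic Sequencer → HiddenToken Sequencer → Formatter → Writer

Component               Contribution
Semantic Sequencer      assigned RuleCalls, Terminals, DataTypes, Keywords, CrossRefs
Syntactic Sequencer     unassigned RuleCalls, Terminals, DataTypes, Keywords
HiddenToken Sequencer   whitespace, comments
Formatter               modifies whitespace
Writer                  writes to stream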
Output: /*X*/ Foo list a
[Sequence diagram. Participants: Serializer, SemanticSequencer, SyntacticSequencer, HiddenTokenSequencer, Formatter, Writer.]
Calls between Serializer and the sequencers:
    createSequence (Root, Root)
    enter AssignedParserRuleCall (children=List)
    createSequence (List, Child)
    accept UnassignedRuleCall (ID)
    accept Keyword ("list")
    accept AssignedTerminalRuleCall (item+=ID)
    leave AssignedParserRuleCall (children=List)
Calls arriving at Formatter and Writer:
    enterAssignedParserRuleCall()
    acceptWhitespace()
    acceptWhitespace()
    acceptComment(/*X*/)
    acceptUnassignedRuleCall()
    acceptKeyword()
    acceptWhitespace()
    acceptAssignedTerminalRuleCall()
    acceptWhitespace()
    enterAssignedParserRuleCall()
Root:
    "optional"? ID children+=(List | Path);
List returns Child:
    "list" item+=ID ("," item+=ID)*;
SerializerFragment
fragment = serializer.SerializerFragment auto-inject {	
generateStub = true	
// generateDebugData = true	
}
• The SerializerFragment is not required to use the serializer!
• The only purpose of the SerializerFragment is to generate a convenience API.
• Set generateDebugData to generate pretty state machine diagrams (Graphviz dot).
Hooks #1: ITransientValueService
• selectively exclude model objects and values from serialization
• transient == NOT serialized
• Default setting from the Ecore model: EStructuralFeature.isTransient()
public interface ITransientValueService {

    enum ListTransient {
        NO, SOME, YES
    }

    enum ValueTransient {
        NO, PREFERABLY, YES
    }

    public ListTransient isListTransient(EObject semanticObject, EStructuralFeature feature);

    public boolean isValueInListTransient(EObject semanticObject, int index, EStructuralFeature feature);

    public ValueTransient isValueTransient(EObject semanticObject, EStructuralFeature feature);
}
Hooks #2: Token Serialization
public interface ICrossReferenceSerializer {	
	 boolean isValid(EObject context, CrossReference crossref, EObject target, INode node);	
	 String serializeCrossRef(EObject context, CrossReference crossref, EObject target, INode node);	
}
public interface IKeywordSerializer {	
	 boolean isValid(EObject context, Keyword keyword, Object value);	
	 String serializeAssignedKeyword(EObject context, Keyword keyword, Object value, INode node);	
}
public interface IValueSerializer {	
	 boolean isValid(EObject context, RuleCall ruleCall, Object value);	
	 String serializeAssignedValue(EObject context, RuleCall ruleCall, Object value, INode node);	
}
public interface IEnumLiteralSerializer {	
	 boolean isValid(EObject context, RuleCall ruleCall, Object value);	
	 String serializeAssignedEnumLiteral(EObject context, RuleCall ruleCall, Object value, INode node);	
}
Hooks #3: SemanticSequencer
class SerSipSemanticSequencer extends AbstractSerSipSemanticSequencer {

    @Inject SerSipGrammarAccess grammarAccess;

    /**
     * Constraint:
     *     (children+=Path | children+=List)
     */
    override protected sequence_Root(EObject context, Root semanticObject) {
        val feeder = createSequencerFeeder(semanticObject)
        for (child : semanticObject.children) {
            if (child.seg != null) {
                feeder.accept(grammarAccess.rootAccess.childrenPathParserRuleCall_2_0_0, child)
            } else {
                feeder.accept(grammarAccess.rootAccess.childrenListParserRuleCall_2_0_1, child)
            }
        }
        feeder.finish()
    }
}
Root:
    "optional"? ID children+=(Path | List);

List returns Child:
    "list" item+=ID ("," item+=ID)*;

Path returns Child:
    "path" seg=ID ({Segment.parent=current} seg=ID)+;
Hooks #4: SyntacticSequencer
class SerSipSyntacticSequencer extends AbstractSerSipSyntacticSequencer {

    /**
     * terminal ID: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
     */
    override getIDToken(EObject semanticObject, RuleCall ruleCall, INode node) {
        if (node != null)
            return getTokenText(node);
        return "";
    }

    /**
     * Syntax:
     *     'optional'?
     */
    override emit_Root_OptionalKeyword_0_q(EObject semanticObject, ISynNavigable transition, List<INode> nodes) {
        acceptNodes(transition, nodes);
    }
}
Root:
    "optional"? ID children+=(Path | List);
Understanding Ambiguities
• A grammar is ambiguous for the serializer if there can be more than one textual syntax for a given model.

• A grammar is ambiguous for the parser if there can be more than one model for a given textual syntax [1].

R1:
    foo=ID | bar=ID;
Parse: does "a" parse to foo=a or bar=a?

R2:
    ("foo" val=ID) |
    ("bar" val=ID);
Serialize: does val=b serialize to "foo b" or "bar b"?

[1] That is not completely correct. A grammar is ambiguous when there is more than one path to consume a given input. That, however, usually leads to different models.
Understanding Ambiguities
Root:
    (foos=Foo | bars=Bar)*;

Foo:
    "foo" foo=ID;

Bar:
    "bar" bar=ID;

(1) Model: Root containing foo=a and bar=b
(2) bar b foo a
(3) foo a bar b

Does (1) serialize to (2) or (3)?
Avoiding Ambiguities
Analysis: Grammars can be ambiguous for the serializer.

Example (ambiguous):
Root:
    (foos=Foo | bars=Bar)*;

Foo:
    "foo" foo=ID;

Bar:
    "bar" bar=ID;

Alternative (unambiguous):
Root:
    members+=Member;

Member:
    Foo | Bar;

Foo:
    "foo" foo=ID;

Bar:
    "bar" bar=ID;

• Single "members" list to maintain order
• "Foo" and "Bar" extend "Member"
• Implement getFoo() and getBar() as filters on getMembers()
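The last bullet can be illustrated with plain Java objects. This is a minimal sketch, assuming hand-written classes instead of generated EMF code; MemberListSketch and everything inside it are hypothetical names. Because the single members list keeps the order, there is exactly one text to write out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java stand-in for the Root/Member/Foo/Bar model; not generated EMF code. */
final class MemberListSketch {

    static abstract class Member {
        final String name;
        Member(String name) { this.name = name; }
        /** Keyword that introduces this member in the text. */
        abstract String keyword();
    }

    static final class Foo extends Member {
        Foo(String name) { super(name); }
        @Override String keyword() { return "foo"; }
    }

    static final class Bar extends Member {
        Bar(String name) { super(name); }
        @Override String keyword() { return "bar"; }
    }

    static final class Root {
        private final List<Member> members = new ArrayList<>();

        /** The only stored list; it preserves the order of Foo and Bar elements. */
        List<Member> getMembers() { return members; }

        /** Derived view: all Foo elements, computed by filtering getMembers(). */
        List<Foo> getFoo() {
            return members.stream().filter(Foo.class::isInstance).map(Foo.class::cast)
                    .collect(Collectors.toList());
        }

        /** Derived view: all Bar elements, computed by filtering getMembers(). */
        List<Bar> getBar() {
            return members.stream().filter(Bar.class::isInstance).map(Bar.class::cast)
                    .collect(Collectors.toList());
        }

        /** With one ordered list there is only one possible output. */
        String serialize() {
            return members.stream().map(m -> m.keyword() + " " + m.name)
                    .collect(Collectors.joining(" "));
        }
    }

    public static void main(String[] args) {
        Root root = new Root();
        root.getMembers().add(new Bar("b"));
        root.getMembers().add(new Foo("a"));
        System.out.println(root.serialize());
    }
}
```

In an EMF model the two filtered getters would typically be derived, transient features so that only the members list takes part in serialization.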
Avoid Unserializable Models
The Xtext grammar can imply constraints on your model.

Serialization is only possible if the model complies with these constraints, because the grammar does not define syntax for models that don’t comply. Examples:

GRAMMAR                         CONSTRAINT
R1: name=ID;                    name != null
R2: (name=ID | title=STRING);   (name != null) ^ (title != null)
R3: items+=ID+;                 items.size() >= 1
R4: (a+=ID b+=ID)*;             a.size() == b.size()
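The four implied constraints can be written down as plain checks. This is a sketch only: GrammarConstraintSketch and its methods are hypothetical, and the features are passed in as simple values instead of being read from an EObject.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical checks mirroring the constraints implied by rules R1 to R4. */
final class GrammarConstraintSketch {

    /** R1: name=ID;  the mandatory assignment needs a value. */
    static boolean satisfiesR1(String name) {
        return name != null;
    }

    /** R2: (name=ID | title=STRING);  exactly one of the alternatives is set. */
    static boolean satisfiesR2(String name, String title) {
        return (name != null) ^ (title != null);
    }

    /** R3: items+=ID+;  the list needs at least one entry. */
    static boolean satisfiesR3(List<String> items) {
        return items.size() >= 1;
    }

    /** R4: (a+=ID b+=ID)*;  every loop iteration adds one entry to each list. */
    static boolean satisfiesR4(List<String> a, List<String> b) {
        return a.size() == b.size();
    }

    public static void main(String[] args) {
        // Both name and title set: R2 offers no syntax for this model.
        System.out.println(satisfiesR2("n", "t"));
        // Two entries in a but one in b: R4 cannot produce this model either.
        System.out.println(satisfiesR4(Arrays.asList("x", "y"), Collections.singletonList("z")));
    }
}
```

A model for which any of these checks returns false has no textual representation under the corresponding rule, so the serializer has to fail.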
Avoid Unserializable Models
Solution:

a) Ensure your grammar does not imply constraints.
b) Ensure your TransientValueService prevents constraint violations.

Implement Xtext Validation to enforce constraints.

This will also improve your error messages but may make content assist too chatty.

RESTRICTIVE                     SAFE
R1: name=ID;                    R1: name=ID?;  or  R1: (name=ID | "?");
R2: (name=ID | title=STRING);   R2: name=ID? title=STRING?;
R3: items+=ID+;                 R3: items+=ID*;
R4: (a+=ID b+=ID)*;             R4: (a+=ID b+=ID)*
                                    ("spareA={" a+=ID+ "}")?
                                    ("spareB={" b+=ID+ "}")?;
Configure Your Scope Right
You can (and need to) configure global scoping for a ResourceSet!
[Diagram: two resource set providers and their shadowing chains.]
XtextResourceSetProvider (default):
    Dirty Editor State <shadows> Index
XtextLiveScopeResourceSetProvider (what you need):
    ResourceSet <shadows> Dirty Editor State <shadows> Index
Happy Serializing!
