
XML Processing in Scala (XML London 2014)

Scala is an established static- and strongly-typed functional and object-oriented scalable programming language for the JVM with seamless Java interoperation.
Scala and its ecosystem are used at LinkedIn, Twitter and Morgan Stanley, among many other companies that demand fast time to market, robustness, high performance and scalability.
This paper shows you Scala's strong native XML support, powerful XQuery-like constructs, hybrid processing via XQuery for Scala, and increased XML processing performance. You will learn how you can benefit from Scala's practicality in a commercial setting, ultimately increasing your productivity.

  1. XML Processing in Scala: William Narmontas and Dino Fancellu, www.scala.contractors, XML London 2014
  2. Dino Fancellu: 35 years in IT (Scala, Java, XML). William Narmontas: 10 years in IT (Scala, XML, Web).
  3. What is Scala?
  4. Scala processes XML fast
  5. It is powerful
  6. Concise, functional, object-oriented, modular, statically typed, strongly typed, type-safe, performant, Java-interoperable, composable, unopinionated, first-class XML
  7. Who uses Scala? Apple, Bank of America, Barclays, BBC, BSkyB, Cisco, Citigroup, Credit Suisse, LinkedIn, Morgan Stanley, Netflix, Novell, Rackspace, Sky, Sony, Springer, eBay, eHarmony, EDF, FourSquare, Gawker, HSBC, ITV, Klout, The Guardian, TomTom, Trafigura, Tumblr, Twitter, UBS, VMware, Xerox
  8. Projects in Scala
      - Less code to write = less to maintain
      - Clearer communication
      - Easier testing
      - Robust software
      - Fast time to market
      - Happier developers
  9. Scala language: Intro
  10. Values
      Scala:
      val conferenceName = "XML London 2014"
      XQuery:
      let $conferenceName := "XML London 2014"
      Scala (mutable):
      var conferenceName = "XML London 2014"
      conferenceName = "XML London 2015"
  11. Strings
      val language = "Scala"
      s"XML Processing in $language"
      // XML Processing in Scala
      s"""An introduction to:
         |The "$language" programming language""".stripMargin
      // An introduction to:
      // The "Scala" programming language
      s"$language has ${language.length} chars in its name"
      // Scala has 5 chars in its name
  12. Functions
      Scala:
      def fun(x: Int, y: Double) = s"$x: $y"
      XQuery:
      declare function local:fun($x as xs:integer, $y as xs:double) as xs:string {
        concat($x, ": ", $y)
      };
  13. Everything is an expression
      val trainSpeed = if (train.speed.mph >= 60) "Fast" else "Slow"
      def divide(numerator: Int, denominator: Int) =
        try {
          s"${numerator / denominator}"
        } catch {
          case _: java.lang.ArithmeticException =>
            s"Cannot divide $numerator by $denominator"
        }
  14. Types: explicit
      def withTitle(name: String, title: String): String = s"$title. $name"
      val x: Int = {
        val y = 1000
        100 + y
      }
      // x: Int = 1100
  15. Functions: named parameters
      Further clarity in method calls:
      def makeLink(url: String, text: String) = s"""<a href="$url">$text</a>"""
      makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
      // <a href="http://www.xmllondon.com">XML London 2014</a>
  16. Functions: default parameters
      Reduce repetition in method calls:
      def withTitle(name: String, title: String = "Mr") = s"$title. $name"
      withTitle("John Smith")          // Mr. John Smith
      withTitle("Mary Smith", "Miss")  // Miss. Mary Smith
  17. Functional
      def incrementedByOne(x: Int) = x + 1
      (1 to 5).map(incrementedByOne)  // Vector(2, 3, 4, 5, 6)
  18. Lambdas
      (1 to 5).map(x => x + 1)        // Vector(2, 3, 4, 5, 6)
      (1 to 5).map(_ + 1)             // Vector(2, 3, 4, 5, 6)
  19. For comprehensions
      for { x <- (1 to 5) } yield x + 1  // Vector(2, 3, 4, 5, 6)
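Slide 19 uses a single generator. The XQuery comparison later in the deck relies on comprehensions with several generators and a guard, so here is a small plain-Scala sketch (no XML) of one, next to the `flatMap`/`withFilter`/`map` calls the compiler rewrites it into:

```scala
// Two generators and a guard: every pair (x, y) from 1..3 with x < y.
val pairs = for {
  x <- 1 to 3
  y <- 1 to 3
  if x < y
} yield (x, y)

// Roughly what the compiler turns the comprehension into.
val desugared = (1 to 3).flatMap(x => (1 to 3).withFilter(y => x < y).map(y => (x, y)))

println(pairs) // Vector((1,2), (1,3), (2,3))
```

The guard becomes `withFilter`, which filters lazily instead of building an intermediate collection.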
  20. Implicit classes: enrich types
      implicit class StringWrapper(str: String) {
        def wrapWithParens = s"($str)"
      }
      "Text".wrapWithParens
      // (Text)
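The same technique applies to XML types. As a sketch, an implicit class can add helper methods to `scala.xml.NodeSeq`. The names `textIs` and `intValue` are hypothetical and not part of scala-xml, and the scala-cli `using` directives assume Scala 2.13 with scala-xml 2.2.0 on the classpath:

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.NodeSeq

object XmlSyntax {
  // Hypothetical helpers: not part of scala-xml, defined here for illustration.
  implicit class RichNodeSeq(nodes: NodeSeq) {
    // True when the concatenated text content equals `expected`.
    def textIs(expected: String): Boolean = nodes.text == expected
    // Parses the trimmed text content as an Int, if it is a valid number.
    def intValue: Option[Int] = nodes.text.trim.toIntOption
  }
}

import XmlSyntax._

val book = <book year="1994"><publisher>Addison-Wesley</publisher></book>
val fromAw = (book \ "publisher").textIs("Addison-Wesley") // true
val year = (book \ "@year").intValue                        // Some(1994)
```

Keeping the implicit class inside an object means callers opt in with a single import.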
  21. Powerful features for scalability
      - Case classes
      - Traits
      - Partial functions
      - Pattern matching
      - Implicits
      - Flexible syntax
      - Generics
      - User-defined operators
      - Call-by-name
      - Macros
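Several of the features listed above fit together in a few lines. This is a minimal sketch; the `Shape` types are hypothetical and only illustrate case classes, a sealed trait, pattern matching and a partial function:

```scala
// Hypothetical types, only for illustration.
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rectangle(width: Double, height: Double) extends Shape

// Pattern matching is an expression: each case produces the area.
def area(shape: Shape): Double = shape match {
  case Circle(r)       => math.Pi * r * r
  case Rectangle(w, h) => w * h
}

// A partial function is defined only for the cases it lists.
val radiusOfCircle: PartialFunction[Shape, Double] = {
  case Circle(r) => r
}

val shapes: List[Shape] = List(Circle(1.0), Rectangle(2.0, 3.0))
val areas = shapes.map(area)               // List(math.Pi, 6.0)
val radii = shapes.collect(radiusOfCircle) // List(1.0): the rectangle is skipped

// Case classes get structural equality and a copy method for free.
val bigger = Circle(1.0).copy(radius = 2.0)
```

Because `Shape` is sealed, the compiler can warn when a `match` misses a subtype.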
  22. Scala & XML
  23. Values: inline XML
      val url = "http://www.xmllondon.com"
      val title = "XML London 2014"
      val xmlTree = <div>
        <p>Welcome to <a href={url}>{title}</a>!</p>
      </div>
      // xmlTree: scala.xml.Elem =
      // <div>
      //   <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
      // </div>
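Embedded `{...}` expressions are stored as character data, not parsed as markup, so special characters are escaped when the tree is serialized. A small sketch, assuming Scala 2.13 and scala-xml 2.2.0 (the `userInput` value is hypothetical):

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0

// A String embedded with {...} is kept as character data, not as markup.
val userInput = "<b>bold?</b>"
val para = <p>{userInput}</p>
// Serializing escapes the angle brackets.
val rendered = para.toString // <p>&lt;b&gt;bold?&lt;/b&gt;</p>

// The same {...} syntax supplies attribute values.
val url = "http://www.xmllondon.com"
val link = <a href={url}>XML London 2014</a>
val href = link \@ "href" // http://www.xmllondon.com
```

This is why building XML from literals is safer than concatenating strings: the `<b>` in the input never becomes an element.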
  24. XML lookups
      val listOfPeople = <people>
        <person>Fred</person>
        <person>Ron</person>
        <person>Nigel</person>
      </people>
      listOfPeople \ "person"
      // NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
      listOfPeople \ "_"
      // NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
  25. XML lookups
      val fact = <fact type="universal">
        <variable>A</variable> = <variable>A</variable>
      </fact>
      fact \ "variable"
      // NodeSeq(<variable>A</variable>, <variable>A</variable>)
      fact \ "@type"
      // : scala.xml.NodeSeq = universal
      fact \@ "type"
      // : String = universal
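The lookups above use `\`, which only inspects direct children. scala-xml also provides `\\`, which searches all descendants, much like XPath's `//`. A sketch with a hypothetical `library` document, assuming Scala 2.13 and scala-xml 2.2.0:

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0

// Hypothetical document with titles nested two levels down.
val library = <library>
  <shelf id="a"><book><title>Programming in Scala</title></book></shelf>
  <shelf id="b"><book><title>XQuery</title></book></shelf>
</library>

// `\` only looks at direct children of <library>, so it finds no <title>.
val direct = library \ "title"
// `\\` searches every descendant, like XPath's //title.
val titles = (library \\ "title").map(_.text)
// With an "@" prefix, `\\` collects that attribute from all descendants.
val shelfIds = (library \\ "@id").map(_.text)
```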
  26. XML loading
      val pun = """<pun rating="extreme">
                  |  <question>Why do CompSci students need glasses?</question>
                  |  <answer>To C#<!-- C# is a Microsoft programming language -->.</answer>
                  |</pun>""".stripMargin
      scala.xml.XML.loadString(pun)
      // <pun rating="extreme">
      //   <question>Why do CompSci students need glasses?</question>
      //   <answer>To C#.</answer>
      // </pun>
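Once loaded, the parsed `Elem` supports the same lookups as inline XML. A sketch, assuming Scala 2.13 and scala-xml 2.2.0; whether the comment node is kept after parsing may depend on the scala-xml version, but a comment contributes no text either way, so `.text` is unaffected:

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.XML

val pun = """<pun rating="extreme">
            |  <question>Why do CompSci students need glasses?</question>
            |  <answer>To C#<!-- a comment -->.</answer>
            |</pun>""".stripMargin

// Parse the string into an Elem.
val doc = XML.loadString(pun)
val rating = doc \@ "rating"        // "extreme"
val answer = (doc \ "answer").text  // "To C#." (the comment adds no text)
```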
  27. Collections: expressive
      val root = <numbers>{for {i <- 1 to 10} yield <number>{i}</number>}</numbers>
      val numbers = root \ "number"
      numbers(0)      // <number>1</number>
      numbers.head    // <number>1</number>
      numbers.last    // <number>10</number>
      numbers take 3  // NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)
  28. Collections: expressive
      numbers filter (_.text.toInt > 6)
      // NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
      numbers(_.text.toInt > 6)
      // NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
      numbers maxBy (_.text)        // string comparison: "9" > "10"
      // <number>9</number>
      numbers maxBy (_.text.toInt)
      // <number>10</number>
      numbers.reverse
      // NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>,
      //         <number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)
      numbers.groupBy(_.text.toInt % 3)
      // Map(
      //   2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
      //   1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
      //   0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))
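Because `NodeSeq` is a `Seq[Node]`, ordinary collection methods can compute values from XML and feed the results back into new literals. A short sketch reusing the `numbers` example, assuming Scala 2.13 and scala-xml 2.2.0 (the `<evens>` element is hypothetical):

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0

val root = <numbers>{for (i <- 1 to 10) yield <number>{i}</number>}</numbers>
val numbers = root \ "number"

// Collection methods work directly on the selected nodes.
val total = numbers.map(_.text.toInt).sum          // 55
val evens = numbers.filter(_.text.toInt % 2 == 0)  // 5 nodes

// Results can be embedded straight back into new XML.
val evenXml = <evens count={evens.length.toString}>{evens}</evens>
```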
  29. XML Methods: a rich API % :+ aggregate attributes combinations copyToArray diff dropWhile flatMap foreach head init isInstanceOf lastIndexOfSlice map mkString padTo prefixLength reduceRight runWith segmentLength sortWith strict_== takeRight toBuffer toSeq transpose withFilter zipAll ++ :\ andThen buildString companion copyToBuffer distinct endsWith flatten genericBuilder headOption inits isTraversableAgain lastIndexWhere max nameToString par product reduceRightOption sameElements seq sorted stringPrefix takeWhile toIndexedSeq toSet union xmlType zipWithIndex ++: apply canEqual compose corresponds doCollectNamespaces exists fold getNamespace indexOf intersect iterator lastOption maxBy namespace partition reduce repr scan size span sum text toIterable toStream unzip xml_!= +: \@ applyOrElse child contains count doTransform filter foldLeft groupBy indexOfSlice isAtom label length min nonEmpty patch reduceLeft reverse scanLeft slice splitAt tail theSeq toIterator toString unzip3 xml_== /: asInstanceOf collect containsSlice descendant drop filterNot foldRight grouped indexWhere isDefinedAt last lengthCompare minBy nonEmptyChildren permutations reduceLeftOption reverseIterator scanRight sliding startsWith tails to toList toTraversable updated xml_sameElements /: addString attribute collectFirst copy descendant_or_self dropRight find forall hasDefiniteSize indices isEmpty lastIndexOf lift minimizeEmpty orElse prefix reduceOption reverseMap scope sortBy strict_!= take toArray toMap toVector view zip
  30. For-comprehensions: similar to XQuery ... yet general purpose. Nice!
      XQuery:
      <bib>{
        for $b in $xml/book
        let $year := $b/@year
        where $b/publisher = "Addison-Wesley" and $year > 1991
        return <book year="{ $year }">{ $b/title }</book>
      }</bib>
      Scala:
      <bib>{
        for {
          b <- xml \ "book"
          year = b \@ "year"
          if (b \ "publisher").text == "Addison-Wesley" && year.toInt > 1991
        } yield <book year={ year }>{ b \ "title" }</book>
      }</bib>
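The comprehension on these slides needs an input `bib` document, which the deck does not show. This sketch assumes sample data resembling the bibliography in the W3C XML Query Use Cases, and sticks to standard scala-xml: the publisher text is compared with `.text ==` and the `year` string is converted with `.toInt` (Scala 2.13 and scala-xml 2.2.0 assumed):

```scala
//> using scala 2.13
//> using dep org.scala-lang.modules::scala-xml:2.2.0

// Sample input resembling the W3C XML Query Use Cases bibliography.
val xml = <bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher>
  </book>
</bib>

val result = <bib>{
  for {
    b <- xml \ "book"
    year = b \@ "year"
    // `year` is a String, so it is converted before the numeric comparison.
    if (b \ "publisher").text == "Addison-Wesley" && year.toInt > 1991
  } yield <book year={year}>{b \ "title"}</book>
}</bib>
```

Only the two Addison-Wesley books survive the guard; "Data on the Web" is filtered out by publisher.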
  36. Hybrid XML
      - XQuery for Scala
      - javax.xml.* for free:
        - Look up: XPath
        - Transform: XSLT
        - Stream: StAX
  37. XQuery for Scala (XQS)
      - Wraps the XQuery API for Java (javax.xml.xquery)
      - Scala access to XQuery in:
        - MarkLogic, BaseX, Saxon, Sedna, eXist, …
      - Converts DOM to Scala XML and vice versa
      - http://github.com/fancellu/xqs
  38. XQuery via XQS
      import scala.xml.NodeSeq
      import com.felstar.xqs.XQS._
      val widgets = <widgets>
        <widget>Menu</widget>
        <widget>Status bar</widget>
        <widget id="panel-1">Panel</widget>
        <widget id="panel-2">Panel</widget>
      </widgets>
      val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
      val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
      // NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
      //         <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)
  39. XPath
      import com.felstar.xqs.XQS._
      import javax.xml.xpath.{XPathConstants, XPathFactory}
      import org.w3c.dom.NodeList
      import scala.xml.NodeSeq
      val widgets = <widgets>
        <widget>Menu</widget>
        <widget>Status bar</widget>
        <widget id="panel-1">Panel</widget>
        <widget id="panel-2">Panel</widget>
      </widgets>
      val xpath = XPathFactory.newInstance().newXPath()
      val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets),
        XPathConstants.NODESET).asInstanceOf[NodeList]
      (nodes: NodeSeq)
      // NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
      Natively in Scala:
      (widgets \ "widget")(widget => (widget \ "@id").isEmpty)
      // NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
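The XPath slide relies on XQS helpers (`toDom` and a `NodeList`-to-`NodeSeq` conversion). For comparison, a sketch that uses only the JDK's `javax.xml` APIs, assuming the widgets document is available as a string:

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

val widgetsXml =
  """<widgets>
    |  <widget>Menu</widget>
    |  <widget>Status bar</widget>
    |  <widget id="panel-1">Panel</widget>
    |  <widget id="panel-2">Panel</widget>
    |</widgets>""".stripMargin

// Parse the string into a DOM Document with the JDK's built-in parser.
val dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  .parse(new InputSource(new StringReader(widgetsXml)))

val xpath = XPathFactory.newInstance().newXPath()
val nodes = xpath.evaluate("/widgets/widget[not(@id)]", dom, XPathConstants.NODESET)
  .asInstanceOf[NodeList]

// NodeList is not a Scala collection, so index through it explicitly.
val texts = (0 until nodes.getLength).map(i => nodes.item(i).getTextContent)
```

The manual indexing at the end is the boilerplate that a conversion such as XQS's saves.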
  40. XSLT
      import com.felstar.xqs.XQS._   // assumed to supply the Elem-to-Source conversions used below
      import javax.xml.transform.TransformerFactory
      import javax.xml.transform.stream.StreamResult
      val peopleXml = <people>
        <john>Hello, John.</john>
        <smith>Smith is here.</smith>
        <another>Hello.</another>
      </people>
      val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:template match="john">
          <xsl:copy>Hello, John.</xsl:copy>
        </xsl:template>
        <xsl:template match="node()|@*">
          <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
          </xsl:copy>
        </xsl:template>
      </xsl:stylesheet>
      val xmlResultResource = new java.io.StringWriter()
      val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
      xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
      xmlResultResource.getBuffer
      // <?xml version="1.0" encoding="UTF-8"?><people>
      //   <john>Hello, John.</john>
      //   <smith>Smith is here.</smith>
      //   <another>Hello.</another>
      // </people>
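The XSLT slide passes Scala XML values straight to the JAXP API, which relies on conversions imported from XQS. A JDK-only sketch of the same stylesheet, assuming both documents are available as strings; the input is hypothetical, and `version` is set to `1.0` because the XSLT processor bundled with the JDK implements XSLT 1.0:

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

// Identity transform, plus one template that replaces <john>'s content.
val stylesheet =
  """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    |  <xsl:template match="john">
    |    <xsl:copy>Hello, John.</xsl:copy>
    |  </xsl:template>
    |  <xsl:template match="node()|@*">
    |    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
    |  </xsl:template>
    |</xsl:stylesheet>""".stripMargin

// Hypothetical input document.
val input = "<people><john>Hi</john><smith>Smith is here.</smith></people>"

val transformer = TransformerFactory.newInstance()
  .newTransformer(new StreamSource(new StringReader(stylesheet)))
val out = new StringWriter()
transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(out))
val result = out.toString
```

The identity template copies everything unchanged, so only `<john>` differs between input and output.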
  41. XML stream processing
      import scala.io.Source
      import javax.xml.stream.{XMLEventReader, XMLInputFactory}
      import javax.xml.stream.events.XMLEvent
      // 4GB file; comes back in a second, because only the first events are read
      val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
      val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader())
      implicit class XMLEventIterator(ev: XMLEventReader) extends scala.collection.Iterator[XMLEvent] {
        def hasNext = ev.hasNext
        def next() = ev.nextEvent()
      }
      er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
        case (ev, idx) => println(s"${idx + 1}:\t$ev")
      }
      src.close()
      // 1:   <feed>
      // 2:
      //
      // 3:   <doc>
      // 4:
      //
      // 5:   <title>
      // 6:   Wikipedia: Anarchism
      // 7:   </title>
      // 8:
      //
      // 9:   <url>
      // 10:  http://en.wikipedia.org/wiki/Anarchism
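The streaming example reads a multi-gigabyte dump over the network. The same pull-parsing idea can be tried on a small in-memory string. In this sketch the input is hypothetical, the iterator wrapper is a plain class rather than the slide's implicit class, and `IS_COALESCING` is set so each text run arrives as one `Characters` event:

```scala
import java.io.StringReader
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent
import scala.collection.mutable.ListBuffer

// Wraps a StAX XMLEventReader as a Scala Iterator.
class XMLEventIterator(reader: XMLEventReader) extends Iterator[XMLEvent] {
  def hasNext: Boolean = reader.hasNext
  def next(): XMLEvent = reader.nextEvent()
}

// Hypothetical in-memory input standing in for the Wikipedia dump.
val xml = "<feed><doc><title>Anarchism</title></doc><doc><title>Albedo</title></doc></feed>"

val factory = XMLInputFactory.newInstance()
// Merge adjacent character chunks into a single event.
factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
val reader = factory.createXMLEventReader(new StringReader(xml))

// Pull events one at a time and keep only the text inside <title> elements.
val titles = ListBuffer.empty[String]
var inTitle = false
for (ev <- new XMLEventIterator(reader)) {
  if (ev.isStartElement && ev.asStartElement.getName.getLocalPart == "title") inTitle = true
  else if (ev.isEndElement && ev.asEndElement.getName.getLocalPart == "title") inTitle = false
  else if (inTitle && ev.isCharacters) titles += ev.asCharacters.getData
}
reader.close()
```

Only one event is held at a time, which is what lets the same loop scale to files far larger than memory.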
  42. Use cases
      - Data extraction
      - Serving XML via REST
      - Dynamically generated XSLT
      - Interfacing with XML databases
      - Flexibility to choose the best tool for the job
  43. Excellent ecosystem: SBT, ScalaTest, scala-xml, macro-paradise, Akka, Spray, scalaz, shapeless, JVM, scala-maven-plugin, Spark, Scaladin, Specs
  44. Conclusion
      - Practical
      - Practical for XML processing
  45. Where do I start?
      - atomicscala.com
      - typesafe.com/activator
      - scala-lang.org
      - scala-ide.org
      - IntelliJ
  46. Matt Stephens, Charles Foster
  47. Open to consulting: www.scala.contractors. Follow us on Twitter: @DinoFancellu, @ScalaWilliam, @MaffStephens
