One of the hidden gems of the XML technology milieu is XQuery. Even if XML is not your thing (for example, you work only with JSON), I will show you how XQuery is especially adept and fast for pure web development. And if you do happen to have a lot of XML, I will also demonstrate how this concise, small, data-oriented language can result in serious productivity gains over other programming languages. XQuery is available today on the server, in the browser (including mobile devices), and in large deployments in both commercial and open source environments.
This talk will present both anecdotal and evidence-based analysis showing how XQuery can result in quicker development times than other programming languages when applied to the right scenarios, and will try to give a starting point to those wishing to investigate this extremely powerful little language.
7. The Mythical Man-Month - Fred Brooks
'All programmers are optimists.'
'Adding manpower to a late software project makes it later.'
'The fundamental problem with program maintenance is that fixing a defect has a substantial (20-50 percent) chance of introducing another.'
8. 'Building software will always be hard. There is no silver bullet.'
Problem #1 - Programming is hard
12. Managing data variability, volume & velocity is hard
Problem #2 - Your boss knows about the Big Data opportunity
13. Why again ..
Solution #1: Choosing the right language can result in 4-5 times greater productivity
Solution #2 - Choose a language that mitigates risks associated with Big Data
18. SQL like - Inner Join
SQL:
    SELECT * FROM employee, department
    WHERE employee.DepartmentID = department.DepartmentID;
XQuery:
    (: "=" is the general comparison, so it works when //dept/@id selects many nodes; "eq" would raise a type error there :)
    for $emp in //employee
    return $emp[@deptid = //dept/@id]
19. SQL like - Left Outer Join
    for $u in fn:collection('customers')
    return
      <customer id="{$u/custno}" name="{$u/name}">{
        for $p in fn:collection("purchaseorders")//po
        where $u/custno = $p//custno
        return <po>{$p/@id}</po>
      }</customer>
21. Code in the language of the domain
    declare function local:test($a, $b, $c, $d, $e, $f, $g, $h, $i, $k) { ….. };
    declare function local:test($a as element(record)) { …… };
22. Literals and Constructors
    xquery version "1.0";
    let $names := ("Jim", "Gabi", "Vojtech", "Norm", "Nuno", "Eric")
    return
      <names>{
        for $name in $names
        return <name>{$name}</name>
      }</names>
    (: element {$computed-element-name} {
         attribute {$computed-attr-name} {"some attr value"}
       } :)
23. Inline Caching
    xquery version "1.0-ml";
    (: XQuery memoization example for use with MarkLogic :)
    declare variable $cache := map:map();

    declare function local:factorial($n as xs:integer) as xs:integer {
      if ($n < 0) then 0
      else if ($n = 0) then 1
      else $n * local:factorial($n - 1)
    };

    (: Look the result up in $cache; compute and store it on a miss. :)
    declare function local:memoize($func, $var) {
      let $key := xdmp:md5(concat($func, string($var)))
      return
        (: fn:exists() rather than the effective boolean value, so a cached 0 still counts as a hit :)
        if (fn:exists(map:get($cache, $key)))
        then (map:get($cache, $key), xdmp:log('cache hit'))
        else
          let $result := xdmp:apply($func, $var)
          return (map:put($cache, $key, $result), $result)
    };

    let $memoize := xdmp:function(xs:QName("local:memoize"))
    let $factorial := xdmp:function(xs:QName("local:factorial"))
    (: only the first call computes the factorial; the rest are cache hits :)
    let $a := xdmp:apply($memoize, $factorial, 20)
    let $b := xdmp:apply($memoize, $factorial, 20)
    let $c := xdmp:apply($memoize, $factorial, 20)
    let $d := xdmp:apply($memoize, $factorial, 20)
    let $e := xdmp:apply($memoize, $factorial, 20)
    return $a
34. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a search/string-processing program
Lutz Prechelt (prechelt@ira.uka.de), Fakultät für Informatik, Universität Karlsruhe
* Designing and writing programs using dynamic languages tended to take half as long and resulted in half the code.
40. * Cloudera - 2011 http://www.cloudera.com/videos/introduction-to-apache-pig
Pig Latin is a DSL for data on Apache Hadoop
41. Do Software Language Engineers Evaluate their Languages? 2011 - CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
pedrohgabriel@gmail.com, {miguel.goulao,vasco.amaral}@di.fct.unl.pt
http://citi.di.fct.unl.pt/
Software language engineers consistently relax language evaluation
No validation from users
Recommends a systematic approach to DSL evaluation
45. Written answers
What was different about learning or using XQuery versus other programming languages?
Do you have any opinions if XQuery is more or less productive language versus other programming languages?
46. Choose an option that you most strongly believe in … 66% 14% 10%
47. Do you think XQuery makes you a more productive programmer ? 67% 14% 10% 8%
48. Is XQuery more productive than Java in developing web based data applications? 58% 22% 12% 8%
I thought it would be useful to frame this talk in terms of what we are trying to solve. We're going to employ a highly scientific analysis of the problem. Only have time for 2.
The Mythical Man-Month, published back in 1975, was Fred Brooks's analysis of the problems involved in large software projects, using the example of IBM's effort to build the System/360 Operating System, a batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964. I have a soft spot for the book, as I reread it every decade or so and continue to be amazed at the longevity of the pearls of wisdom it contains. Whilst not all of its findings have survived the test of time, many have, and they have been confirmed by study after study.
Specifically, I pick out one statement: software development is intrinsically difficult. This is down to the fact that programming means working with an infinitely tractable medium, i.e. we build stuff out of pure thought-stuff.
Impedance mismatch
  We know about the 2 or more decades of difficulties coding interfaces between relational databases and objects
  ORM can be dangerous
Multifarious data models
  Data models in the database
  Data models in our application
  Data models over the wire
  ORM complicates things here
  Different data formats e.g. JSON, XML, RDBMS, text, binaries
  Change management in data story unclear
Relaxing constraints
  NoSQL is challenging the status quo by asking us to review the merits of ancient traditions like ACID and CAP
  Curt Monash definition of NoSQL - http://www.dbms2.com/2011/10/02/defining-nosql/
    NoSQL is most easily defined by what it excludes: SQL, joins, strong analytic alternatives to those, and some forms of database integrity. If you leave all four out, and you have a strong scale-out story, you're in the NoSQL mainstream.
  Open source
  Cheap
  Maybe ACID offensive
Just a 10% increase in the use of a company's existing data can result in giant gains.
Show how this relates to specific industry sectors …
The three Vs of data are hard to manage.
When I first saw this graphic I thought it was a pair of programmers (mostly because the guys look kind of like Larry Wall), but I think these guys are business guys. It occurred to me that we are in a strange place now, where business folk are making commercial decisions based on algorithms. Algorithms are absolutely crucial to our craft, but focusing on them trivializes the solution; it is like saying we will use hammers to build a house. Of course we will use hammers.
Developers need to balance their desire to learn algorithms against the reality of getting stuff done.
#1
Fred Brooks discovered that if you choose the right programming language in the right situation you can get 4-5 times greater productivity. I am going to try to show you how to get this kind of jump in productivity using XQuery.
This is not insignificant: it means that in a month's time I can get 4 months of work done. I have had those months where I am just cruising, and then I have those months where I feel like I am 4-5 times slower (but I estimated based on the good times). Perhaps it was down to using the right tools.
#2
Because we will increasingly be asked to work with all kinds of data: are the architectures, software and infrastructure you are using today up to the challenge?
Choosing a language like XQuery, specifically created to work with data, can help you work with Big Data. It's important to note that a sufficiently abstracted language will protect you from the storm of changing algorithms.
I have a hard job showing you how XQuery can be productive, and ultimately convincing you to try out XQuery, because:
  you already know a few programming languages
  people make up their own minds
  you want to see running code (I promise soon!)
I will use a mixture of evidence-based analysis and show you the boundaries of the language.
XQuery is to XML as JavaScript is to JSON
Just because XQuery has an X in it doesn't mean it's specifically for XML
Just because XQuery has QUERY in it doesn't mean it's constrained to querying data
Superset of XPath
XQuery is a declarative programming language which can be used to express queries and transformations of XML data. It was designed by the W3C according to a series of use cases in the area of Web data management. XQuery's most suitable purpose is in making polystructured data (i.e. XML) contained in data repositories accessible, tractable and scrutable.
DSL
Domain-driven development is an approach to developing software which relies on Domain Specific Languages (DSLs) and Models (DSMs) to raise the level of abstraction while at the same time narrowing down the design space (check out Martin Fowler's book on DSLs).
Strongly Typed
Actually XQuery is one of those languages, like Dart, which blurs the line between weakly and strongly typed, though not in the way that the Google folk have chosen. It's a strongly typed language, but you can completely opt out of typing and the language will fall back onto a clear and consistent set of implicit typing rules (which I will show you).
Lots of Specs
There are a lot of interrelated specs with XQuery. The bad news is that if you are an implementer you have to read all those specs; the good news is that most people are not implementers. The other good news is that this approach to spec generation, whilst agonizing, can usually result in more modular specs.
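The opt-out typing described in these notes can be sketched as follows. This is a minimal illustration rather than code from the talk: the function names local:add-loose and local:add-strict and the <order> sample data are invented for the example.

```xquery
xquery version "1.0";

(: Untyped: parameters and result default to item()*, nothing is declared. :)
declare function local:add-loose($a, $b) {
  $a + $b
};

(: Typed: arguments and result are checked against the declared types. :)
declare function local:add-strict($a as xs:integer, $b as xs:integer) as xs:integer {
  $a + $b
};

(: Hypothetical sample data, constructed inline so the query is self-contained. :)
let $doc := <order><qty>2</qty><qty>3</qty></order>
return (
  (: element content is atomized to xs:untypedAtomic, then promoted to xs:double :)
  local:add-loose($doc/qty[1], $doc/qty[2]),
  (: explicit casts make the values xs:integer before the call :)
  local:add-strict(xs:integer($doc/qty[1]), xs:integer($doc/qty[2]))
)
```

Both calls add 2 and 3; the difference is that the untyped call falls back to xs:double arithmetic on untyped data, while the typed call works with, and returns, xs:integer.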
Inner joins, left outer joins, etc
XQuery 3.0 will treat functions as first class … though most XQuery processors currently implement
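For readers who have not seen first-class functions in XQuery, here is a small sketch using the syntax of the final XQuery 3.0 recommendation (inline function expressions, function-typed parameters and fn:for-each). The names local:apply-twice and $double are invented for this example, and a processor that only supports XQuery 1.0 will reject it.

```xquery
xquery version "3.0";

(: A higher-order function: takes a function item and applies it twice. :)
declare function local:apply-twice(
  $f as function(xs:integer) as xs:integer,
  $x as xs:integer
) as xs:integer {
  $f($f($x))
};

(: An inline (anonymous) function bound to a variable. :)
let $double := function($n as xs:integer) as xs:integer { $n * 2 }
return (
  local:apply-twice($double, 5),   (: double(double(5)) :)
  fn:for-each(1 to 3, $double)     (: apply $double to each item of the sequence :)
)
```

Evaluating the query should return the sequence (20, 2, 4, 6): the function item is passed around like any other value.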
ACID compliant, fully transactional database for polystructured data
Universal Index
  Fast queries
  Flexible - load as-is, index everything
  Composable queries - also very fast
Shared-nothing Cluster
  Scalable to PBs on commodity hardware
  Simple scaling - add a new D-host or E-host in minutes
MVCC
  New information is searchable immediately
  Fast loads - no update in-place
  Fast queries - no locking
XQuery is widely supported, mature and deployable in most places
Environment
  Server
  Browser
  Devices e.g. mobile phone
Bindings in many languages
  Perl
  Ruby
  Python
  C
  asp
  Java
OS
  Java
  .NET
  UNIX/LINUX
Commercial
FLOSS
XML Databases
I have shown you a bit of XQuery and its boundaries; I will now present a bit of analysis I did to try to identify whether XQuery is productive and, more specifically, when it is productive.
I searched the literature for ways to measure a programming language's productivity.
Settled on #loc per function point
LOC
  Lines of code (LOC)
Function points
  A method of decomposing a project's requirements in the hope of being able to estimate the effort needed to do the project
Before you start throwing stuff at me for mentioning LOC and FP: I do not subscribe to using LOC and FP for project estimation, though clearly there is a lot of historical analysis which I will leverage.
The following study (related to the previous study) found the average number of lines of code needed to implement a single function point.
Please take all numbers in a relative rather than absolute sense.
Designing and writing programs using dynamic languages tends to take half as long, resulting in half the code.
Using historical #loc/fp I want to plot XQuery's place in the world.
Useful study on implementing an entire enterprise web application using XQuery versus Java
Amazon client libraries written in XQuery have 80% less code than their equivalent written in Java.
Projects have been anonymised to protect the innocent (my colleagues, clients, etc.). Disclaimer: I did 4 of these XQuery projects.
Tried to reduce the mixed-language effect, but because of XQuery's 'DSL'-ness for things like data apps this was not a problem
FP range between ~250-1200
Took me 4 days
VAF in actuality remained close to 1.0
Methodology
  analyzed 11 reasonably sized projects (4 were done by me)
  cloc defined lines of code
  based on the user point of view, I defined FP and summed them
  defined VAF for each project
  VAF = (TDI * 0.01) + 0.65
  AFP = VAF * sum of FP
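To make the adjustment step concrete, here is a worked instance of the two formulas in the methodology. It assumes the usual function point convention that TDI (total degree of influence) is the sum of 14 characteristics each rated 0 to 5, so that TDI lies between 0 and 70; the notes do not spell this out. The 250 FP figure is simply the lower end of the reported range, used for illustration and not taken from any specific project, and dividing the cloc count by the adjusted total is an assumption about how #loc per function point was derived.

```latex
\begin{align*}
\mathrm{VAF} &= 0.01\,\mathrm{TDI} + 0.65,
  \qquad 0 \le \mathrm{TDI} \le 70 \;\Rightarrow\; 0.65 \le \mathrm{VAF} \le 1.35 \\
\mathrm{VAF} &\approx 1.0 \;\Longleftrightarrow\; \mathrm{TDI} \approx 35 \\
\text{adjusted FP} &= \mathrm{VAF} \times \sum \mathrm{FP} = 1.0 \times 250 = 250 \\
\text{LOC per FP} &= \frac{\text{LOC counted by cloc}}{\text{adjusted FP}}
\end{align*}
```

With VAF this close to 1.0, the adjustment barely changes the raw function point totals, so the LOC per FP figures are driven almost entirely by the counted lines and the summed FP.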
Close to SQL
Seems to match up: on paper it's twice as productive as Java
A bit too neat
It all seemed very neat, so I started trying to punch holes, checked my analysis, etc.
Meta-analysis and review of articles published in top-ranked venues from 2001 to 2008 which report on DSL construction and best practice
DSLs can contribute to increased productivity while reducing the required maintenance and programming expertise.
Ran from Sept 20 - Oct 1st
102 people responded
15,000,000 programmers worldwide (Wikipedia)
~100 people
95% certain
confidence interval: +- 9.8% error
United States 43%
United Kingdom 15%
Germany 10%
France 8%
Czech Republic 3%
Netherlands 2%
Switzerland 2%
50% of people put their name to the poll
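The quoted 9.8% can be reproduced with the standard margin-of-error formula for a proportion. This is a sketch under stated assumptions: simple random sampling, the worst-case proportion p = 0.5, z = 1.96 for 95% confidence, and the rounded sample size of about 100. With 15,000,000 programmers the finite population correction is negligible and is ignored here.

```latex
\begin{align*}
E &= z \sqrt{\frac{p(1-p)}{n}}
   = 1.96 \sqrt{\frac{0.5 \times 0.5}{100}}
   = 1.96 \times 0.05
   = 0.098 \approx \pm 9.8\% \\
E_{n=102} &= 1.96 \sqrt{\frac{0.25}{102}} \approx 0.097 \approx \pm 9.7\%
\end{align*}
```

Using the exact count of 102 respondents instead of the rounded 100 changes the result only in the first decimal place.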
Strong correlation between XML and usage of text, RDBMS and JSON
Multiple choice
Format                                   Responses  Share
XML                                      95         36%
Text                                     40         15%
RDBMS                                    39         15%
JSON                                     32         12%
Binaries (images, video, etc)            27         10%
Office documents                         18         7%
Semantic web stuff (RDF, OWL, etc.)      15         6%
What are your impressions of XQuery as a general programming language?
  Most people agree it's either not or just barely a GPL
  better than XSLT
  XSLT programmers still like XSLT
  XRX or XML database - very powerful and productive
  needs extensions, tooling, better docs
  A significant number said it's powerful and they love it
  Functional programming good
  stellar at XML processing
  struck by the number of people who have only ever programmed in XQuery (novice programmers)
Do you have any opinions if XQuery is more or less productive language versus other programming languages?
  When used with XRX or an XML database
  can be more productive when used in combination with other languages, each playing to each other's strengths and weaknesses
  authoring more expressive/efficient than SQL
  Not a GPL
What was different about learning or using XQuery versus other programming languages?
  functional paradigm
  sequences
  lack of docs, tools, debugger, IDE
  FP more focused on data structures and algorithms; Java, C, C++, .Net: more focused on the state of the world and the way it changes.
What is XQuery good or bad for? Please expand as much as possible.
  great for filtering and transforming XML
  XSLT better at the view (render to XHTML)
  XQuery and JavaScript can be problematic (dates)
  bad with binaries of course
  it's a DSL
  Presently, XQuery's most suitable purpose is in making semi-structured (i.e. XML) information repositories accessible, scrutable, and tractable.
  Set-oriented programming. Semi-structured data. Full text search together with structural search. Interaction with the Web -- via REST
Clearly we are talking to XQuery developers who preach the XML word
Single choice
Option                                   Responses
I love XML                               66
I am ok with XML but hate namespaces     14
I love HTML5 and am ok with XML          10
I am ok with JSON but hate JavaScript    4
I hate JSON                              3
I love JSON                              3
I hate XML                               0
I love HTML5 and hate XML                0
Single option
Option       Responses  Share
Yes          67         67%
Maybe        14         14%
No           10         10%
Don't know   8          8%
Option       Responses  Share
Yes          58         58%
Don't know   22         22%
Maybe        12         12%
No           8          8%
Corona makes it very easy to work across all data and can be used wherever you are using a NoSQL solution, without having to know any XML tech.
No compromise with MarkLogic, e.g. ACID compliant. On a single node (modest spec) I get 50 docs per second ingest of JSON and 80 docs per second ingest of XML. The delta is due to the fact that we have to parse JSON back and forth, but I suspect at some point we will get an internal implementation of this kind of thing.
In a related development, the 28msec guys have done the reverse by placing XQuery on top of MongoDB.
I gave a brief overview of XQuery
Presented a range of analysis that indicates XQuery is a GSD (Getting Stuff Done) language
Presented some cool stuff with Hadoop and Corona showing how XQuery with XML databases can help with the Big Data problem
Are the problems for real?
  Programming seems to continue to be hard
  The biggest SQL company around, Oracle, just announced its NoSQL solution … how's that for irony?
Is XQuery productivity for real?
  LOC per FP seems to suggest that people are feeling something
  Indirect analysis like the survey seems to reinforce this
Average salary = $99k
There is a strong trend of Java-related jobs with XML, which will ring true when we get into the analysis
Get paid 20% more than PHP developers
We were one of the rare companies during the downturn to be super successful, and we are continuing that trend
Well paid: in the US, the average yearly salary for XQuery jobs is $99,000, 20% higher than the average salary for PHP
Emerging semantic web technologies are a good match with XQuery and friends
High demand: MarkLogic is hiring like mad, as our clients are seeking partners like mad
Low supply: in the UK, on average there is one XQuery-related job posting for every two people that claim to know XQuery (LinkedIn)
I've seen it … MarkLogic has the largest unstructured/polystructured databases (petabytes and beyond) running on the planet, doing amazing things with subsecond query response times … I would have to get you all US security clearance to name names, but that's not going to happen, so you will have to trust me.
250+ customers worldwide
For example, an oil and gas company uses MarkLogic to track the location of ships at sea. MarkLogic Server stores data from GPS, news feeds, weather data and commodity prices, among other things, and surfaces all this data on a map that users can query. For example, a user might ask, "Show me all the ships within this polygon (i.e., geographic area) that are carrying this type of oil and have changed course since leaving the port of origin." The application then displays the results on the map.
We have the largest polystructured data repositories running on the planet in production, and XQuery is used as the language of choice for querying, analyzing, etc. … yes, bigger than Oracle, bigger than anyone … I would have to get everyone US security clearance to name names; otherwise you will have to trust me.
So is XQuery the Getting Stuff Done Language?
Eclipse plugin - XQDT - http://www.xqdt.org/html/index
oxygenXML
Online IDE - eXide - http://demo.exist-db.org/exist/eXide/index.html
CouchDB ML driver - https://github.com/dscape/nano
http://exist.sourceforge.net
http://developer.marklogic.com
FunctX - http://www.xqueryfunctions.com
Expath.org
Cxan.org
Statistics in a Nutshell - Sarah Boslaugh & Paul Andrew Watters, O'Reilly 2008
Do Software Language Engineers Evaluate their Languages?
Anders H, GOTO talk on 'Where Programming Languages are going'
Software Assessments, Benchmarks, and Best Practices [Paperback] Addison-Wesley Professional; 1 edition (May 11, 2000) http://www.amazon.com/Assessments-Benchmarks-Addison-Wesley-Information-Technology/dp/0201485427/ref=sr_1_1?ie=UTF8&s=books&qid=1218404297&sr=8-1
Software Estimation: Demystifying the Black Art (Best Practices (Microsoft)) [Paperback] Microsoft Press; 1 edition (March 1, 2006) http://www.amazon.com/Software-Estimation-Demystifying-Practices-Microsoft/dp/0735605351/ref=pd_sim_b6
Lutz Prechelt (prechelt@ira.uka.de). An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a search/string-processing program. Fakultät für Informatik, Universität Karlsruhe, Technical Report 2000-5, March 10, 2000. http://page.mi.fu-berlin.de/~prechelt/Biblio/jccpprtTR.pdf
Garret, Lisp as an Alternative to Java, http://www.flownet.com/gat/papers/lisp-java.pdf
  "Lines of code" is defined as the number of non-blank, non-comment lines.
  Computed as median number of lines divided by median number of hours, due to lack of raw data for Lisp study.
28msec - http://www.28msec.com/html/home
F. B. e. Abreu and W. Melo. Evaluating the impact of object-oriented design on software quality. In METRICS '96.
Bazaar. http://bazaar.launchpad.net/~mysql/mysql-server/.
P. Bhattacharya and I. Neamtiu. Higher-level Languages are Indeed Better: A Study on C and C++. Technical report, University of California, Riverside, March 2010. http://www.cs.ucr.edu/~pamelab/tr.pdf.
Blender Bug Tracker. https://projects.blender.org/tracker.
F. P. Brooks, Jr. The Mythical Man-Month. Addison-Wesley, 1975.
Why C is Not My Favorite Language. Technical report, University of California, Berkeley, 2000.
J. Fernández-Ramil, D. Izquierdo-Cortazar, and T. Mens. What does it take to develop a million lines of open source code? In OSS, pages 170-184, 2009.
Firefox Statistics. http://www.computerworld.com/s/article/9140819/1_in_4_now_use_
Z. Hongyu, Z. Xiuzhen, and G. Ming. Predicting defective software components from code complexity measures. In ISPRDC '07.
C. Jones. Backfiring: Converting lines-of-code to function points. Computer, pages 87-88, 1995.
S. Kim, T. Zimmermann, K. Pan, and E. J. J. Whitehead. Automatic identification of bug-introducing changes. In ASE, pages 81-90, 2006.
S. Koch. Exploring the effects of coordination and communication tools on the efficiency of open source projects using data envelopment analysis. IFIP, 2007.
S. Koch. Exploring the effects of sourceforge.net coordination and communication tools on the efficiency of open source projects using data envelopment analysis. Empirical Softw. Eng., 14(4):397-417, 2009.
J. Koskinen. Software maintenance costs, Sept 2003. http://users.jyu.fi/~koskinen/smcosts.htm.
M. Lipow. Number of faults per line of code. TSE'82.
T. J. McCabe. A complexity measure. In ICSE'76.
T. Mens, M. Wermelinger, S. Ducasse, S. Demeyer, R. Hirschfeld, and M. Jazayeri. Challenges in software evolution. In IWPSE '05, pages 13-22, 2005.
A. Mockus, R. T. Fielding, and J. D. Herbsleb. Two case studies of open source software development: Apache and mozilla. ACM Trans. Softw. Eng. Methodol., 11(3), 2002.
A. Mockus and L. G. Votta. Identifying reasons for software changes using historic databases. ICSM'00, pages 120-130.
Myrtveit and E. Stensrud. An empirical study of software development productivity in C and C++. In NIK'08.
N. Nagappan and T. Ball. Use of relative code churn measures to predict system defect density. In ICSE, 2005.
NIST. The economic impacts of inadequate infrastructure for software testing. Planning Report, May 2002.
M. C. Paulk. An Empirical Study of Process Discipline and Software Quality. PhD thesis, Univ. of Pittsburgh, 2005.
G. Phipps. Comparing observed bug and productivity rates for java and c++. Software Practice and Experience, 29, April 1999.
M Squared Technologies - Resource Standard Metrics. http://msquaredtechnologies.com/.
RSM Metrics. http://msquaredtechnologies.com/m2rsm/docs/index.htm.
TIOBE Software BV. TIOBE Programming Community Index. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.
http://gigaom.com/cloud/big-data-equals-big-opportunities-for-businesses-infographic/
Workday Mysql as key-value store http://www.dbms2.com/2010/08/22/workday-technology-stack/
http://www.qsm.com/resources/function-point-languages-table #locc/fp Version 4.0 November 2009
Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai: "The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics". IEEE Transactions on Software Engineering, 27(7), July 2001.
http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html
Usability of XML Query Languages. Joris Graaumans. SIKS Dissertation Series No 2005-16, ISBN 90-393-4065-X
http://commons.wikimedia.org/wiki/File:Milodon_cave.JPG
http://commons.wikimedia.org/wiki/File:ON_WITH_THE_JOB._GET_IT_DONE_THE_SAFE_AND_SURE_WAY_-_NARA_-_515117.tif
http://upload.wikimedia.org/wikipedia/commons/3/36/Cyril_Ponnamperuma_analyzing_a_moon_sample.jpg
Cloud image http://www.nehmzow.de/reportagefeatures/