Slideshow transcript
Slide 1: Innovation in the Characterizing the Mashup Ecosystem – Shuli Yu – shuliyu@gmail.com, School of Information Systems, Singapore Management University
Slide 2: The Mashup Ecosystem • A “mashup is a web application that combines data from more than one source into a single integrated tool” (Wikipedia, 2008) [Diagram: developers integrate APIs into a mashup, which is consumed by individuals and enterprises]
Slide 4: Research on mashups • Mashups: Unit-level characteristics – Comparing the technologies and architectures and examining how they can be improved (Jackson and Wang, 2007; Liu, Hui, Sun and Liang, 2007) – Classification schemes • Industry verticals (Wikipedia, 2008) • Involvement in the application stack (Hinchcliffe, 2006) • Specific stakeholders – Usage of mashups in particular domains • Cartography (Pietroniro and Fichter, 2007), libraries in healthcare (Cho, 2007) and digital journals (Kulathuramaiyer, 2007) – Copyright and policy implications of remixing content (O’Brien and Fitzgerald, 2006; Goodman and Moed, 2006)
Slide 5: Research Agenda • Characterize the mashup ecosystem [Diagram: 2-mode network linking mashups (m) to APIs] – Describe how the network has evolved • Growth • Network metrics – Determine what makes an API successful
Slide 6: Data source: ProgrammableWeb
Slide 7: Data source: ProgrammableWeb
Slide 8: Data source: ProgrammableWeb
Slide 9: Research Approach • Network structure – Relationship between APIs and mashups [Diagram: 2-mode network linking mashups (m) to APIs] • Attributes: Possible success factors – Date created – Category – Rating
Slide 10: Growth • Network snapshots at 1-month intervals: Sep 2005 to Dec 2007 [Chart: Cumulative API and Mashup Growth; series: Mashups, APIs; x-axis: Dec-05 to Dec-07; y-axis: 0 to 3000]
Slide 11: Growth • Growth rate of APIs and Mashups (number of new APIs or mashups per month) [Chart: series: APIs, Mashups; x-axis: Dec-05 to Dec-07; y-axis: 0 to 200]
Slide 12: 2-mode matrix of APIs and mashups • Network snapshots at 6-month intervals: Dec 2005 to Dec 2007 [Matrix with axes labelled APIs and m (mashups)]
Slide 13: Visualizing the 2-mode Mashup and API Network: Layout by node-repulsion with equal edge length bias Note: Square nodes denote APIs and circle nodes denote mashups
Slide 14: Dec 2005
Slide 15: Dec 2006
Slide 16: Dec 2007
Slide 17: Dec 2007 Selected APIs and their corresponding mashups API Tier 2: API Tier 1: •All popular APIs here - Google Maps •Social/community, Search API Tier 3: • Less popular? • News feeds, online retail, music
Slide 18: Affiliation matrix of APIs [Diagram: the 2-Mode Network (mashups, m, linked to APIs) alongside the API Affiliation Network (APIs linked directly to each other, with edge labels of 2)]
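The conversion from the 2-mode network to the API affiliation network can be sketched as below. This is a minimal illustration, not the method used in the study: the function name, the dictionary input format and the toy mashup data are all assumptions made for the example. Two APIs are linked when at least one mashup uses both, and the edge weight counts the shared mashups.

```python
from collections import defaultdict
from itertools import combinations


def project_to_api_affiliation(mashup_apis):
    """Return {(api_a, api_b): shared_mashup_count}, with api_a < api_b.

    mashup_apis maps each mashup to the list of APIs it combines
    (a hypothetical input format chosen for this sketch).
    """
    weights = defaultdict(int)
    for apis in mashup_apis.values():
        # Every pair of APIs used by the same mashup gains one shared mashup.
        for a, b in combinations(sorted(set(apis)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy data, not taken from ProgrammableWeb.
mashup_apis = {
    "m1": ["GoogleMaps", "Flickr"],
    "m2": ["GoogleMaps", "Flickr"],
    "m3": ["GoogleMaps", "YouTube"],
    "m4": ["Flickr"],
}
affiliation = project_to_api_affiliation(mashup_apis)
```

In this toy case Flickr and Google Maps share two mashups, so their edge carries a weight of 2, while a mashup that uses a single API (m4) contributes no edge.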
Slide 19: Affiliation matrix of APIs • Network snapshots at 3-month intervals: Dec 2005 to Dec 2007 [Matrix: APIs by APIs]
Slide 20: Visualizing the API Affiliation Network: Layout by principal component analysis Note: Node size is proportional to degree (number of links to other APIs)
Slide 21: Dec 2005
Slide 22: June 2006
Slide 23: Dec 2006
Slide 24: June 2007
Slide 25: Dec 2007
Slide 26: API Affiliation Network Metrics • Degree – Network connectivity over time • Small Worlds – Clustering coefficient – Path length • Scale Free – Degree frequency distribution
Slide 27: Degree • Degree: Number of other APIs that are connected to a particular API via one or more mashups – Steady increase, followed by a plateau • Normalized degree: Mean degree divided by the maximum possible degree, i.e. degree expressed as a percentage – Constant throughout [Chart: Degree over time; series: Freeman Degree, Normalized Degree; x-axis: Dec-05 to Dec-07; y-axis: 0 to 10]
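The two degree measures defined on this slide can be sketched as follows. The function name, the edge-list input and the four-node example are assumptions for illustration; the maximum possible degree is taken to be n - 1 for a network of n APIs.

```python
def mean_and_normalized_degree(nodes, edges):
    """Return (mean degree, normalized degree as a percentage).

    nodes: list of API names; edges: list of (api_a, api_b) pairs.
    Both inputs are hypothetical formats chosen for this sketch.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    degrees = [len(v) for v in neighbours.values()]
    mean_degree = sum(degrees) / len(degrees)
    max_possible = len(nodes) - 1  # an API can link to every other API
    normalized = 100.0 * mean_degree / max_possible if max_possible > 0 else 0.0
    return mean_degree, normalized


# Toy network: A, B and C are mutually linked; D is isolated.
mean_deg, norm_deg = mean_and_normalized_degree(
    ["A", "B", "C", "D"], [("A", "B"), ("A", "C"), ("B", "C")]
)
```

Isolated APIs (those never combined with another API) count as degree 0 here; whether the study included them in the mean is not stated on the slide.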
Slide 28: Small Worlds • Clustering Coefficient (CC) – Extent to which nodes in a graph tend to form a unified group with many internal connections but few connections leading out of the group • Characteristic Path Length (CPL) – Average distance required to pass from node to node (Ravid and Rafaeli, 2004) • Small World networks have – High degree of clustering – Short path lengths [Figure: Regular, Small World and Random networks. Source: Complex Science for a Complex World, Figure 5.6. http://epress.anu.edu.au/cs/mobile_devices/ch05s03.html]
Slide 29: Small Worlds • To classify a network as Small World, compare the CC and CPL with a random network of similar density: – High degree of clustering: CCsw >> CCrandom, i.e. CCsw/CCrandom > 1 – Short path lengths: CPLsw ≈ CPLrandom, i.e. CPLsw/CPLrandom ≈ 1
Snapshot  CCsw   CCrandom  CCsw/CCrandom  CPLsw  CPLrandom  CPLsw/CPLrandom
Dec-05    0.320  0.0125    25.5517        2.355  31.9221    0.0738
Mar-06    0.500  0.0174    28.7805        2.284  4.7222     0.4837
Jun-06    0.399  0.0230    17.3393        2.228  3.3729     0.6606
Sep-06    0.395  0.0189    20.9526        2.206  3.3870     0.6513
Dec-06    0.418  0.0176    23.7756        2.243  3.2205     0.6965
Mar-07    0.458  0.0229    19.9782        2.223  2.7038     0.8222
Jun-07    0.448  0.0199    22.4619        2.237  2.7680     0.8082
Sep-07    0.428  0.0172    24.8784        2.240  2.8557     0.7844
Dec-07    0.414  0.0144    28.7766        2.282  2.9832     0.7649
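The two small-world measures and their ratios can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the tool used for the study: the adjacency-dictionary format, the function names and the toy graph are invented for the example, the clustering coefficient is taken as the average of local coefficients, and the graph is assumed to be connected with at least two nodes. Generating the comparison random network is left out.

```python
from collections import deque


def average_clustering(adj):
    """Average local clustering coefficient; nodes with < 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among this node's neighbours (each pair once).
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_ratios(cc, cc_random, cpl, cpl_random):
    """Return (CC/CCrandom, CPL/CPLrandom)."""
    return cc / cc_random, cpl / cpl_random


# Toy graph: a triangle A-B-C with D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
toy_cc = average_clustering(toy)
toy_cpl = characteristic_path_length(toy)

# Ratios recomputed from the Dec-07 values on this slide.
cc_ratio, cpl_ratio = small_world_ratios(0.414, 0.0144, 2.282, 2.9832)
```

Recomputing from the rounded Dec-07 inputs gives a clustering ratio of about 28.75, slightly below the 28.7766 on the slide, which was presumably derived from unrounded values.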
Slide 30: Small Worlds • To classify a network as Small World, compare the CC and CPL with a random network of similar density: – High degree of clustering: CCsw >> CCrandom, i.e. CCsw/CCrandom > 1 – Short path lengths: CPLsw ≈ CPLrandom, i.e. CPLsw/CPLrandom ≤ 1 [Charts: CC/CCrandom (y-axis: 0 to 35) and CPL/CPLrandom (y-axis: 0 to 1); x-axis: Dec-05 to Dec-07]
Slide 31: Degree distribution: Scale Free • Scale free networks have power law degree distributions: – Frequency = b0 × Degree^(-b1) – A few highly connected hub nodes, compared to a large number of less connected nodes – Network structure and dynamics are independent of network size [Chart: Degree distribution of API Affiliation network (Dec 2007); x-axis: Degree (log); y-axis: Frequency (log); fitted line: y = 32.165x^(-0.7602), R^2 = 0.7245]
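A power law fit of the kind shown in the chart can be sketched as an ordinary least-squares line on log-log axes. The function name and the synthetic data below are assumptions for illustration; the slide does not say which tool produced the fitted line, and a spreadsheet trend line is only one possibility.

```python
import math


def fit_power_law(degrees, frequencies):
    """Fit frequency = b0 * degree ** (-b1) by least squares on log-log axes.

    Returns (b0, b1, r_squared). All inputs must be positive.
    """
    xs = [math.log10(d) for d in degrees]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 10 ** intercept, -slope, 1.0 - ss_res / ss_tot


# Synthetic data generated from an exact power law (not the study's data).
degrees = [1, 2, 4, 8, 16, 32]
frequencies = [32.0 * d ** -0.75 for d in degrees]
b0, b1, r_squared = fit_power_law(degrees, frequencies)
```

On exact power-law data the fit recovers the generating parameters; on real degree counts the binned tail is noisy, which is one reason a fit to the raw frequencies can be unreliable.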
Slide 32: Degree distribution: 2-mode • The frequency distribution of the 2-mode API-mashup network can also be analyzed: Frequency = b0 × Degree^(-b1) – Degree: Number of mashups created from an API – Frequency: Number of APIs with a particular degree • In this case, would the 2-mode API-mashup distribution fit a Power Law or a Long Tail distribution?
Slide 33: Degree distribution: 2-mode • Possible types of distributions – Power law • Small occurrences are common and large instances are rare: a large number of APIs with only a few mashups, compared to a small number of APIs with many mashups • Similar to markets that are dominated by a few popular products, e.g. a brick and mortar bookstore that sells large quantities of bestseller novels – Long Tail (Anderson, 2004) • Large number of low frequency occurrences that, when aggregated, outweigh the initial portion of high frequency occurrences • Common in online retail: Product selection is not limited by physical storage restrictions, logistics and holding costs; and consumers can easily find specific products by searching online or acting on recommendations (Brynjolfsson, Hu and Simester, 2007). Overall high volume of sales from niche products. • Mashup ecosystem is entirely virtual and has the above characteristics.
Slide 34: Degree distribution: 2-mode • The frequency distribution of the 2-mode API-mashup network can also be analyzed – Frequency = b0 × Degree^(-b1) – Degree: Number of mashups created from an API – Frequency: Number of APIs with a particular degree [Chart: Degree distribution of APIs in 2-mode network, Dec 2007; x-axis: Number of Mashups (0 to 1400); y-axis: API Frequency (1 to 101)]
Slide 35: Degree distribution: 2-mode • Degree distribution (logged on both scales) – Note: Fitting a line could result in a slope that is too shallow (Adamic, 2000) [Chart: Degree distribution of APIs in 2-mode network, Dec 2007; x-axis: Number of Mashups (log, 1 to 10000); y-axis: API Frequency (log, 1 to 100)]
Slide 36: Degree distribution: 2-mode • Cumulative frequency distribution (logged on both scales) – Fit line to this instead – Likely a power law distribution: the tail is not long enough to be a Long Tail [Chart: Cumulative frequency distribution of APIs at Dec 2007; x-axis: x (Number of mashups), log; y-axis: Cumulative frequency of APIs with =< x mashups, log; fitted line: y = 401.05x^(-0.8618), R^2 = 0.9865]
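Building the cumulative distribution before fitting can be sketched as below. The direction of accumulation is an assumption: this sketch counts APIs with at least x mashups, which yields the downward-sloping curve that a negative fitted exponent implies, whereas the chart's axis label reads "=< x". The function name and the toy counts are also invented for the example.

```python
def cumulative_counts(mashups_per_api):
    """Return [(x, number of APIs with at least x mashups)] for each observed x.

    mashups_per_api is a list with one mashup count per API
    (a hypothetical input format chosen for this sketch).
    """
    values = sorted(set(mashups_per_api))
    return [(x, sum(1 for m in mashups_per_api if m >= x)) for x in values]


# Toy counts: most APIs have few mashups, one has many.
toy_counts = [1, 1, 1, 2, 2, 5, 40]
cumulative = cumulative_counts(toy_counts)
```

Because every point aggregates all APIs at or above x, the sparse high-degree tail no longer appears as a scatter of frequency-1 points, so a straight line fitted on log-log axes is less distorted than one fitted to the raw frequencies.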
Slide 37: Degree distribution: 2-mode • Frequency = b0 × Degree^(-b1): Exponent decreases over time – Fewer APIs with many mashups (high degree) but more APIs with few mashups (low degree) in December 2007 compared to December 2005 – Connections becoming more mesh-like (more evenly distributed) and less hub-and-spoke-like (unevenly distributed) – Tail getting longer [Chart: Degree distribution of APIs in 2-mode network; series: Dec-07 and Dec-05, with logarithmic trend lines Log. (Dec-07) and Log. (Dec-05); x-axis: Degree (log); y-axis: Frequency]
Slide 38: Degree distribution: 2-mode • Why does the distribution change? Could be due to the effect of several forces: – Number of APIs with fewer mashups (low degree) could be increasing at a rapid rate • Easy to join network; competitors actively promote APIs – Number of APIs with more mashups (high degree) could be increasing at a slow rate • Total number of APIs that have many mashups is constrained – Both: Combination of the above two forces
Slide 39: Factors predicting API success • Measure of API success: Many mashups • Possible factors – Time: First mover advantage – Category: Certain categories have an advantage – Market concentration on entry: Monopolized vs dispersed – Rating: Higher rating indicates that the API has something special • Other factors – Technology compatibility (data format, protocols, authentication); licensing structure and fees
Slide 40: Ranking of Top 20 APIs Possible time and category advantages?
Slide 41: Top 20 APIs: Number of mashups
Slide 42: Market Concentration • Herfindahl Index – 0: Not concentrated, market share evenly distributed – 1: Highly concentrated and monopolized market [Chart: Overall Herfindahl Index; x-axis: Dec-05 to Dec-07; y-axis: 0.2 to 0.38]
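The Herfindahl Index is the sum of squared market shares. A minimal sketch follows, assuming that an API's market share is its fraction of all API-to-mashup links; the slide does not define the share measure, so that choice, the function name and the example counts are assumptions.

```python
def herfindahl_index(mashup_counts):
    """Sum of squared market shares, given one mashup count per API."""
    total = sum(mashup_counts)
    if total == 0:
        raise ValueError("at least one API must have a mashup")
    return sum((count / total) ** 2 for count in mashup_counts)


# Hypothetical markets, not the study's data.
monopoly = herfindahl_index([100])          # a single API holds every mashup
even_split = herfindahl_index([25, 25, 25, 25])
mixed = herfindahl_index([50, 30, 20])
```

With n equally sized APIs the index equals 1/n, so it approaches 0 only as the number of APIs grows; a single monopolizing API gives exactly 1.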
Slide 43: Market Concentration • Category effects: Herfindahl Index – Top 8 categories with the most APIs at Dec 2007 (>20 APIs) [Chart: Herfindahl Index by Category; series: Messaging, Photos, Mapping, Music, Shopping, Reference, Internet, Search; x-axis: Dec-05 to Dec-07; y-axis: 0 to 1]
Slide 44: Factors predicting API success • Category effects: Herfindahl Index – Shopping, Reference and Music • Start off with one API monopolizing the market, which then loses market share to newer entrants, resulting in a less concentrated market space – Internet and Messaging • Begin with a few APIs sharing the market space; one of the existing firms increases concentration, then the firm loses share such that the market becomes less concentrated – Photos, Mapping and Search • Consistent market share structure from Dec 05 to Dec 07, but at different levels • Photos: Highly monopolized by Flickr, with few other dominant APIs • Mapping: Google Maps had the largest share, but other APIs like MS Virtual Earth, Yahoo Maps and GeoNames also had significant numbers of mashups • Search: Highly dispersed across the various text, image and other search APIs from Google and Yahoo.
Slide 45: Regression Model: Time Series
Mashups_t = α + β0 Mashups_(t-1) + β1 OverallHerfindahl_(t-1) + β2 CategoryHerfindahl_(t-1) + β3 Rating + β4 Mapping + β5 Shopping + β6 Search + β7 Internet + β8 Music + β9 Reference + β10 Photos + β11 Messaging
Category terms: top 8 categories with the most APIs (>20 APIs)
Model summary:
R     R Square  Adjusted R Square  Std. Error of the Estimate
.996  .991      .991               4.837
Slide 46: Regression Model: Time Series
Coefficients, Model 1 (Dependent Variable: CurrentMashups). B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
Variable            B        Std. Error  Beta   t        Sig.
(Constant)          -10.130  1.444              -7.014   .000
PreviousMashups     1.166    .002        .995   523.283  .000
Previous*HerfTotal  31.034   4.353       .014   7.130    .000
Previous*HerfInCat  .254     .375        .001   .676     .499
Rating              .150     .085        .003   1.753    .080
CatSearch           .608     .470        .003   1.293    .196
CatMapping          .907     .338        .005   2.686    .007
CatShopping         .679     .474        .003   1.432    .152
CatInternet         .027     .445        .000   .062     .951
CatMusic            .041     .505        .000   .082     .935
CatReference        -.005    .463        .000   -.011    .991
CatPhotos           .995     .555        .003   1.793    .073
CatMessaging        .262     .560        .001   .467     .640
Slide 47: Discussion of findings • Steady growth of mashups and APIs – Not booming • Structural changes: – Few APIs with many mashups; many APIs with few or no mashups (long tail); over time, fewer APIs with many mashups but more APIs with few mashups – Overall, the market is less concentrated, but the exact pattern of concentration depends on the specific category – Difficult to become an established player, especially late in the game
Slide 48: Discussion of findings • Connections between different APIs reaching a plateau – Same popular APIs connected with each other – Suggests compatibility limitations between APIs (functional, technology, licensing constraints) • First mover advantage – Release APIs early on • Importance of category and function – Certain categories might be better
Slide 49: Conclusion • Mashup ecosystem still in its infancy – Patterns exist but difficult to predict and generalize • Future Research – Case studies of individual categories or APIs – Comparison between certain groups
Slide 50: Acknowledgements Thanks to • Jason Woodard – for your guidance and support throughout this project, you made the process really enjoyable and I’ve learnt so much from you! • John Musser – for making the project possible by generously allowing us to access data from www.programmableweb.com • Darshan Santani – for helping immensely with data extraction
Slide 51: References • Adamic, Lada A. 2000. Zipf, Power-law, Pareto - A ranking tutorial. Information Dynamics Lab, HP Labs. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html#ap1 • Anderson, Chris. 2004. The Long Tail. Wired, http://www.wired.com/wired/archive/12.10/tail_pr.html • Cho, Allan. 2007. An introduction to mashups for health librarians. Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada 28:19-22. • Goodman, Elizabeth, and Andrea Moed. 2006. Community in Mashups: The Case of Personal Geodata. http://mashworks.net/images/5/59/Goodman_Moed_2006.pdf • Hinchcliffe, Dion. 2006a. Is IBM making enterprise mashups respectable? Enterprise Web 2.0. http://blogs.zdnet.com/Hinchcliffe/?p=49 (Accessed February 19, 2008). • Jackson, Collin, and Helen J. Wang. 2007. Subspace: secure cross-domain communication for web mashups. In Proceedings of the 16th international conference on World Wide Web, 611-620, Banff, Alberta, Canada: ACM
Slide 52: References • Kilkki, Kalevi. 2007. A practical model for analyzing long tails. First Monday 12, no. 5. http://firstmonday.org/issues/issue12_5/kilkki/index.html • Kulathuramaiyer, Narayanan. 2007. Mashups: Emerging Application Development Paradigm for a Digital Journal. Journal of Universal Computer Science 13, no. 4:531-542. • Liu, Xuanzhe, Yi Hui, Wei Sun, and Haiqi Liang. 2007. Towards Service Composition Based on Mashup. 332-339 • Mashup (web application hybrid) - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid) • O'Brien, Damien S, and Brian F Fitzgerald. 2006. Mashups, remixes and copyright law. http://eprints.qut.edu.au/archive/00004239/ • Ravid, Gilad, and Sheizaf Rafaeli. 2004. Asynchronous discussion groups as Small World and Scale Free Networks. http://www.firstmonday.dk/issues/issue9_9/ravid/index.html


