The Guardian embraced the internet by developing an open platform and open web principles. It moved from being solely a publisher to also being a platform, opening up its content through APIs to allow third-party developers to build applications. This helped drive significant traffic growth. To support its platform ambitions and developer partners, the Guardian evolved its technical architecture to be more scalable, reliable and high performing, adopting technologies like Solr, Memcached and cloud hosting.
The document discusses innovation in education and society through technology. It covers several topics including Moore's Law and the development of the World Wide Web. Live tracking of sporting events is presented as an example of an innovative educational project combining GPS, the internet and mobile technologies. Challenges to traditional media are discussed as well as the need for media innovation through event-based projects. One such project involved live coverage of a UEFA U21 tournament through streaming, social media and mobile journalists.
The document discusses how Atlassian improved their software development processes over time to increase developer productivity and the speed at which they can build and release new features. Initially, growing complexity from technical debt slowed their development. They then instituted measures like reducing build times, improving testing, prioritizing technical debt work, and "dogfooding" their own products to streamline their processes. These changes helped boost innovation and lower the effort required for new features.
The document discusses a project called Weather for Schools that aims to involve students in scientific weather monitoring projects. The project provides weather sensor kits for schools to collect local weather data and upload it to a central portal. Students and teachers can access all of the weather data collected and use it for further analysis. The goal is to help teach subjects like computer science, geography, mathematics and physics while having students engage in real-world data collection and analysis. Over 150 schools and 6000 students have participated since the project started in 2006.
A presentation on the evolution of the custom publisher, moving from a core of print publishing to an all-purpose content agency specializing in listening and content marketing.
Alastair Dant, lead interactive technologist, the Guardian (joelgunter)
This document discusses how news websites can use interactive content like galleries, slideshows, timelines, maps, charts and graphics. It provides examples of how interactivity assists in data visualization for election results and replaying social activity from the World Cup. The document also discusses building interactives without Flash and options for adding interactive visuals to websites using tools from Google, Tableau and Dipity.
Alastair Dant, lead interactive technologist, the Guardian [pdf] (joelgunter)
This document discusses how news websites can use interactive content like galleries, slideshows, timelines, maps, charts and graphics. It provides examples of how interactivity assists in data visualization for election results and replaying social activity from the World Cup. The document also discusses building interactives without Flash by using dynamic graphics, audiovisual integration, multi-user support, authoring tools, packaging and cross-browser compatibility. Finally, it recommends three easy ways to add interactive visuals using Fusion Tables from Google, Tableau to embed charts, and Dipity to create timelines.
The document discusses using Drupal to build social websites and applications. It notes that Drupal has "social" features built into its core, like user profiles, commenting, and taxonomy. There are also many third-party modules that can add additional social functionality for features like sharing, Twitter integration, Facebook connectivity, dashboards, and feeds. While Drupal is flexible and extensible for building various social solutions, the speaker cautions that one needs to consider their goals, resources, and technical requirements to determine if Drupal is realistic for the project.
In 2010 Panasonic decided to replace their legacy enterprise search tool and switched the search for all their European websites to an Apache Solr-based solution.
Their customers now benefit from an incredibly fast, feature-rich solution that is much more than just search and has become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation, autosuggest, and contextual filtering on properties such as color or product category were implemented under less-than-ideal circumstances, chiefly the lack of access to structured data. The search has been rolled out in close to 30 countries so far, also putting Solr's multilingual handling to the test.
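A small sketch of how such contextual filtering and autosuggest requests might be assembled against Solr's HTTP API. The endpoint path, core name, and field names here are assumptions for illustration, not Panasonic's actual schema:

```python
from urllib.parse import urlencode

def build_product_search(term, color=None, category=None, suggest=False):
    """Build Solr query parameters for a hypothetical product search.

    Contextual filters go into fq (filter query) parameters so Solr can
    cache them independently of the main query; autosuggest is sketched
    as a simple prefix query against an assumed 'name' field.
    """
    params = [("q", f"name:{term}*" if suggest else term), ("wt", "json")]
    if color:
        params.append(("fq", f"color:{color}"))
    if category:
        params.append(("fq", f"category:{category}"))
    return "/solr/products/select?" + urlencode(params)

print(build_product_search("lum", color="black", suggest=True))
```

Putting each filter in its own fq clause (rather than folding it into q) lets Solr's filterCache reuse the filter across queries, which matters at the traffic levels described above.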
This document summarizes a conference workshop about knowledge sharing. It discusses how knowledge hubs can facilitate sharing through communities of practice with over 80,000 members and 75,000 monthly visits. The workshop highlighted examples of collaborative learning through social media like Twitter chats and Local-pedia wikis. It emphasized capturing learning from conferences and distributing guidance through online channels to more broadly disseminate information.
An introduction to the core concepts of open science and Science 2.0 for informatics grad students. Originally presented Feb. 18, 2010, at the University of Pittsburgh.
You’ve used all the server-side caching tricks in the book: memcache, APC, database caching and so on to squeeze out every millisecond, and now your site is as fast as it will ever get. Well, guess again!
Those technologies cache and generate the HTML, which, even when done correctly, accounts for only 10–20% of the user response time, so there is a lot of room for improvement. Learn how to optimize your JavaScript, CSS, images, cookies and a whole slew of other things that make frontend caching a magical place.
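Much of the frontend win comes down to HTTP caching policy for static assets. A minimal sketch of such a policy; the extensions, max-age value, and fingerprinting convention are illustrative assumptions, not taken from the talk:

```python
def cache_headers(path):
    """Return HTTP caching headers for a response, as a dict.

    Static assets (JS/CSS/images/fonts) get a far-future max-age so
    browsers and CDNs can serve them without revalidation; HTML, which
    changes often, is revalidated on every request.
    """
    static_exts = (".js", ".css", ".png", ".jpg", ".gif", ".woff2")
    if path.endswith(static_exts):
        # One year, immutable: pair this with fingerprinted filenames
        # (e.g. app.3f9a1c.js) so a new deploy busts the cache.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    return {"Cache-Control": "no-cache"}  # always revalidate HTML

print(cache_headers("/static/app.3f9a1c.js"))
```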
From Publisher To Platform: How The Guardian Used Content, Search, and Open S... (The Guardian Open Platform)
Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content.
This talk will cover how The Guardian opened up their content, enriched it, and reached new markets with its platform strategy.
We cover the background platform strategy, technical architecture, implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space, while at the same time accelerating revenue.
This document summarizes a presentation about MontySolr, an extension that allows embedding CPython in Solr. It was created by Roman Chyla of CERN to connect Python and Java applications without compromises. MontySolr uses JCC to embed a Python interpreter in Java, allowing Python code to interface with Solr. This provides a robust, tested integration that works for any Python or C/C++ application and leverages the strengths of both Solr and Invenio.
SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two.
The document discusses various topics related to trends in mobile technology and the internet. It covers the growth of internet-connected TVs and online video sharing platforms like YouTube. Statistics are presented on YouTube's rapid growth and daily video uploads. The potential for companies to create their own video channels on platforms like Yubby is explored. The document also examines how people spend their time between TV, radio, the internet, newspapers and magazines, and the revenues generated from advertising on each medium.
The document discusses Living Labs and the European Network of Living Labs (ENoLL). It describes Living Labs as user-driven open innovation ecosystems where users help develop new technologies, products, and services. ENoLL aims to foster collaboration between public organizations, businesses, and users to accelerate innovation and address global challenges through open innovation. The network promotes cooperation between its members and helps position them internationally.
Onde KH? (where to poop?) Pitch Keynote at SWRIO (Bruno Marinho)
The document proposes an app that helps users locate nearby restrooms and share reviews. It would use one's location to show the best restroom option and allow adding/rating restrooms. Users could download different versions for various phones. Revenue would come from toilet paper brand ads and mobile games. It would compete with similar apps but aims to be simpler. The team needs funding to create stickers/ads, hire help, and cover hosting costs to launch before competitors.
A new trend toward extreme information exchange is emerging on the international scene. In the light of the open mind principle, public and private institutions around the world are joining the chorus in search of new meaning for existing data. Governments everywhere, influenced by the Web 2.0 wave, are progressively encouraging the construction of mashups on top of their databases. Dubbed Open Data, this movement has arrived in Brazil and is gaining strength in the public sphere. For us as software developers, this wave brings countless economic, political, and social opportunities. On the one hand, using our technical know-how, we have the chance to create solutions that add transparency to political action, bring society closer to public administration, and enable the exercise of true digital citizenship. On the other hand, it opens a broad range of new business opportunities. In this talk, we will discuss the OpenData and OpenGovData philosophy, present initiatives from the hacker community, examine the government's proposals, and analyze some of the techniques and technologies that pave the way to open data, from scraping to the semantic web.
Social media presents both security risks and opportunities for businesses. While it can expose confidential information if not managed carefully, 90% of sales now come from word-of-mouth or digital promotion. Most Australian adults use Facebook or LinkedIn, and many follow brands and research companies via social media. For accountants and financial professionals, social networks like Twitter and LinkedIn can help with networking, recruiting, and developing business if used properly while maintaining confidentiality and professionalism. Managing security risks requires strong policies, education, and monitoring of employee social media use.
The document discusses whether media queries can help make websites responsive to different devices. It argues that media queries alone are not enough and that a mobile-first approach is needed. Key points covered include using responsive images, designing for mobile sizes first before larger screens, and combining media queries with device detection. The presentation provides examples of how to implement responsive design techniques.
Beyond the Encyclopedia: The Frontiers of Free Knowledge (ErikMoeller)
The document discusses opportunities for expanding free knowledge and culture on Wikimedia projects and beyond. It analyzes the characteristics of successful free culture efforts like appropriate technology, small work units, and volunteer gratification. Wikimedia projects are assessed in terms of these factors, finding opportunities but also difficulties to overcome through improvements to technology, processes, funding, and inclusion. New projects are proposed to address gaps in structured data, real-time tools, physical spaces, and content types like designs and practices.
The document discusses the Guardian Open Platform which provides access to the Guardian's content through an API. The API offers rights-cleared content in three tiers of access and has been successful in increasing developer happiness, productivity, and the internal usage of the API. However, the document notes that more can still be done to improve external data references and reduce the gap between self-service and high-touch support. It promotes further opening of the Guardian's content creation process, rights management, and commercial models.
This document is a presentation about New York City's use of APIs and open data. It discusses how NYC has opened up access to city data and services through APIs to foster civic innovation. It notes that NYC now has over 750 public datasets available via API. The presentation encourages other cities to follow NYC's example in unlocking their data and services through APIs to engage citizens and developers.
This document discusses the importance of networks for journalists in the digital age. It provides examples of print, magazine, broadcast, and online journalists who have leveraged networks successfully. Networks are key for finding contacts, being findable by contacts, distributing work, and building trust and relationships. The document encourages establishing a presence on Twitter, sharing links on Delicious, and using RSS to link accounts. It frames journalism as making the hidden findable, giving voice to the voiceless, connecting communities, and verifying information. Building networks is emphasized as the first step for this online journalism module.
This webinar provided an overview of social media strategies and success stories for the automotive industry. It discussed key industry statistics showing growth in social media and provided examples of how automotive companies are using social platforms. The webinar outlined several success stories of automotive brands that saw increases in website traffic and sales by engaging customers on social media. It concluded with tips for automotive marketers, emphasizing the importance of listening to customers and committing ongoing resources to see results from social media strategies.
Text Classification Powered by Apache Mahout and Lucene (lucenerevolution)
Presented by Isabel Drost-Fromm, Software Developer, Apache Software Foundation/Nokia Gate 5 GmbH at Lucene/Solr Revolution 2013 Dublin
Text classification automates the task of filing documents into pre-defined categories based on a set of example documents. The first step in automating classification is to transform the documents into feature vectors. Though this step is highly domain-specific, Apache Mahout provides a lot of easy-to-use tooling to help you get started, most of which relies heavily on Apache Lucene for analysis, tokenisation and filtering. This session shows how to use faceting to quickly get an understanding of the fields in your documents. It walks you through the steps necessary to convert your text documents into feature vectors that Mahout classifiers can use, including a few anecdotes on drafting domain-specific features.
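The vectorization step can be illustrated with a toy hashed bag-of-words in plain Python. Real Mahout pipelines use Lucene analyzers and far larger vector spaces, so treat this purely as a sketch of the idea:

```python
import re
from hashlib import md5

def hashed_features(text, dims=16):
    """Turn a document into a fixed-width feature vector via the
    hashing trick: tokenize, then bucket each token into one of
    `dims` slots by hashing its bytes. A toy stand-in for what
    Mahout's vectorization tooling does at scale."""
    vec = [0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        slot = int(md5(token.encode()).hexdigest(), 16) % dims
        vec[slot] += 1
    return vec

v = hashed_features("Lucene analyzes text; Mahout classifies the vectors")
print(len(v), sum(v))  # vector width, total token count
```

The appeal of hashing over a vocabulary dictionary is that the vector width is fixed up front, which is exactly what downstream classifiers need.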
Presented by Markus Klose, Search + Big Data Consultant SHI Elektronische Medien GmbH at Lucene/Solr Revolution 2013 Dublin
Kibana4Solr is search-driven, scalable, browser-based and extremely user-friendly (also for non-technical users). Logs are everywhere: any device, system or human can produce a huge amount of information saved in logs. The sheer volume of available logs and their semi-structured nature make meaningful real-time processing quite a difficult task, so valuable business insights stored in logs may never be found. Kibana4Solr is a search-driven approach to that challenge. It offers a user-friendly, browser-based dashboard that can easily be customized to particular needs. The session introduces Kibana4Solr, sheds some light on its architecture, offers ideas for possible business use cases, and closes with a live demo.
More Related Content
Similar to Keynote: from publisher to platform, How The Guardian Embraced the Internet using Content, Search, and Open Source - By Stephen Dunn
The document describes Twitter's search architecture. It discusses how Twitter uses modified versions of Lucene called Earlybird to build real-time and archive search indexes. The real-time indexes are partitioned and replicated across clusters. New tweets are continuously added and searchable with low latency. Archive indexes contain older tweets on HDFS and are optimized for throughput over low latency. The system uses an analyzer to preprocess tweets before indexing and a service called the Blender to merge search results.
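The Blender's result merging can be sketched as combining per-index hit lists, deduplicating by id, and re-ranking by score. This in-process toy, with invented (id, score) tuples, only gestures at what the real service does across network boundaries:

```python
def blend(realtime, archive, limit=10):
    """Merge result lists from a real-time index and an archive index,
    loosely mimicking Twitter's Blender: combine per-index hits,
    dedupe by tweet id keeping the best score, rank by score.
    Each hit is a (tweet_id, score) pair; all names are illustrative.
    """
    best = {}
    for tweet_id, score in realtime + archive:
        if tweet_id not in best or score > best[tweet_id]:
            best[tweet_id] = score  # keep the higher-scoring duplicate
    ranked = sorted(best.items(), key=lambda hit: hit[1], reverse=True)
    return ranked[:limit]

print(blend([(1, 0.9), (2, 0.5)], [(2, 0.7), (3, 0.4)], limit=2))
```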
Building Client-side Search Applications with Solr (lucenerevolution)
Presented by Daniel Beach, Search Application Developer, OpenSource Connections
Solr is a powerful search engine, but creating a custom user interface can be daunting. In this fast-paced session I will present an overview of how to implement a client-side search application using Solr. Using open-source frameworks like SpyGlass (to be released in September) can be a powerful way to jumpstart your development by giving you out-of-the-box results views with support for faceting, autocomplete, and detail views. During this talk I will also demonstrate how we have built and deployed lightweight applications that remain performant under large user loads, with minimal server resources.
Integrate Solr with real-time stream processing applications (lucenerevolution)
The document discusses integrating Apache Storm with Apache Solr for real-time stream processing applications. It provides an example of building a Storm topology that listens to click events from a URL shortener, counts the frequency of pages in a time window, ranks the top sites, and persists the results to Solr for visualization. The key points covered are using Spring to simplify building Storm topologies, integrating with Solr for indexing and search, and unit testing streaming data providers.
Configure your Solr cluster to handle hundreds of millions of documents without even noticing, handle queries in milliseconds, and use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you'll learn how to make your indexing process blazing fast and your queries efficient even with large amounts of data in your collections. You'll also see how to optimize your queries to leverage caches as much as your deployment allows and how to observe your cluster with the Solr administration panel, JMX, and third-party tools. Finally, learn how to make changes to already deployed collections: split their shards and alter their schema by using the Solr API.
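Shard splitting goes through Solr's Collections API (`action=SPLITSHARD`). A minimal sketch of building such a request URL, without actually sending it — the host, collection and shard names here are made-up placeholders:

```python
from urllib.parse import urlencode

def collections_api_url(base, action, **params):
    # Build a Solr Collections API call, e.g. SPLITSHARD on a live collection.
    query = urlencode({"action": action, **params})
    return f"{base}/admin/collections?{query}"

url = collections_api_url(
    "http://localhost:8983/solr",   # assumed local Solr; adjust for your cluster
    "SPLITSHARD",
    collection="articles",          # hypothetical collection name
    shard="shard1",
)
```

In practice you would issue this URL with an HTTP client and poll the async request status, since splitting a large shard can take a while.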
Presented by Rafal Kuć, Consultant and Software Engineer, Sematext Group, Inc.
Even though Solr can run without causing any trouble for long periods of time, it is very important to monitor and understand what is happening in your cluster. In this session you will learn how to use various tools to monitor how Solr is behaving at a high level, but also at the Lucene, JVM, and operating system levels. You'll see how to react to what you see and how to make changes to configuration, index structure and shard layout using the Solr API. We will also discuss the performance metrics that deserve extra attention. Finally, you'll learn what to do when things go awry: we will share a few examples of troubleshooting, then dissect what was wrong and what had to be done to make things work again.
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled (lucenerevolution)
In a recent project with the United States Patent and Trademark Office, OpenSource Connections was asked to prototype the next generation of patent search using Solr and Lucene. An important aspect of this project was the implementation of BRS, a specialized search syntax used by patent examiners during the examination process. In this fast-paced session we will relate our experiences and describe how we used a combination of Parboiled (a Parsing Expression Grammar [PEG] parser), Lucene Queries and SpanQueries, and an extension of Solr's QParserPlugin to build BRS search functionality in Solr. First we will characterize the patent search problem and define the BRS syntax itself. We will then introduce the Parboiled parser and discuss various considerations one must make when designing a syntax parser. Following this we will describe the methodology used to implement the search functionality in Lucene/Solr. Finally, we will include an overview of our syntactic and semantic testing strategies. The audience will leave this session with an understanding of how Solr, Lucene, and Parboiled may be used to implement their own custom search parser.
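To give a flavour of what designing such a syntax parser involves: Parboiled is a Java PEG library, but the core idea — one parsing function per grammar rule — can be shown with a hand-rolled recursive-descent parser for a toy AND/OR subset (this is an illustrative stand-in, not the BRS grammar itself):

```python
import re

def parse(query):
    # Toy grammar:  expr := term (('AND' | 'OR') term)*
    #               term := WORD | '(' expr ')'
    # Returns a nested tuple AST.
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = expr()
            assert peek() == ")", "expected closing parenthesis"
            pos += 1
            return node
        word = tokens[pos]
        pos += 1
        return ("TERM", word)

    def expr():
        nonlocal pos
        node = term()
        while peek() in ("AND", "OR"):
            op = tokens[pos]
            pos += 1
            node = (op, node, term())   # left-associative
        return node

    tree = expr()
    assert pos == len(tokens), "trailing input"
    return tree

ast = parse("valve AND (pump OR rotor)")
```

In a real implementation each AST node would then be translated into a Lucene Query or SpanQuery, which is where the QParserPlugin extension mentioned above comes in.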
Enhancing relevancy through personalization & semantic search (lucenerevolution)
I. The document discusses how CareerBuilder uses Solr for search at scale, handling over 1 billion documents and 1 million searches per hour across 300 servers.
II. It then covers traditional relevancy scoring in Solr, which is based on TF-IDF, as well as ways to boost documents, fields, and terms.
III. Advanced relevancy techniques are described, including using custom functions to incorporate domain-specific knowledge into scoring, and context-aware weighting of relevancy parameters. Personalization and recommendation approaches are also summarized, including attribute-based and collaborative filtering methods.
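The TF-IDF-with-boosts scoring mentioned in II can be shown as a toy worked example (simplified; Lucene's actual similarity adds length normalisation and other factors):

```python
import math

def tfidf_score(query_terms, doc_terms, doc_freq, num_docs, boosts=None):
    # Score = sum over query terms of tf * idf * per-term boost.
    boosts = boosts or {}
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0 or term not in doc_freq:
            continue
        idf = math.log(num_docs / doc_freq[term])
        score += tf * idf * boosts.get(term, 1.0)
    return score

# Hypothetical job-search numbers purely for illustration.
doc = ["java", "developer", "java", "spring"]
dfs = {"java": 100, "developer": 400, "spring": 50}
base = tfidf_score(["java", "developer"], doc, dfs, num_docs=1000)
boosted = tfidf_score(["java", "developer"], doc, dfs, num_docs=1000,
                      boosts={"java": 2.0})   # boost the term we care about
```

Rarer terms (low document frequency) get a higher IDF and so dominate the score; a boost then scales individual terms on top of that, which is the basic lever behind document, field and term boosting.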
Real-time Inverted Search in the Cloud Using Lucene and Storm (lucenerevolution)
Building real-time notification systems is often limited to basic filtering and pattern matching against incoming records. Allowing users to query incoming documents using Solr's full range of capabilities is much more powerful. In our environment we needed a way to allow for tens of thousands of such query subscriptions, meaning we needed to find a way to distribute the query processing in the cloud. By creating in-memory Lucene indices from our Solr configuration, we were able to parallelize our queries across our cluster. To achieve this distribution, we wrapped the processing in a Storm topology to provide a flexible way to scale and manage our infrastructure. This presentation will describe our experiences creating this distributed, real-time inverted search notification framework.
Solr's Admin UI - Where does the data come from? (lucenerevolution)
Like many web applications of the past, the Solr Admin UI up until 4.0 was entirely server-based. It used separate code on the server to generate its dashboards, overviews and statistics. All that code had to be maintained, and still you weren't really able to use that kind of data for the things you needed it for: it was wrapped in HTML, most of the time difficult to extract, and it changed structure from time to time without announcement. After a short look back, we're going to look at the current state of the Solr Admin UI - a client-side application running completely in your browser. We'll see how it works, where it gets its data from, and how you can get that very same data and wire it into your own custom applications, dashboards and/or monitoring systems.
Schemaless Solr allows documents to be indexed without pre-configuring fields in the schema. As documents are indexed, previously unknown fields are automatically added to the schema with inferred field types. This is implemented using Solr's managed schema, field value class guessing to infer types, and automatic schema field addition. The schema and newly added fields can be accessed via the Schema REST API, and the schema can be modified at runtime when configured as mutable. However, schemaless mode has limitations such as single field analyses and no way to change field types after initial inference.
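The "field value class guessing" step can be pictured as trying progressively broader parses of each incoming value (a simplified illustration, not Solr's actual type-guessing chain; the type names approximate Solr's defaults):

```python
from datetime import datetime

def guess_field_type(value):
    # Try the most specific interpretation first; fall back to text.
    if value.lower() in ("true", "false"):
        return "boolean"
    candidates = (
        ("plong", int),
        ("pdouble", float),
        ("pdate", lambda v: datetime.strptime(v, "%Y-%m-%dT%H:%M:%SZ")),
    )
    for solr_type, parse in candidates:
        try:
            parse(value)
            return solr_type
        except ValueError:
            continue
    return "text_general"
```

This also makes the limitation noted above concrete: once the first value of a field has fixed its inferred type, later values that parse differently cannot change it.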
High Performance JSON Search and Relational Faceted Browsing with Lucene (lucenerevolution)
This document discusses high performance JSON search and relational faceted browsing using Lucene. It introduces SIREn, a Lucene plugin for indexing and searching JSON documents with a nested data model. SIREn uses tree labeling techniques to represent the JSON document structure and enable both full-text and structural queries. It also allows for relational faceted browsing across multiple record collections through pivot navigation and query rewriting. While BlockJoin supports some nested data in Lucene, SIREn has better scalability through its compression techniques and more flexibility through its schema-agnostic approach.
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM (lucenerevolution)
In this session we will show how to build a text classifier using Apache Lucene/Solr together with the libSVM library. We classify our corpus of job offers into a number of predefined categories; each indexed document (a job offer) then belongs to zero, one or more categories. Known machine learning techniques for text classification include the naïve Bayes model, logistic regression, neural networks, support vector machines (SVM), etc. We use Lucene/Solr to construct the feature vectors. Then we use libSVM, known as the reference implementation of the SVM model, to classify the documents. We construct as many one-vs-all SVM classifiers as there are classes in our setting, then using the Hadoop MapReduce framework we reconcile the results of our classifiers. The end result is a scalable multi-class classifier. Finally we outline how the classifier is used to enrich basic Solr keyword search.
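The one-vs-all reconciliation step can be sketched as follows — a toy stand-in where each "classifier" is just a linear scoring function (the talk uses trained libSVM models) and reconciliation runs locally (the talk uses Hadoop MapReduce):

```python
def one_vs_all(doc_vector, classifiers, threshold=0.0):
    # Run every binary class-vs-rest scorer; keep classes above threshold.
    # A document may end up in zero, one, or several categories.
    scores = {label: clf(doc_vector) for label, clf in classifiers.items()}
    return sorted(label for label, s in scores.items() if s > threshold)

def linear_scorer(weights, bias):
    # Toy linear decision function, standing in for a trained SVM.
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical job-offer categories for illustration.
classifiers = {
    "engineering": linear_scorer([1.0, 0.0], bias=-0.5),
    "sales": linear_scorer([0.0, 1.0], bias=-0.5),
}
labels = one_vs_all([1.0, 0.8], classifiers)
```

Because each binary decision is independent, the scheme naturally supports the "zero, one or more categories" behaviour described above.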
Faceted search is a powerful technique to let users easily navigate the search results. It can also be used to develop rich user interfaces, which give an analyst quick insights about the documents space. In this session I will introduce the Facets module, how to use it, under-the-hood details as well as optimizations and best practices. I will also describe advanced faceted search capabilities with Lucene Facets.
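At its core, faceting is counting the distinct values of a field across the documents that matched a query; a minimal sketch of that counting step:

```python
from collections import Counter

def facet_counts(matching_docs, field):
    # Count each value of `field` across the matched documents,
    # most frequent first - the numbers shown next to each facet entry.
    counts = Counter()
    for doc in matching_docs:
        values = doc.get(field, [])
        counts.update(values if isinstance(values, list) else [values])
    return counts.most_common()

docs = [
    {"title": "A", "category": "search"},
    {"title": "B", "category": "search"},
    {"title": "C", "category": ["search", "analytics"]},
]
facets = facet_counts(docs, "category")
```

The Facets module does this far more efficiently (using per-segment data structures rather than re-scanning stored fields), but the user-visible result is the same kind of value-to-count list.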
Presented by Shai Erera, Researcher, IBM
Lucene's arsenal has recently expanded to include two new modules: Index Sorting and Replication. Index sorting lets you keep an index consistently sorted based on some criteria (e.g. modification date). This allows for efficient search early-termination as well as achieve better index compression. Index replication lets you replicate a search index to achieve high-availability, fault tolerance as well as take hot index backups. In this talk we will introduce these modules, discuss implementation and design details as well as best practices.
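Why a sorted index enables early termination can be pictured like this (a simplified sketch: because documents are stored pre-sorted by the ranking key, collection can stop after k hits instead of scanning everything):

```python
def top_k_early_terminate(sorted_docs, matches, k):
    # sorted_docs is already ordered by the index sort key (e.g. newest first),
    # so the first k matching docs are exactly the top-k result.
    hits = []
    scanned = 0
    for doc in sorted_docs:
        scanned += 1
        if matches(doc):
            hits.append(doc)
            if len(hits) == k:
                break   # early termination: later docs cannot rank higher
    return hits, scanned

docs = [{"id": i, "date": 100 - i} for i in range(100)]  # sorted by date desc
hits, scanned = top_k_early_terminate(docs, lambda d: d["id"] % 2 == 0, k=3)
```

Only a handful of documents are scanned instead of all 100 — the saving grows with index size, which is the point of keeping the index consistently sorted.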
As part of their work with large media monitoring companies, Flax has developed a technique for applying tens of thousands of stored Lucene queries to a document in under a second. We'll talk about how we built intelligent filters to reduce the number of actual queries applied and how we extended Lucene to extract the exact hit positions of matches, the challenges of implementation, and how it can be used, including applications that monitor hundreds of thousands of news stories every day.
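The "intelligent filters" idea can be sketched as indexing each stored query by the terms it requires, so that only queries sharing a term with the incoming document are actually evaluated (a toy illustration of the approach, not Flax's implementation):

```python
from collections import defaultdict

def build_query_index(stored_queries):
    # Map each required term to the queries that mention it.
    index = defaultdict(set)
    for qid, terms in stored_queries.items():
        for term in terms:
            index[term].add(qid)
    return index

def match_document(doc_terms, stored_queries, index):
    # Prefilter: collect only queries sharing a term with the document...
    candidates = set()
    for term in doc_terms:
        candidates |= index.get(term, set())
    # ...then run the full (here: all-terms-must-match) check on candidates only.
    return sorted(q for q in candidates if stored_queries[q] <= doc_terms)

queries = {"q1": {"oil", "spill"}, "q2": {"election"}, "q3": {"oil", "price"}}
index = build_query_index(queries)
hits = match_document({"oil", "spill", "cleanup"}, queries, index)
```

With tens of thousands of stored queries, the prefilter means each incoming news story triggers evaluation of only a small candidate set rather than every subscription.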
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke... (lucenerevolution)
Presented by Xavier Sanchez Loro, Ph.D, Trovit Search SL
This session explains the implementation and use case for spellchecking in the Trovit search engine. Trovit is a classified-ads search engine supporting several different sites, one for each country and vertical. Our search engine supports multiple indexes in multiple languages, each with several million indexed ads. Those indexes are segmented into several different sites depending on the type of ads (homes, cars, rentals, products, jobs and deals). We have developed a multi-language spellchecking system using Solr and Lucene in order to help our users better find the desired ads and avoid the dreaded 0 results as much as possible. As such, our goal is not pure orthographic correction, but also suggestion of correct searches for a given site.
The document discusses how Intelligent Software Solutions (ISS) uses Apache Solr and natural language processing (NLP) techniques to help their customers analyze large amounts of unstructured data. ISS develops innovative solutions for government customers dealing with thousands of data sources. Their approach involves acquiring content, indexing it in Solr for search and discovery, semantically enriching it using NLP techniques like named entity recognition and clustering, and presenting focused "data perspectives" for analysis. They leverage multiple NLP approaches like GATE/Gazetteers and OpenNLP/machine learning to complement each other's strengths in finding both known and unknown relevant information.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
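As a taste of the fundamentals in topic 1, a minimal statistical detector flags readings that deviate strongly from the rest of the window (a toy z-score sketch; edge deployments like the one described would typically use trained models instead):

```python
import statistics

def zscore_anomalies(readings, threshold=3.0):
    # Flag readings more than `threshold` standard deviations from the mean.
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [(i, x) for i, x in enumerate(readings)
            if stdev and abs(x - mean) / stdev > threshold]

# Hypothetical temperature readings from an edge sensor, with one spike.
sensor = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 55.0, 20.1]
anomalies = zscore_anomalies(sensor, threshold=2.0)
```

In the pipeline above, a detector like this would consume readings from the Kafka stream and emit flagged events as metrics for Prometheus to alert on.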
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf (Techgropse Pvt. Ltd.)
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for various public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (hence her nickname deneb_alpha).
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also approaches that can lead to unnecessary spending, e.g. when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimised Domino configuration and keep them low in the future.
Topics covered
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices to apply immediately
Ocean Lotus threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI test automation with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
OpenID AuthZEN Interop Read Out - Authorization (David Brossard)
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
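The seed-trimming idea can be sketched as: remove a byte, and if the coverage fingerprint is unchanged, the byte was uninteresting. This is a hypothetical stand-in for DIAR's analysis — `toy_coverage` here is a made-up function, not AFL instrumentation:

```python
def trim_seed(seed: bytes, coverage):
    # Greedily drop bytes whose removal leaves the coverage fingerprint intact.
    baseline = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if coverage(candidate) == baseline:
            seed = candidate          # byte i was uninteresting: drop it
        else:
            i += 1                    # byte i matters: keep it
    return seed

def toy_coverage(data: bytes):
    # Toy "coverage": which interesting markers the parser under test would see.
    return frozenset(m for m in (b"<", b">", b"elf") if m in data)

trimmed = trim_seed(b"xx<abc>yy elf zz", toy_coverage)
```

The trimmed seed keeps exactly the bytes that drive distinct program behaviour, so subsequent mutations are spent where they can matter.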
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Keynote: from publisher to platform, How The Guardian Embraced the Internet using Content, Search, and Open Source - By Stephen Dunn
1. From publisher to platform:
How the Guardian embraced the internet
using content, search, and Open Source
Stephen Dunn, Guardian News and Media
stephen.dunn@guardian.co.uk, 25th May, 2011
Twitter: @cuica, @openplatform
Thursday, 26 May 2011
4. We started a long time ago:
5. Keyword page, Live blogs, Apps, Mobile site, Twitter updates, Swine flu, Comment, Content partnerships, Newspapers, Audio, Video, Open Platform API
6. To secure the financial and editorial independence of the Guardian in perpetuity.
To promote freedom in the press and liberal journalism globally.
To become the world's leading liberal voice.
9. 1. Permanent
(photo: http://www.flickr.com/photos/fstorr/)
• “A cool URI is one that does not change” (Tim Berners-Lee, 1998)
• 1.5 million resources redirected to the new scheme
10. 2. Addressable
★ Resources are “about” something - ready for the social web.
★ We live in “the age of point-at-things” (Coates 2005)
11. 3. Discoverable
★ Multiple routes to content
★ Tagging drives discovery
15. Site traffic growth (chart): unique users from Sep 2005 to Dec 2008, with pre-project, first release and final release milestones marked; growth annotated at 40M unique users.
21. ...“How I stopped worrying about my website and learned to love the whole internet.” (Matt McAlister)
22. The Open Strategy
OPEN IN: bring in data and apps from the Internet.
OPEN OUT: enable partners to build applications using Guardian content and services for other platforms.
24. “Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
34. Jack Shenker: “The Guardian alongside Al Jazeera was the one news source that everybody on the streets in Tahrir - not just in Cairo but in surrounding cities and major centers of revolutionary activity - that people were talking about.”
37. The suite of services enabling partners to build applications with the Guardian
38. OPEN IN: Bring in data and apps from the Internet.
OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
39. CONTENT API: a service for selecting and collecting content from the Guardian for re-use.
DATA STORE: a directory of useful data curated by Guardian editors.
POLITICS API: an open database of candidates, voting records, constituencies, election results and live data on election day.
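In practice, "selecting and collecting content for re-use" means building a search request against the Content API's public endpoint, as sketched below. The `/search` endpoint and `api-key` parameter follow the API as launched; the key value and the example tag are placeholders.

```python
from urllib.parse import urlencode

# Build a Content API search request: a full-text query, optionally
# narrowed by a tag. The api-key value here is a placeholder.
BASE = "https://content.guardianapis.com/search"

def search_url(query: str, tag=None, api_key: str = "YOUR-KEY") -> str:
    params = {"q": query, "api-key": api_key}
    if tag:
        params["tag"] = tag  # tags drive discovery, as on the site itself
    return BASE + "?" + urlencode(params)

print(search_url("election", tag="politics/politics"))
```

Because the same tag vocabulary powers both the website's navigation and the API's filters, partners can slice the archive along the routes the Guardian's own editors use.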
57. 3 tiers of access, 3 revenue models
Keyless: Take our headlines. You keep associated revenues.
Approved: Take our full article content, but with an advert. Guardian keeps ad revenue; you keep rest-of-page revenue.
Bespoke: Take, reformat and augment our content. Revenue model to be negotiated: a combination of media, fees and downloads.
59. What this means
Open Out: developers can now access full content APIs on demand with keys post-approved
Platform is positioned as a place to do business
So rapid scalability, reliability and performance are now core requirements
60. OPEN IN: Bring in data and apps from the internet.
OPEN OUT: Allow partners to build applications using Guardian content and services for other platforms.
61. MICROAPPS: a framework for integrating 3rd party applications into guardian.co.uk
Simple REST/HTTP framework allows lightweight development
Applications proxied for performance
Apps generally hosted in the cloud, allowing hot deployment into production
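A toy version of the proxying idea, assuming details the slides leave out: the page requests an HTML fragment from a microapp over HTTP with a hard timeout, and falls back to empty output if the app is slow or down, so a third-party failure cannot stall the page. The function name, timeout value and injectable opener are invented for this sketch.

```python
import urllib.request

# Fetch a microapp's HTML fragment through the proxy, with a hard timeout
# and an empty-string fallback so a slow 3rd-party app can't stall the page.
# The opener parameter exists so the fetch can be stubbed out in tests.
def fetch_fragment(url: str, timeout: float = 0.5,
                   opener=urllib.request.urlopen) -> str:
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        return ""  # degrade gracefully: render the page without the microapp
```

Caching the proxied response (e.g. in Memcached) on top of this would also let the site absorb traffic spikes without hammering the partner's app.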
62. MICROAPPS: a framework for integrating 3rd party applications into guardian.co.uk
67. From publisher to platform
Seeking massive growth, but no longer only broadcasting content on the website
User/partner engagement & contribution on: journalism, data, software, applications, revenue and ads
Support developers and partners with data and APIs; need scalability, reliability, speed
68. Evolving the architecture
69. [Architecture diagram] Three web servers in front of three app servers, backed by Memcached (added later) and an Oracle database fed by the CMS.
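The Memcached tier retrofitted here is a cache-aside in front of Oracle: roughly the pattern below, sketched with a plain dict standing in for a real memcached client. Key names and the lookup shape are illustrative.

```python
# Cache-aside: try the cache first, fall back to the database on a miss,
# then populate the cache so subsequent requests skip the round trip.
# A dict stands in for a real memcached client; keys are illustrative.
cache = {}

def load_article(article_id, db_lookup):
    key = f"article:{article_id}"
    if key in cache:
        return cache[key]            # cache hit: no database round trip
    value = db_lookup(article_id)    # cache miss: query the RDBMS
    cache[key] = value               # populate for subsequent requests
    return value

calls = []
def fake_db(aid):
    calls.append(aid)
    return {"id": aid, "headline": "Example"}

load_article(7, fake_db)
load_article(7, fake_db)
print(len(calls))  # the database was queried only once
```

A real deployment would also set a TTL on each entry and invalidate keys when the CMS republishes an article.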
70. Why RDBMS?
5 years ago, fewer alternatives
Well-understood operations procedures
Can easily recruit DBAs / devs
Developer/ops tools
Business critical system: a safe choice
78. We chose Solr/Lucene
Can perform complex queries, including full-text search
We can change the schema with no downtime
Most queries are of similar cost
Scales very well horizontally
“Just worked” in the cloud
No strange control processes/engines
Developers just loved working with it!
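A sketch of the kind of query this enables, built against Solr's standard /select interface over HTTP. The `q`, `fq`, `wt` and `rows` parameters are standard Solr request parameters; the host, core name and field names are invented for illustration.

```python
from urllib.parse import urlencode

# Build a Solr /select request: full-text search on a body field, plus a
# filter query that Solr caches independently of the main query.
# Host, core ("content") and field names are hypothetical.
SOLR = "http://localhost:8983/solr/content/select"

def solr_query(text: str, section: str) -> str:
    params = {
        "q": f'body:"{text}"',       # full-text phrase match on the body field
        "fq": f"section:{section}",  # filter query, cached and reused
        "wt": "json",                # JSON response writer
        "rows": 10,                  # page size
    }
    return SOLR + "?" + urlencode(params)

print(solr_query("open platform", "technology"))
```

Putting the cheap, repeated constraint in `fq` rather than `q` is what keeps most queries at a similar cost: the filter's result set is cached once and intersected with each full-text query.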
80. [Architecture diagram] The API and web servers sit in front of the app server, Memcached and the RDBMS (fed by the CMS), with a horizontally scaled pool of Solr instances running in the cloud on EC2.
81. What about Open In?
OPEN IN: Bring in data and apps from the Internet.
OPEN OUT: Enable partners to build applications using Guardian content and services for other platforms.
82. [Architecture diagram] For Open In, third-party apps run on external hosting (App Engine etc.) and are pulled into the site through a proxy in front of the web servers; the app server, Memcached, RDBMS and CMS core is unchanged.
83. [Architecture diagram] The combined picture: the core web servers, app server, Memcached, RDBMS and CMS in the middle; Open Out served by the Solr pool in the cloud on EC2; Open In apps on external hosting (App Engine etc.) behind the proxy.