The document discusses SearchMonkey, an open platform from Yahoo! that allows developers to build structured data into search results. It presents several approaches for providing structured data to SearchMonkey, including embedding RDF or microformats directly into web pages, generating a DataRSS feed from a database, extracting data via XSLT, or calling a remote web service. The document encourages developers to prototype with XSLT initially and provides resources for learning more about SearchMonkey and structured data standards.
J2EE is already the perfect solution for complex business/enterprise systems, and JSF 2.x is the perfect chance to reach out to the consumer and small-business market. JSF is easier to use than it's ever been, but small businesses have different needs than larger companies and corporations. PrettyFaces is for all projects, small and large; this presentation explains why "pretty, bookmark-able URLs" are important for client-facing applications, addressing SEO and creating clean, consistent, intuitive client interactions on the web.
How I learned to stop worrying and love the .htaccess file (Roxana Stingu)
An introduction to .htaccess and what this file can do to help with SEO.
Redirects:
- Mod_alias and mod_rewrite
- Most common redirect types (domain migrations, subdomain-to-folder moves, folder renaming) and how to deal with duplicate content.
Indexing & Crawling:
- Set HTTP headers for canonicals and meta robots for non-HTML files.
Website speed:
- Gzip and Deflate
- Cache control
Challenges of building a search-engine-like web rendering service (Giacomo Zecchini)
SMX Advanced Europe, June 2021 - With the advent of new technologies and the massive use of JavaScript on the web, search engines have started using Web Rendering Services to better understand the content of pages. What are the difficulties in building a WRS? Are the tools we use every day replicating what search engines do? In this session, Giacomo takes you on a discovery journey through some of the technical implementation details of building a search-engine-like web rendering service, covering edge cases such as infinite scrolling, iframes, web components, and shadow DOM, and how to approach them.
An introduction to YUI and some examples of how to use it to solve daily problems in web design. A talk given at the University in Bucharest and partly re-hashed on the flight from my Ajax Experience talk.
My presentation at BarCamp Ghent 2 (Nov 29, 2008), providing a quick overview of HTML 5. Includes two detailed cases, one about local storage APIs and one about the new video element. Check http://lensco.be for more.
PrettyFaces: SEO, Dynamic Parameters, Bookmarks, Navigation for JSF / JSF2 (... (Lincoln III)
PrettyFaces: SEO, Dynamic Parameters, Bookmarks, and Navigation for JSF / JSF2 - As presented at JSFSummit2009 in Orlando Florida.
Why should we use PrettyFaces?
It's possible to make a structured, consistent API that can handle changes to logic and to the schema. Sure, it seems like a good plan to dump everything out of the database today, but what are you going to do when something changes down the road? Let's talk about some SOLID ways to structure our APIs and keep them from breaking later.
This is Steve Souders's talk at Amazon, which I couldn't read in its original pptx format (http://stevesouders.com/docs/amazon-20091030.pptx) since Keynote sucks at importing. It seems to render well here.
GDD Japan 2009 - Designing OpenSocial Apps For Speed and Scale (Patrick Chanezon)
Google Developer Days Japan 2009 - Designing OpenSocial Apps For Speed and Scale
Original slides from Arne Roomann-Kurrik & Chris Chabot, with a few Zen quotes and references added by me :-)
Internal training presentation about how I go about advocating Yahoo to the outside world and what gets me pretty excited about our developer offerings at the moment.
An examination of the current data portability design patterns used in social media sites, looking at a possible new Open Stack concept to create true plug-and-play interfaces for users to exchange data.
Presentation at Dutch Joomla!Days 2009. An index of possibilities for exchanging data between Joomla! and Flash, and a plea to use more general interfaces and standards, like XML.
Building a Single Page Application using Ember.js ... for fun and profit (Ben Limmer)
Denver Startup Week 2015 Talk. The talk is split into two sections: conceptual reasons you might choose a framework like EmberJS where convention over configuration is preferred, and a live coding demo where we build a simple EmberJS application for our up-and-coming business, Bluth's Banana Stand.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk encourages a more independent use of PHP frameworks, moving towards a more flexible and future-proof style of PHP development.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in the different parts of the DevOps infinity loop.
"Impact of front-end architecture on development cost", Viktor Turskyi (Fwdays)
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features just by following the standard rules of a framework, thinking that this is enough to successfully launch the project, and then the project fails. How can we prevent this, and which approach should we choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
3. What is SearchMonkey?
An open platform for using structured data to build more useful and relevant search results.
[Before / After screenshots of a search result]
3 | http://developer.yahoo.com/searchmonkey
6. Part of the puzzle
7. Vocabularies
• Need to speak the same language
• "I like to see girls of that... caliber."
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core (http://purl.org/dc/elements/1.1/)
  – Friend of a Friend (http://xmlns.com/foaf/0.1/)
  – XHTML Friends Network (http://gmpg.org/xfn/11/)
  – … (many more)
8. Syntax
• Nouns, verbs, and adjectives, oh my!
• All phrases become lots of triples
• (Subject, Verb / Adj. / Prep. / etc., Object)
• Key/value pairs ++
  – Everything is a URL or string
  – Subject doesn't have to be the document
10. Decompose to triples
• "I like to eat red candy"
  – (self, http://example.com/likeEating, http://example.org/temp/redcandy)
  – (http://example.org/temp/redcandy, http://example.com/isColored, http://example.org/colors/red)
  – (http://example.org/temp/redcandy, http://example.com/isInstanceOf, http://example.org/food/candy)
• Unnamed nodes are O.K.
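The decomposition above can be written down directly in plain Python (illustrative only; the example.com/example.org predicate URLs come straight from the slide and are not a real vocabulary):

```python
# Represent each statement as a (subject, predicate, object) triple.
# "self" stands in for the page author; redcandy is an unnamed
# (blank) node that has been given a temporary URL so that the
# other triples can hang more detail off it.
SELF = "self"
RED_CANDY = "http://example.org/temp/redcandy"

triples = [
    (SELF, "http://example.com/likeEating", RED_CANDY),
    (RED_CANDY, "http://example.com/isColored",
     "http://example.org/colors/red"),
    (RED_CANDY, "http://example.com/isInstanceOf",
     "http://example.org/food/candy"),
]

# Everything is a URL or a string, and the subject need not be the
# document itself: the blank node appears as a subject too.
subjects = {s for s, _, _ in triples}
assert RED_CANDY in subjects
```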
11. How to get data to SearchMonkey?
Humans see: a name, a picture of a person, a current job, an industry, …
Computers see: an undifferentiated blob of HTML.
Can we make computers smarter?
13. How does it work?
1. Site owners/publishers share structured data with Yahoo!.
2. Site owners & third-party developers build SearchMonkey apps.
3. Consumers customize their search experience with Enhanced Results or Infobars.
[Diagram: Acme.com's web pages carry RDF/microformat markup that reaches the index via page extraction; Acme.com's database also feeds the index through a DataRSS feed or web services.]
14. Innards of SearchMonkey
• You build a web service inside our framework
• When a search page renders
  – We check which SM apps are enabled
  – We call them
    • 50 ms for in-page
    • A long time for AJAX
  – They return data in our template
  – We render them (and cache)
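The render flow above can be sketched as a toy model (not Yahoo!'s actual implementation; `render_results`, the app callables, and the fallback behaviour are all invented for illustration, and only the 50 ms budget comes from the slide):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def render_results(enabled_apps, query, budget_s=0.050):
    """Toy model of the flow above: call every enabled app, keep the
    ones that answer within the in-page budget (50 ms on the slide),
    and drop the rest so one slow app cannot stall the search page."""
    rendered = []
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(app, query): name
                   for name, app in enabled_apps.items()}
        for future, name in futures.items():
            try:
                rendered.append((name, future.result(timeout=budget_s)))
            except TimeoutError:
                pass  # a real system would fall back to AJAX loading
    return rendered
```

The per-future timeout is a simplification; a production system would enforce one shared deadline across all apps.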
15. Inside SM
[Diagram: developers and a publisher feeding data and apps into SearchMonkey]
16. Data Sources: RDF and Microformats

Name         | Cached | Open | Mode    | Notes
Yahoo! Index | yes    | yes  | Passive | Old-school Y! Index data
RDFa, eRDF   | yes    | yes  | Passive | Vocab + markup decoupled
Microformats | yes    | yes  | Passive | Vocab + markup coupled
DataRSS feed | yes    | no   | Active  | Atom + metadata
XSLT         | no     | no   | Active  | Good for prototyping
Web Service  | no     | no   | Active  | Brings in remote data
17. Approach #1: Embedded RDF

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      lang="en" xml:lang="en">
<head>
  <title>The Amazing Home Page of Joe Smith</title>
</head>
<body>
  <h1 property="dc:title">Joe's Home Page</h1>
  <div rel="foaf:maker">
    <h2 property="foaf:name">Joe Smith</h2>
    <div rel="foaf:depiction"
         resource="http://joesmith.org/images/jsmith.png">
      <img src="/images/jsmith.png"
           alt="Smiling headshot of Joe" />
      <p property="dc:rights">Creative Commons
         Attribution 3.0 Unported</p>
    </div>
  </div>
  …

• Cached data: allows Enhanced Results, but not for dynamic data
• Reuse existing markup: but requires site redesign
• Open approach: everyone can use it
• Passive, crawled by Y!: less bureaucracy to set up
18. Approach #2: Embedded Microformats

<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
    <div class="street-address">123 Murphy Avenue</div>
    <span class="locality">Sunnyvale</span>,
    <span class="region">California</span>
    <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>…

• Cached data: allows Enhanced Results, but not for dynamic data
• Reuse existing markup: but requires site redesign
• Open approach: everyone can use it
• Passive, crawled by Y!: less bureaucracy to set up
19. Approach #3: DataRSS Feed

<?profile http://search.yahoo.com/searchmonkey-profile ?>
<feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
      xmlns:dc="http://purl.org/dc/terms/" xmlns="http://www.w3.org/2005/Atom"
      xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
      xmlns:y="http://search.yahoo.com/datarss/">
  <id>http://local.yahoo.com/datarss/</id>
  <author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
  <title>Example data feed for Local</title>
  <updated>2008-07-16T04:05:06+07:00</updated>
  <entry>
    <title>Parcel 104</title>
    <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
    <updated>2008-07-16T04:05:06+07:00</updated>
    <content type="application/xml">
      <y:adjunct version="1.0" name="com.yahoo.local">
        <y:item rel="dc:subject">
          <y:type typeof="vcard:VCard commerce:Restaurant">
            <y:meta property="commerce:hoursOfOperation">
              Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.
…

• Cached data: allows Enhanced Results, but not for dynamic data
• Generate feed from DB: and maintain it afterwards
• Closed approach: only Yahoo! gets the data
• Actively provide a feed: coordinate with Yahoo! to set up
20. Approach #4: Extract with XSLT

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <adjunctcontainer>
      <adjunct id="smid:{$smid}" version="1.0">
        <item rel="rel:Photo"
              resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
        <item rel="rel:Card">
          <meta property="vcard:fn">
            <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
          </meta>
          <meta property="vcard:title">
            <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
          </meta>
        </item>
      </adjunct>
    </adjunctcontainer>
  </xsl:template>
</xsl:stylesheet>

• Generally not cached: too slow, Infobar only; but good for dynamic data
• Scrape page with XSLT: operates on a cleaned-up version of the DOM; watch out for template changes
• Easy to prototype
21. Prototyping with XSLT
• What if I don't have structured data?
  – I don't own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – Mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can't do a good Enhanced Result
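When prototyping, it helps to sanity-check candidate XPath expressions before pasting them into a stylesheet. A small Python sketch using the standard library's limited XPath subset (the hResume-style snippet and class names are hypothetical):

```python
import xml.etree.ElementTree as ET

# A cleaned-up fragment of the kind of hResume markup the slide targets.
page = ET.fromstring("""
<div class="hresume">
  <div class="image"><img src="/images/jsmith.png"/></div>
  <span class="fn">Joe Smith</span>
</div>
""")

# ElementTree supports only a subset of XPath (no contains()), but it
# is enough to test simple "find by class attribute" expressions fast.
name = page.find(".//span[@class='fn']").text
photo = page.find(".//div[@class='image']/img").get("src")
assert (name, photo) == ("Joe Smith", "/images/jsmith.png")
```

Once an expression works here, it can be carried over into the real stylesheet, keeping in mind that full XSLT processors support richer XPath than this sketch.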
22. Approach #5: Call a Web Service

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:h="http://www.w3.org/1999/xhtml"
                xmlns:y="urn:yahoo:srch"
                xsi:schemaLocation="urn:yahoo:srch
                  http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
  <xsl:template match="/">
    <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
      <adjunct id="smid:{$smid}" version="1.0">
        <meta property="my:link1">
          <xsl:value-of select="//y:Result[1]/y:Url"/>
        </meta>
        <meta property="my:result1">
          <xsl:value-of select="//y:Result[1]/y:Title"/>
        </meta>
      </adjunct>
    </adjunctcontainer>
  </xsl:template>
</xsl:stylesheet>

• Generally not cached: too slow, Infobar only; but good for dynamic data
• Call a remote web service: allows SearchMonkey apps to glue together; can handle OpenSearch XML natively
23. Creating an Infobar
• Infobar advantages
  – Annotate someone else's site
  – Use links and images from other domains
    • Mash up info from multiple sites
    • Affiliate / coupon links? Hmmm…
  – Can act on *, all websites
• But these apps can be annoying if poorly designed
• Key design principles
  – Put something useful in the summary
  – Be creative with the HTML
28. Your first mistake may be your last!
29. True ninjas leave no room for error

// Get the list of businesses. If we get at least one,
// extract the address and telephone number.
$appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
$yd = $appNodeList->item(0);
$adr = $tel = "";
$nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
if ($nodeList->length != 0) {
    $nd = $nodeList->item(0);
    $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
    $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
}
if ($r_rating != "") {
    $ratingstr = Data::getStarsFromNum($r_rating);
    if ($r_summary != "") {
        $ratingstr = $ratingstr . " " . $r_summary;
…
30. Useful conditional tricks
• Check for empty data like this:
  – if ('' == trim($var))
• Watch out for $a.'-'.$b.'-'.$c
  – What happens if these variables are empty?
• You can create helper functions!
  – getOutput() must return an array, but there's no reason not to create other functions
  – Call using self::function() instead of just function()
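The $a.'-'.$b.'-'.$c pitfall (stray separators when some values are empty) looks the same in any language; here is a minimal sketch of the safe pattern in Python (`join_nonempty` is my own illustrative helper, not part of SearchMonkey):

```python
def join_nonempty(parts, sep=" - "):
    """Join only the non-blank values, so that empty fields don't
    leave dangling separators like 'Joe -  - Sunnyvale'."""
    return sep.join(p.strip() for p in parts if p and p.strip())

assert join_nonempty(["Joe", "", "Sunnyvale"]) == "Joe - Sunnyvale"
assert join_nonempty(["", " ", ""]) == ""
```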
31. Development (test, debug, collaborate)
• Your two best friends: input and output
• Collaborative development
  – Create a shared Y! ID for your organization
  – Export and import apps from the dashboard
• Bellwethers
  – Start with just one or two, for simplicity
  – Once the app is working, hit "autofind" and look at all ten; see what breaks
  – Always set the #1 bellwether to something high-ranking; that's your Gallery preview
32. Image Helper Functions
• Data::getStars(string $data_get_path)
  – e.g. Data::getStars("smid:Jk8/review:rating")
• Data::getStarsFromNum(float $rating)
  – Must scale $rating to fall between 0-5 inclusive
• Data::getImage(string $name)
  – Adds icons to your app
    • Data::getImage("information")
    • Data::getImage("email")
    • Data::getImage("edit")
    • …
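The requirement that $rating be scaled into 0-5 can be illustrated with a small helper (a Python sketch of my own; only getStarsFromNum and its 0-5 range come from the slide):

```python
def scale_rating(value, lo, hi):
    """Map a rating from its native range [lo, hi] onto 0-5,
    clamping out-of-range input, as getStarsFromNum expects."""
    if hi <= lo:
        raise ValueError("empty rating range")
    scaled = (value - lo) / (hi - lo) * 5.0
    return max(0.0, min(5.0, scaled))

assert scale_rating(7.5, 0, 10) == 3.75
assert scale_rating(12, 0, 10) == 5.0  # clamped to the top of the range
```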
33. XML functions
• NodeList Data::xpath(string $query [, DOMNode $contextnode])
  – More complicated than Data::get()
  – Can count, iterate, find children
  – Can fetch all vcard:fn, regardless of where they are
  – Can find a node and grab its first four children
• string Data::xpathString(string $query [, DOMNode $contextnode])
  – Convenience function if you don't need to do further DOM manipulation
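The node-list vs. string split can be mimicked with Python's standard library (an illustrative analogue of the two calls, not the actual Data class):

```python
import xml.etree.ElementTree as ET

def xpath(query, context):
    """Analogue of Data::xpath: return matching nodes for iteration."""
    return context.findall(query)

def xpath_string(query, context):
    """Analogue of Data::xpathString: just the first match's text,
    or "" when nothing matches (no further DOM work needed)."""
    node = context.find(query)
    return (node.text or "") if node is not None else ""

doc = ET.fromstring("<card><fn>Joe Smith</fn><fn>Jane Doe</fn></card>")
assert len(xpath(".//fn", doc)) == 2
assert xpath_string(".//fn", doc) == "Joe Smith"
assert xpath_string(".//tel", doc) == ""
```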
34. Infobar Design: Party like it's 1999
• Sadly, can't use CSS
  – and the default stylesheet strips off most style
  – thus lists won't even display bullets or numbers; you have to fake this
• Layout: use tables (remember tables?)
• Fonts: can use <font color>, <font face>, <big>, <small>
• Make good use of images and links
• PRO TIP: Use PHP HEREDOC (<<<)
35. Let Infobars be Infobars
• Make use of the real estate
36. Let Infobars be Infobars
• Or be minimal
• But don't do an Infobar that's really just an Enhanced Result in disguise
  – Use the blob and summary
  – Don't use the thumbnail, key/value pairs, …
37. Triggering on *
• This can be annoying for general audiences
  – but it's hard to abort an infobar before 50 ms
  – and you can't do this in the PHP layer if you depend on an extractor or web service
  – data has to be provided by a feed or by structured markup
• For specialized audiences a "*" infobar might be OK
39. Triggering on *
• Trigger on structured markup
  – Ex.: the Creative Commons Infobar
• Use feeds to annotate the URLs you want
• Instead of *, use a comma-separated list of sites:
  – www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*, …
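Matching a result URL against such a comma-separated list can be sketched with shell-style wildcards (illustrative only; SearchMonkey's real matcher is not specified here, and the trigger list is the slide's example):

```python
from fnmatch import fnmatch

TRIGGERS = "www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*"

def triggers_on(url, trigger_list=TRIGGERS):
    """True if the URL matches any pattern in the comma-separated list."""
    patterns = [p.strip() for p in trigger_list.split(",")]
    return any(fnmatch(url, p) for p in patterns)

assert triggers_on("www.stanford.edu/about/history")
assert not triggers_on("www.example.com/foo")
```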
40. XSLT Extractors
• Use the Firebug extension for Firefox
  – and XPather, another Firefox extension
• Typical pattern: a skeleton of DataRSS, into which you plug some XPath
• For more complex XSL:
  – Use <xsl:template>
  – <xsl:for-each> is clumsier
  – Find a good ID to cling to
    • Compare arxiv.org (easy) to acm.org (harder)
41. Examples
• Rubik's Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon
A SearchMonkey Enhanced Result contains a great deal of structured data: it could have a picture, key/value pairs, deep links… This kind of information goes far beyond what normal search results give you (a title and an auto-extracted summary). Where does this information come from?
Likewise, an Infobar has a summary (what the user sees before the pane is expanded) and a “blob”, an area of free-form HTML.
Here’s a profile page for a colleague of mine on LinkedIn. When you and I glance at the page, we see all sorts of structured information. We see pictures, contact info, names… all sorts of items that have actual meaning. But spiders just see a blob of markup. The spider can extract some basic info, like a title (probably correct), a summary (could be good or not), and some other metadata. But for pulling structured information out of web pages, human beings beat computers hands down. So how do we harvest structured data? One approach would be to make computers SMARTER, by improving their ability to do pattern recognition and natural language processing. Drawbacks: these sorts of AI-type features have proven to be pretty expensive and difficult to develop. I’m not smart enough to do this, so I want you to do it for me. YOU know a lot more about YOUR site than we do. Even with a “dumb” approach, indexing all these billions of web pages already takes many thousands of CPU cores, crunching away. Again, very expensive. Finally, we all know what happens here. The computer begins scouring information from the entire world wide web, starts learning at a geometric rate, becomes self-aware…
Computers become intelligent, begin to learn at a geometric rate, form SkyNet, and scour the Earth with nuclear fire. Shareholder value decreases. So we decided to go with the other approach: keep our spider fairly dumb, and figure out different ways for people to provide us with structured data.
In this diagram, we see all the different ways that you can feed SearchMonkey with data. A real SearchMonkey app probably wouldn’t use ALL these methods. From your database / CMS, you generate web pages with HTML markup. Those web pages can contain microformats or RDF, special markup that provides semantic meaning about the data on your pages. Our crawler can extract this information, just as it does the title, the page content, the mime-type, and so on. Alternatively, from your database you can also provide us with a DataRSS feed (more on that later) that we consume and place into our index. SearchMonkey also has two ways to actively retrieve information. You can create a Page Extractor, which scrapes information from a web page. You can also call a web service to retrieve more information about a page. We’ll talk more about all these methods in the subsequent slides.
RDF is a W3C standard for providing generalized data about semantic relationships. The way to provide RDF data to SearchMonkey is to salt your pages with special markup, extra attributes that signify the meaning of the content. For example, we can mark up an image as the DEPICTION of the PERSON who made the page – something a human being can infer instantly, but that a computer has to be told. Data is CACHED, meaning that you can create Enhanced Result type apps (as well as Infobars). This is very good. The only downside is that it depends on the page being crawled, which means it’s not good for rapidly changing data. You wouldn’t want to use this approach for sports scores in an ongoing game, for example. RDF is also an OPEN approach – just like HTML allows anyone who builds a browser to view your pages, RDF enables anyone who can build an RDF extractor to benefit from this additional semantic information. RDF is also a PASSIVE approach – unlike feeds, which we’ll talk about later, you just have to sit back and wait for Yahoo! to crawl your site. No back and forth or bureaucracy required. The really nice thing about using RDF is that you get to reuse content already available on your site.
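Marked up with RDFa, the depiction example might look like this (illustrative only; FOAF is one common vocabulary choice, not something SearchMonkey mandates):

```html
<!-- Illustrative RDFa: the image is declared to DEPICT the person -->
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" typeof="foaf:Person">
  <span property="foaf:name">Jane Doe</span>
  <img rel="foaf:depiction" src="http://example.com/jane.jpg" alt="Jane Doe"/>
</div>
```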
Microformats are very similar to embedded RDF, just a slightly different approach. There are a wide variety of microformats, for events, for addresses, for social relationships, and so on. For each type of microformat, we have to implement support in SearchMonkey separately. SearchMonkey supports a number of microformats, all listed in the SearchMonkey documentation. By contrast, if you use RDF, you can use any vocabulary you like.
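For comparison, here is roughly what the same kind of data looks like as an hCard microformat, where agreed-upon class names carry the semantics:

```html
<!-- A minimal hCard: the class names are the microformat -->
<div class="vcard">
  <img class="photo" src="http://example.com/jane.jpg" alt=""/>
  <a class="fn url" href="http://example.com/jane">Jane Doe</a>
  <span class="org">Example Corp</span>
</div>
```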
DataRSS is the last way to provide cached data, suitable for Enhanced Results. The difference is that DataRSS is CLOSED: the data is only available to Yahoo!, via SearchMonkey. DataRSS requires you to actively provide and maintain a feed. The feed format is Atom (a common, standard syndication format) with additional Y! metadata attached. Setting up a feed requires coordination with us, and maintenance of the feed going forward. Just like our previous microformat example, once a feed is up and running, it appears in the devtool like any other cached data.
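A DataRSS feed, then, is just an Atom feed with extra metadata attached to each entry. A rough sketch (the element names in the y: namespace are illustrative approximations, not the exact schema):

```xml
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:y="http://www.yahoo.com/searchmonkey/datarss/">
  <title>Example DataRSS feed</title>
  <updated>2008-06-01T00:00:00Z</updated>
  <entry>
    <!-- Each entry ties structured facts to one URL in the index -->
    <link href="http://example.com/page.html"/>
    <y:adjunct version="1.0">
      <!-- key/value facts about the linked page go here -->
    </y:adjunct>
  </entry>
</feed>
```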
For more rapidly changing data, you can create a Custom Data Service that extracts data from a web page using XSLT. This data generally isn’t cached, so it’s really only appropriate for Infobars. However, it can be used with more rapidly updating data. It’s EXCELLENT for testing and prototyping, before your feed or data is ready. [Show demo]
XSLT custom data services are excellent when there is no good structured data available, either because you don’t own the site in question, or because you just want to get a prototype out quickly without having to change your site’s template markup. You can use these data services to mock up what is possible with SearchMonkey. As with the PHP, the XSLT is fairly simple. The “hard” part of writing the stylesheet is really just finding the right XPath expression for extracting the information you want. The other thing you need to do is pick a good vocabulary for describing the extracted data. For example, a description is a dc:description (Dublin Core description) and so on. If the page is not well-formed XHTML, have no fear: we tidy up the page ahead of time and run the XSLT on that. The tidying can fail, but only if the markup is really pathologically bad. As we mentioned before, XSLT custom data services are good for mocking up Enhanced Results, but they’re too slow in practice. For a production-quality app, you’ll need to use them in Infobars. [Show demo]
Enhanced Results are designed according to a rigid visual template, with image, links, and key/value pairs all carefully controlled. This is because we want to ensure that the search result still resembles a search result. Users scan the page, and will skip right over “wild” designs. Users literally will not consciously perceive weird results – they’ll think it’s an ad and screen it out. Infobars are the opposite. When a user opens an Infobar, they are “on task” and consciously engaged with the app. This means that for Infobars, you can and should be creative with the HTML and inline CSS. You’ve got a pretty decent canvas, so use it. The other main design principle for Infobars is that the summary must have useful text or a useful link in it. If the summary is generic, the user will not even see your infobar at all. Find one good link or one good key/value pair and put it in the summary to attract the user’s attention.
Wiring up a SearchMonkey presentation app is easy. A few clicks and you have a working app.
But there’s a world of difference between a working SearchMonkey app and real, production code.
Everyone’s data will have holes in it. Use conditionals to check for whether fields are empty, and either swap in a different field or don’t show the field in the first place. If you’re missing critical data, you can abort by returning an empty array().
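In the PHP presentation layer, that defensive pattern might look like the following sketch. Data::get() and the empty-array abort come from the SearchMonkey PHP API described in these slides; the field keys used here are hypothetical:

```php
<?php
// Sketch: guarding against holes in everyone's data.
$name = Data::get('com.example.profile/name');  // hypothetical key
if (empty($name)) {
    // Missing critical data: abort by returning an empty array
    return array();
}

$result = array('title' => $name);

$photo = Data::get('com.example.profile/photo');  // hypothetical key
if (!empty($photo)) {
    // Optional field: only include it when it's actually present
    $result['image'] = $photo;
}
return $result;
```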
The most important SearchMonkey buttons are the input and output buttons. If your app isn’t displaying properly in the preview pane, the input and output buttons will tell you why. A best practice is to create a shared Y! ID for development. This Y! user will appear in the Gallery, so you should set the name to something official-looking, rather than just your name. You can also export SearchMonkey code to a file and share it with other users. Bellwethers serve two purposes. First, you need them to build your app – they determine what sort of data is on screen #3 and they serve as your live preview. Second, they’re good for QA. You only need one or two to start with, especially since it might take a while to load ten URLs at once. After your app looks good on your first bellwethers, you should expand to 10.
Make use of the image helper functions. You can use these icons in both Infobars and Enhanced Results.
Most apps only require simple Data::get() calls, but if you need to do more complicated XML manipulation, use Data::xpath() or Data::xpathString().
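A quick contrast of the two styles (a sketch only; the keys and XPath expressions are made up for illustration, while Data::get(), Data::xpath(), and Data::xpathString() are the SearchMonkey PHP helpers named above):

```php
<?php
// Simple case: one flat lookup per field
$title = Data::get('smid:com.example/dc:title');  // hypothetical key

// Complicated case: query the underlying XML directly with XPath
$links = Data::xpath('//item/link[@rel="related"]');         // node list
$first = Data::xpathString('string(//item/link[1]/@href)');  // plain string
```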
Either show a lot of data with an infobar (use that entire canvas)…
Or find one good link or one good key/value pair and put it in the summary to attract the user’s attention. Either way, there’s little point in creating an Infobar that follows the strict template of the Enhanced Result.
Infobars that trigger on * can be neat, but often they can be annoying. Unless the infobar really does have something useful to do on every single URL on the search results page, you should try to narrow your scope.
StumbleUpon acts on every URL – it might be useful for people who are very gung-ho about social networking / Web 2.0 sites, but it’s less appealing for the general public.
Screen #3 provides a clever way to abort your infobar, even if you’re triggering on *. If you can make your app depend on some structured markup (whether it’s embedded hCard or some piece of data provided by a feed), you can abort whenever that data is missing. Failing that, you can go to Screen #2 and apply your app to just a limited list of sites. Your app for college sites doesn’t have to trigger on * – a finite list of sites might work.