Big Data has shaped much of the tech innovation happening around the world today, giving people immense power to make sense of large blobs of structured and unstructured data.
Join Riju Saha, Digital Excellence Head, Oracle COE at Tata Consultancy Services, to decode the fundamentals of Big Data and learn how you can build a career in this fascinating field.
The presentation touches upon issues of strategic futures, digital twins, blockchain and the development of skills in the context of investment policy.
Opportunities and challenges for the 21st century FDLP (CNI Spring 2012)James Jacobs
James Jacobs, Government Information Librarian, Stanford University Libraries
Suzanne Sears, Assistant Dean for Public Services, University of North Texas Libraries
David Walls, Preservation Librarian, US Government Printing Office
Date: Monday, April 2, 2012
Abstract:
The vast majority of all US Government documents published today are “born digital,” published electronically and available through the Internet, and will never be printed by the federal government. The lack of a systematic process for capturing, preserving, and disseminating born-digital government information challenges the ability of the Federal Depository Library Program (FDLP) to provide permanent and equal access to online-only government information to all citizens. However, GPO and the FDLP community have begun to make strides on this most critical issue. This project briefing will describe several exciting initiatives currently underway to capture, preserve and provide access to born-digital government information – including GPO’s Federal Digital System (FDsys) and web harvesting initiatives, and the agency’s partnerships with Federal agencies; the CyberCemetery, CRS Report archive and robust digitization program and digital repository of the University of North Texas; and the LOCKSS-USDOCS collaborative program. These projects offer examples of how the FDLP community, in partnership and under formal agreements with GPO, can work collaboratively to assure the long-term preservation of born-digital government information to “keep America informed.”
LIBER fostering Open Science and Knowledge DiscoveryLIBER Europe
Presentation by Kristiina Hormia Poutanen, LIBER President. Delivered at the 25th Anniversary Conference of The National Repository Library of Finland,
Kuopio, 22nd of May 2015. Content is CC-BY.
Artificial intelligence governance in the Obama & Trump yearsAdam Thierer
This presentation briefly outlines how AI governance was formulated in the United States from 2009 to 2020, during the presidencies of Barack Obama and Donald Trump. Although these two administrations differed on most policy matters, they shared a common approach to AI governance. Generally speaking, both administrations adopted a “light-touch” regulatory and industrial policy stance toward AI. Although both administrations highlighted potential areas of policy concern (safety and security issues, in particular), promoting the growth of AI sectors and technologies was prioritized over preemptively restricting them. “Soft law” mechanisms were typically tapped before hard law solutions. In this sense, the Obama-Trump approach to AI governance has been an extension of the governance vision previous administrations applied to the internet and digital commerce.
Fighting Phantom Firms in the UK: From Opening Up Datasets to Reshaping Data ...Jonathan Gray
"Fighting Phantom Firms in the UK: From Opening Up Datasets to Reshaping Data Infrastructures?". Working paper presented at the Open Data Research Symposium at the 3rd International Open Government Data Conference in Ottawa, on May 27th 2015. The paper draws on research undertaken as part of the EU H2020 funded ROUTE-TO-PA project.
Brief History of Content (J Gollner 2014)Joe Gollner
This presentation was first created for an opening keynote at Documation 1999 and has been updated continually ever since. The Brief History of Content explores how we came to look at content as a discrete entity and as something we needed to think about, manage, and perfect separately from how we conduct our routine information exchanges. Information carries content, and when we are called upon to deliver content in many ways simultaneously, we have no choice but to treat content separately and in a way that is more open, adaptable, portable and processable than any single information transaction, concretely rooted in a specific transactional context, will ever need to be. The Brief History of Content chronicles the emergence of content technologies that now make it possible to manage and evolve content as strategic enterprise assets.
ContentMining (aka Text and Data Mining, TDM) is beneficial, and it is legal in the UK and a few other countries. Many groups in Europe are looking to make it legal there as well, but there are many vested interests who oppose it.
This short presentation shows the benefits of content mining, some of the technology, and the way that it can be used and promoted by communities of practice. I urge all attendees at CopyCamp and also the wider world to press for liberalization of copyright.
Einstein published his ideas and became pivotal in shifting the way we think about physics, from the Newtonian model to the quantum. This in turn changed the way we think about the world and allowed us to develop new ways of engaging with it.
We are at a similar juncture. The development of computational technologies allows us to think about astronomical volumes of data and to make meaning of that data.
The mindshift that occurs is that “the machine is our friend”. The computer, like all machines, extends our capabilities. As a consequence the types of thinking now required in industry are those that get away from thinking like a computer and shift towards creative engagement with possibilities. Logical thinking is still necessary but it starts to be driven by imagination.
Computational thinking and data science change the way we think about defining and solving problems.
This is the age of creativity, whose impact increasingly extends from arts applications to business, scientific, technological, entrepreneurial, political, and other contexts.
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
Published on Jun 04, 2015 by PMR
Many scientists have to extract many facts from the scholarly literature, either to evaluate other work or to build useful collections of facts. This presentation shows the approach, especially for systematic reviews of animal or clinical trials.
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
Many scientists have to extract many facts from the scholarly literature, either to evaluate other work or to build useful collections of facts. This presentation shows the approach, especially for systematic reviews of animal or clinical trials.
In this deck from the HPC User Forum in Milwaukee, Michael Garris from NIST presents: The National Science & Technology Council ML/AI Initiative.
"AI-enabled systems are beginning to revolutionize fields such as commerce, healthcare, transportation and cybersecurity. It has the potential to impact nearly all aspects of our society including our economy, yet its development and use come with serious technical and ethical challenges and risks. AI must be developed in a trustworthy manner to ensure reliability and safety. NIST cultivates trust in technology by developing and deploying standards, tests and metrics that make technology more secure, usable, interoperable and reliable, and by strengthening measurement science. This work is critically relevant to building the public trust of rapidly evolving AI technologies."
In contrast with deterministic rule-based systems, where reliability and safety may be built in and proven by design, AI systems typically make decisions based on data-driven models created by machine learning. Inherent uncertainties need to be characterized and assessed through standardized approaches to assure the technology is safe and reliable. Evaluation protocols must be developed and new metrics are needed to provide quantitative support to a broad spectrum of standards including data, performance, interoperability, usability, security, and privacy.
Watch the video: https://wp.me/p3RLHQ-huZ
Learn more: https://www.nist.gov/topics/artificial-intelligence
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The goal of this presentation is to help researchers understand the possibilities of social media as a research area for fields related to NLP/IR/DM.
The triumphant advance of artificial intelligence and disruptive technologies seems unstoppable. But what does that mean for our society, the labor market, and fundamental ethical principles? Does the legislature need to act? Our seminar at TU Berlin got to the bottom of these questions.
Online text data for machine learning, data science, and research - Who can p...Fredrik Olsson
This slide deck concerns online text data for machine learning, artificial intelligence, data science, and scientific research. After this talk, you’ll know who can provide online text data, what types of data are hard to get, and principal data hygiene factors.
Updated in August 2019.
Twitter Realtime Social Data @StartupFestSylvain Carle
Learn how to work with public, conversational, real-time data. This workshop will provide some perspective on data collection strategies using Twitter’s public APIs as the starting point (REST and Streaming).
Open Data: opportunities and challenges for business and governmentDan Herbert
An Oxford Brookes University Alumni Association lecture given by Dr Dan Herbert. The lecture outlines the principles of open data and explains how these affect the work of government and businesses.
LIBER Webinar: Turning FAIR Data Into RealityLIBER Europe
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
Copyright Reform: EU Legislative Process & LIBER AdvocacyLIBER Europe
LIBER's Copyright & Legal Matters Working Group met in Helsinki on 7 December 2017. This presentation, outlining the EU legislative process on copyright reform and LIBER advocacy, was given at the meeting by Helena Lovegrove, LIBER's Advocacy Adviser.
Enabling the Exchange and use of Data in AgricultureLIBER Europe
This presentation by Imma Subirats was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
GDPR - Thoughts on the EU Data Protection Regulation, Research and LibrariesLIBER Europe
This presentation by Jonas Holm was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
Research Data Services and Data Collections: Library Synergies for Economic R...LIBER Europe
This presentation by Thomas Bourke was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2022.
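To make the idea of dropping uninteresting seed bytes concrete, here is a minimal Python sketch. It is not DIAR's algorithm, which the abstract does not describe; it is a generic greedy trimming loop that removes chunks of a seed as long as an observed coverage signature stays the same. The names `trim_seed` and `toy_coverage` are hypothetical, and `toy_coverage` is only a stand-in for running a real instrumented target.

```python
from typing import Callable, FrozenSet


def trim_seed(seed: bytes,
              coverage: Callable[[bytes], FrozenSet[str]],
              chunk: int = 4) -> bytes:
    """Greedily drop chunks of `seed` whose removal leaves coverage unchanged."""
    baseline = coverage(seed)
    size = chunk
    while size >= 1:
        pos = 0
        while pos < len(seed):
            candidate = seed[:pos] + seed[pos + size:]
            if coverage(candidate) == baseline:
                seed = candidate   # chunk was uninteresting: keep it removed
            else:
                pos += size        # chunk matters: keep it and move on
        size //= 2                 # retry with finer-grained chunks
    return seed


def toy_coverage(data: bytes) -> FrozenSet[str]:
    """Hypothetical stand-in for an instrumented target: which branches fire."""
    hit = set()
    if b"<a>" in data:
        hit.add("open_tag")
    if b"</a>" in data:
        hit.add("close_tag")
    return frozenset(hit)


if __name__ == "__main__":
    bloated = b"xxxx<a>padding padding</a>yyyy"
    print(trim_seed(bloated, toy_coverage))
```

In this toy setting the padding bytes are candidates for removal, while the bytes that make up the two tags must survive because removing them changes the coverage signature. A real campaign would replace `toy_coverage` with coverage feedback from the fuzzer's instrumentation.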
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across more than 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation technology capabilities of UiPath. Hosted by our local partner Marc Ellis, the event offers a half-day packed with industry insights and networking with automation peers.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Technology perspective - Staffan Truvé, Recorded Future
1. Robots’ Rights and the Future of Web Intelligence
Staffan Truvé, Ph.D.
CTO, Recorded Future
truve@recordedfuture.com
2. Who are we?
• Founded in 2009
• US-Swedish startup
• 40 persons, 50/50 US/SE
• Boston (HQ), DC, Göteborg, and a few more places around the globe
• Focus on web intelligence, for governments and industry
• Backed by Google Ventures, Atlas Venture, Balderton, IA Ventures, and I-Q-T
6. Today’s Discussion, Tomorrow’s News
Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum
The carrier may select partners to set up a new carrier as early as next month
“2010 is the year when Iran will kick out Islam. Ya Ahura we will.”
“... Dr Sarkar says the new facility will be operational by March 2014...”
Unilever will hold their UK launch event early next week in Manchester
“...opposition organizers plan to meet on Thursday to protest...”
“Excited to see Morsi speak this weekend...”
“According to TechCrunch China’s new 4G network will be deployed by mid-2010”
“Strange new Russian worm set to unleash botnet on 4/1/2013...”
Estimated study completion date: November 2014
…new facility will be operational by March 2013…
...the transaction is expected close in late 2013…
9. 250,000 Real-time Sources
10 Billion Time-tagged Facts
“Kuo expects that Apple will introduce an iPhone 5S around June or July of this year”
“...opposition organizers plan to meet on Thursday to protest...”
Drought and malnutrition hinder next spring’s expansion plans in Kabul...
A few minutes from publishing to analysis
Inside the Web Intelligence Machine
10. Web Intelligence – at Web Scale
• Processing 100s of millions of documents
• Sources from all over the world, in 8 languages – English, Arabic, Chinese x 2, Russian, Spanish, French, Farsi
• From government sites and big media to blogs and social media
• 10B “facts” - growing fast
• 25+ entity types
• 100+ event types
• Metrics, signals, alerts
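As a rough illustration of what a "time-tagged fact" could look like, the sketch below pulls simple temporal expressions out of sentences like those quoted on the earlier slides. It is an assumption-laden toy, not Recorded Future's method: the regular expressions, the labels, and the function name `time_tags` are hypothetical and cover only the phrasings shown in the transcript.

```python
import re
from typing import List, Tuple

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical patterns, chosen only to cover phrasings quoted on the slides.
PATTERNS = [
    ("month_year", re.compile(rf"\b(?:{MONTHS})\s+\d{{4}}\b")),
    ("part_of_year", re.compile(r"\b(?:early|mid|late)[-\s]\d{4}\b", re.IGNORECASE)),
    ("numeric_date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("relative", re.compile(r"\b(?:next|this)\s+(?:week|weekend|month|year)\b",
                            re.IGNORECASE)),
]


def time_tags(text: str) -> List[Tuple[str, str]]:
    """Return (label, matched text) pairs for each temporal expression found."""
    tags = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            tags.append((label, match.group(0)))
    return tags


if __name__ == "__main__":
    for sentence in [
        "Dr Sarkar says the new facility will be operational by March 2014",
        "China's new 4G network will be deployed by mid-2010",
        "Strange new Russian worm set to unleash botnet on 4/1/2013",
        "Unilever will hold their UK launch event early next week in Manchester",
    ]:
        print(time_tags(sentence), "<-", sentence)
```

A production system would also need to resolve relative expressions such as "next week" against the publication date, handle the other languages listed above, and link each date to the entities and events in the sentence; none of that is attempted here.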
13. Without text mining, the web is useless!
• No way to find stuff without search engines – which all rely on text mining
• And all publishers realize this
• Search → Analysis
• A necessary evolution as the web grows
• Creating new value
• Aggregation, analysis
• Enabling media criticism
14. Drivers / Opportunities
• Moore’s Law!
• Advances in linguistics, algorithms, math
• Exponential growth of content
• The volume of information on the web is making traditional search worthless
16. Restricting the right to analyze is absurd
• What is the borderline between reading and analyzing?
• Impossible to differentiate humans from ‘robots’
• Robots must have the same rights as human readers