The Web as a World of Avatars
ANWOT - A new Way of Thinking
By Juan Chamero, from Barcelona, Spain, as of May 30th, 2016
Avatar (2009 film), from Wikipedia
Introduction
We present here the latest version of the Darwin Methodology. It was initially created to “see the Web more
and better” and has evolved to see the Web instead as a World of Avatars: cyber creatures that
represent our past and present ideas and thoughts, and even all types of intellectual speculation
about our possible futures.
This idea is not new. It goes back centuries, diluted and hidden in archetypes and models such as
“Romeo and Juliet”, “Don Quixote”, “Ulysses”, “Democracy”, “El Príncipe” and the Avatars of
Hinduism, and it appears today as Cyber creatures in the work of visionaries and scientists like Stephen Hawking. Why
was it hidden for so long? Because only very recently have suitable cyber reservoirs existed to host
The ALL almost “naturally”, openly and freely: The Web.
This presentation could be considered the third e-book of our Mind to Digital Series. It has five
sections, namely:
i. Darwin Methodology Last Update deals with the idea of Web avatars, fundamentally the
new Darwin Ontology Conjectures that cover this revolutionary vision;
ii. Darwin in a nutshell is a synthesis of the Darwin methodology for seeing the Web as
semantically structured, although not yet as a world of avatars;
iii. Darwin Demo deals with the details of The Art Thesaurus unveiled from the Web via a
Darwin AI Mega Algorithm, and presents Darwin as ANWOT, A New Way of Thinking;
iv. Darwin Project stands for Darwin as a Project to map and unveil absolutely EVERYTHING
dispersed and hidden in the Web;
v. Darwin Tests and Reflections deals with some paradoxes and crucial tests performed with the
Darwin Methodology over the last decade, and with some Web examples of the superiority of
concept-based over word-based architectures in semantic search (see Kentucky Woodman!).
Darwin Methodology Last Update
• Darwin Ontology Adjustments, 1 page;
• Darwin avatars buildup in 12 big “industrial steps” analogy, 4 pages;
• Darwin Methodology – To “see” the Web more and better, avatar seeds, 12 pages;
• Darwin Ontology – Conjectures, 4 pages;
Darwin in a nutshell
• Darwin in a nutshell Index, 1 page;
• Darwin Brief, 1 page;
• Darwin Methodology, 2 pages;
o Darwin Carousel, 5 pages;
o Darwin Maps Build, 3 pages;
o Darwin Big Data, 3 pages;
o Darwin Icons Meanings, 5 pages;
• The Web as seen by Darwin Methodology, 3 pages;
• Darwin Bibliography, 1 page;
Darwin Demo
• A picture is worth a thousand words, 1 page;
• Intro to Darwin Art Map, 6 pages;
• Darwin Mapping History, 9 pages;
• Darwin Semantic Search, 19 pages;
• Q&A Logic of Web Search, 9 pages;
• The Art Tree Darwin Demo, 13 pages;
• Darwin Presentation (PPT), 25 pages; see pages 284 to 308;
Darwin Project
• Darwin Teaser, 11 pages;
• Darwin HKM (PPT), 75 pages; see pages 209 to 283;
• Present and Future of Web Searching, 4 pages;
• DM Mega Algorithm, 4 pages;
• Aiware Methodology, ikAK, 3 pages;
• Semantic Pills, within a Big Data Thesaurus, 35 pages;
• HKM Synthesis, HKM in numbers, 4 pages;
Darwin Tests and Reflections
• Wikipedia avatar, 3 pages;
• Word Searching Weakness, 1 page;
• Differences between data, information and knowledge, 3 pages;
• The Web for fun, “Who’s on first?”, 2 pages;
• Crucial questioning, 4 pages;
• Kentucky Woodman, 3 pages;
• Human Knowledge Disciplines, 13 pages;
• Words versus concepts, 5 pages;
• Mathematics seed, 7 pages;
Epilogue
We intended this e-book to be a semantic tour around the Web, using our Darwin Methodology as a
“cicerone”. We have “seen”, semantically and at will, hundreds of thousands of
Websites related to our needs for data, information, knowledge and even intelligence. As a
consequence of this guided e-learning we have acquired a valuable cyber wisdom that we want to transmit.
If we were challenged to explain briefly the rationale of our alleged “acquired wisdom”, we would
recommend an overview of our Darwin Tests and Reflections section through a mini tour, as follows:
• Wikipedia avatar: it shows us the best we can do working conventionally, largely
subjectively, via real or alleged authorities;
• Word Searching Weakness: it shows the intrinsic weakness and misleading ambiguity of a
trivial search such as “dog”;
• Differences between data, information and knowledge: at present we do not actually know,
scientifically speaking, what these differences are. Nevertheless we consider the above
hierarchical sequence a strong and valuable belief. There are hundreds of allegedly
authoritative versions of it, such as the one commented on;
• The Web for fun: we use one of the most famous Abbott & Costello routines, “Who’s on
First?”, to exemplify the semantic confusion generated by bad and/or incorrect use of words;
• Crucial questioning: here we present the hardest questions Artificial Intelligence experts ask
about Darwin, namely: Darwin versus Google search; examples of successful high-impact and/or
disruptive applications; Darwin's ability to work within the Dark Web;
• Kentucky woodman: a semantic analysis of the term, associated with Abraham Lincoln as an
avatar, unveiling the Web as_is starting from Zero Knowledge;
• HK, Human Knowledge Disciplines: a whole Web Thesaurus would cover about 200
disciplines. We arrived at that estimate, raised from 150 at the beginning of
the 2000s, depending on what we mean by “branch of knowledge”. See a brief exploration
of it as of 2014;
• Words versus Concepts: the summary of a Darwin workshop seminar held in 2015
on words versus concepts, their associated universes and in mind images;
• Mathematics seed: an example of a Semantic Seed buildup about Mathematics performed by Dr Eduardo Ortiz
and his team of PhD candidates, and tested as trustworthy by Darwin
agents. Dr Ortiz is emeritus professor of Mathematics and History of Mathematics at
Imperial College London.
Darwin Ontology Adjustments
By Juan Chamero as of February 19th 2016
The ALL is mentally imagined (“in mind image”):
Keeping it in mind – Poetry By Heart, Oxford Dictionaries
Prologue
This brief document deals with crucial, fundamental findings about Cyberspace. The documents reviewed were
biographies and classic essays related, to my knowledge, to the scope of our ontology, namely: Galileo
Galilei, Claude Shannon, Alan Turing, Roger Penrose, Albert Einstein, Stephen Hawking, Plato, René
Descartes, John von Neumann, Nikola Tesla, Jaime Balmes, the Tao Te Ching, Zen writings centered
on Bodhidharma, the Bible (Genesis and the Apocalypse) and, why not, something about Pope Francis,
Saint Augustine, Teilhard de Chardin and Umberto Eco: an atypical intellectual cocktail, isn’t it?
From these readings emerged a newly updated Darwin Ontology to “see the ALL more and better”. Perhaps
this ALL is the common in mind image that the components of this intellectual cocktail share. Rationale: a) our minds
continuously process information and knowledge; b) this helps us to live more and better; c) via intuition and
knowledge humans document their acts and experiences, let’s say their know-how and living avatars.
Stephen Hawking states that these records (for example in books and in Cyberspace) would be as
important as our lives. As a preliminary thought he suggests that the image of the whole world as_is
today, and probably the seeds of our future, could be expressed in the Web. If this assertion is true, we are
currently less than what we really could be! (And even without resorting to God.)
Note: do these assertions sound a little like science fiction to you? It would be equivalent to saying that via
ontologies like Darwin we are able to know not only the best possible truths but the absolute best ones!
And all this without creating anything new at the level of human or artificial intelligence, simply by unveiling all the
pieces of truth that are somehow dispersed and semi-hidden in the Web: almost a paradox of negentropy!
The adjustments have been synthesized in a 4-page document named “avatar_buildup”. Its first page
deals with Darwin as a way of unveiling avatars diluted in the Web. Within this vision a Darwin outcome such as the Human
Knowledge Map would simply be a Human Avatar: a big and complex one, but in the end an avatar! The second page
and half of the third are devoted to the epilogue, which summarizes the adaptations made to the Darwin Ontology
in order to unveil as avatars not only what is actually visible but also what is actually hidden or invisible.
These adjustments close our ontology with a finishing touch!
Darwin avatars buildup in 12 big “industrial steps” analogy
ANWOT, A New Way Of Thinking doc, by Juan Chamero, from Spain, as of February 15th, 2016
Face avatars from Deleket
Introduction
Avatars: cyber creatures that represent and guide our lives, generally as archetypes. If we as humans have
defined our reason for living and our relation with THE PERCEIVED ALL through the cognitive hierarchy of
data, information, knowledge and wisdom, we may assume that avatars are creatures that represent us
and help us to make meaningful use of this hierarchy to solve and/or “to see” meaningfully any imagined
subject. Avatars are in mind images; however, not all in mind images are avatars. Avatars are thus creatures
that to some extent represent us, our existence, our past and what we expect of our future as well.
Let’s imagine a world without humans but with the Web space “alive” as it is now: a sort of Cyber Sea where
trillions of in mind images and avatars live! We are used to seeing and understanding all types of in mind images
via trillions of explanations and descriptions documented in all existing languages and cultures. As
hypothetical intelligent “aliens” we could well imagine how insects behave, how illnesses evolve, what a storm is,
and even what more abstract and complex things like hate and love are. In Hinduism an avatar is an
incarnation or deliberate descent of a deity or Supreme Being to Earth.
The Darwin Ontology supposes that all imaginable avatars co-exist in the Web. For instance, the Pope
Francis avatar as of today would be a creature that sums up absolutely EVERYTHING we as humans “see” as
related to the papal office, to the Catholic Church and to its avatars over time, that is to say to Jesus, to
the saints and prophets, to all types of passions from the compassion of Jesus to the tortures of the “Santa
Inquisition”, and to all types of adhesion and rejection from atheism and agnosticism to the highest apologies of
faith, without leaving out the human Jorge Bergoglio as a person, his family and his entourage.
The Darwin Ontology enables us “to see” the Web and its creatures more and better, much as
Galileo Galilei explored the sky through his telescope. It enables us to unveil not only all the dispersed pieces
of information and knowledge of all avatars but also the hidden intelligence that keeps them united as
entities. Within this paradigm all the big problems of humanity would also be avatars. Let’s imagine now that
we are commissioned to build a big and complex avatar such as Barack Obama, Pope Francis,
the Refugee Problem, the Future of the EU, Terrorism, the Evolution of Democracy, Gender Violence, etc. Do
not be discouraged: everything is “up there”, dispersed but hidden in the Web Sea!
Avatar Buildup
Step 1 – Semantic Exploration: Make a first approach to the avatar: review the Web content with our
minds focused on finding names, traces, features, images, audios, text samples, memes, tags, collectives and
visible authorities, defining a first set of semantic axes of a hypothetical avatar “semantic seed”.
Step 2 – Semantic Resonance Exploration: Unveil the “names” of the first approach. For a given
language–culture pair, each in mind image has a sort of “resonance name”, like a radio wave: when conventional
search engines are queried with these names they point to the best answers, in quantity and semantic quality. For
instance, exploring the semantic neighborhood of these best names, you may experience significant changes in the results
with minimal or negligible differences in spelling or pronunciation. Emulating human criteria, Darwin
agents detect the best names for a given in mind image.
Step 3 – Identifying suspected Authority realms: Pivoting on and exploring website content around
resonance names, for instance by building hyperlink-versus-hyperlink matrices, will provide us with raw data for the
next step, namely chains (including closed loops) of meaningfully related hyperlink names, semantically
weighted.
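As an illustration of Step 3, the following is a minimal sketch, not the Darwin implementation. It assumes the hyperlinks between sites have already been collected as (source, target) pairs; the site names, the function names `build_link_matrix` and `find_chains`, and the use of raw link counts as weights are all assumptions made for the example.

```python
def build_link_matrix(links):
    """Build a site-versus-site matrix; matrix[i][j] counts links from names[i] to names[j]."""
    names = sorted({site for pair in links for site in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in links:
        matrix[index[source]][index[target]] += 1
    return names, matrix


def find_chains(names, matrix, max_len=4):
    """List link chains of up to max_len sites as (sites, is_closed_loop) pairs."""
    chains = []

    def walk(path):
        for nxt, weight in enumerate(matrix[path[-1]]):
            if weight == 0:
                continue
            if nxt == path[0]:
                # Back at the start: report each loop once, from its smallest index.
                if len(path) > 1 and path[0] == min(path):
                    chains.append(([names[i] for i in path], True))
                continue
            if nxt in path or len(path) >= max_len:
                continue
            chains.append(([names[i] for i in path + [nxt]], False))
            walk(path + [nxt])

    for start in range(len(names)):
        walk([start])
    return chains


# Hypothetical sample: three sites linking to each other in a circle.
links = [("a.example", "b.example"), ("b.example", "c.example"), ("c.example", "a.example")]
names, matrix = build_link_matrix(links)
chains = find_chains(names, matrix)
loops = [sites for sites, closed in chains if closed]
```

Here `loops` contains the single closed loop through the three sample sites, while the open chains are the raw material an expert would inspect in Step 4.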
Step 4 – Unveil possible “conceptual graphs”: basically a human task: an expert or a group of experts
analyzes the unveiled raw data looking for conceptual graphs to feed the next step. At this step the global
structure of the avatar should be depicted and properly documented.
Step 5 – Unveil possible “semantic seeds”: basically a human task guided and aided by agents.
Step 6 – Select the best semantic seed: basically a human task. As the computationally “hard” part of the process
(+85%) begins in the next step, the selection must be backed up and justified as much as possible.
Step 7 – Make the semantic seed grow: there are many ways to make a semantic seed grow “properly”,
always under the same ontology, depending on the nature, complexity and size of the avatar. One of the
simplest is to expand the initial realm of names (the one that backed up the unveiling of the semantic seed),
let’s say from 50 to 100 names, to a realm of a few thousand. For each semantic seed name, a “texton” is unveiled
from the Web: a sort of huge logical vector built from pieces of content of the top 500 to 1,000
websites retrieved. The Darwin methodology states that the avatar is meaningfully represented within this huge sample,
basically all its derived names and their probable logical locations within its semantic
skeleton. Darwin algorithms and agents make all the computations, but humans are responsible for selecting the
criteria for name expansion and location adjustment. As an outcome of this step we have our avatar
predefined but fuzzy, as if seen through a cloud: for instance as a set of 1,200 names related to the avatar but
still poorly structured along a conceptual graph of 500 nodes.
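The “texton” of Step 7 can be pictured with a small sketch. This is only one possible reading, assuming the text of the retrieved pages is already available as strings and that a plain term-frequency vector is an acceptable stand-in for the “huge logical vector”; the function names and the sample snippets are hypothetical.

```python
import re
from collections import Counter


def build_texton(page_texts):
    """Merge the text of the retrieved pages into one term-frequency vector."""
    texton = Counter()
    for text in page_texts:
        texton.update(re.findall(r"[a-z]+", text.lower()))
    return texton


def candidate_names(texton, seed_name, top_n=5):
    """Return the most frequent terms other than the seed name itself."""
    seed_terms = set(seed_name.lower().split())
    ranked = [(term, count) for term, count in texton.most_common()
              if term not in seed_terms]
    return ranked[:top_n]


# Hypothetical snippets standing in for the 500 to 1,000 retrieved pages.
pages = [
    "Impressionism is a painting movement",
    "Monet led the impressionism movement in painting",
    "Painting techniques of Monet",
]
texton = build_texton(pages)
top = candidate_names(texton, "impressionism", top_n=1)
```

In this toy sample the strongest candidate name is "painting". Common words such as "is" or "the" still appear in the vector, which is the problem the next step addresses.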
Step 8 – Unveiling Specific Concepts: The Darwin Ontology states that we humans, as a collective, write documents
according to a probabilistic “formula” (something like a WWD, Well Written Document, formula) using only
two kinds of semantic particles: Common Words and Expressions, and “specific concepts” closely related to
the in mind image we have of the main subject of the document. This specificity acts as a
semantic filter that helps us to see the semantic skeleton of the avatar more and better. Following our
example, we arrive at a name realm of 1,500 terms and a semantic skeleton of 450 nodes.
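The text does not give the WWD formula, so the sketch below swaps in a simple, well-known heuristic to illustrate the idea of a specificity filter: a term counts as a “specific concept” when it is much more frequent in the avatar's documents than in a general background corpus. The function name, the threshold `min_ratio` and all counts are assumptions for illustration only.

```python
from collections import Counter


def specific_concepts(avatar_counts, background_counts, min_ratio=3.0):
    """Keep terms whose relative frequency is at least min_ratio times the background rate."""
    avatar_total = sum(avatar_counts.values())
    background_total = sum(background_counts.values())
    specific = []
    for term, count in avatar_counts.items():
        avatar_rate = count / avatar_total
        # Add-one smoothing so terms unseen in the background do not divide by zero.
        background_rate = (background_counts.get(term, 0) + 1) / (background_total + 1)
        if avatar_rate / background_rate >= min_ratio:
            specific.append(term)
    return sorted(specific)


# Hypothetical counts: an art-related texton versus a general-language sample.
avatar_counts = Counter({"the": 50, "of": 30, "fresco": 10, "chiaroscuro": 10})
background_counts = Counter({"the": 600, "of": 390, "fresco": 9})
concepts = specific_concepts(avatar_counts, background_counts)
```

With these sample counts the common words "the" and "of" are filtered out and only "chiaroscuro" and "fresco" remain as specific concepts.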
Step 9 – Check compliance with the Ontology Conjectures: This is a necessary, hidden and heavy task
mostly performed by Darwin algorithms. We have to take into account that only long-lasting and complex
avatars tend to be structured like logical trees, for instance Maps of Knowledge. Most avatars will have
semi-arboreal structures that in part resemble directed graphs. This directionality enables us to exploit
semantic ancestry and all types of parental relations via conventional search engines, for instance the fact that
“popularity” rank tends to be higher with ancestry.
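One check that fits this step is whether a semantic skeleton is a true logical tree or only a directed graph. The sketch below assumes the skeleton is given as (parent, child) pairs; the function name `classify_skeleton` and the sample concepts are hypothetical and not part of the Darwin algorithms.

```python
def classify_skeleton(edges):
    """Report whether (parent, child) edges form a tree, and which nodes break it."""
    nodes = {node for edge in edges for node in edge}
    parents = {node: set() for node in nodes}
    children = {node: set() for node in nodes}
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)

    roots = sorted(node for node in nodes if not parents[node])
    multi_parent = sorted(node for node in nodes if len(parents[node]) > 1)

    is_tree = False
    if len(roots) == 1 and not multi_parent:
        # A tree also needs every node to be reachable from the single root.
        seen, stack = set(), [roots[0]]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        is_tree = seen == nodes
    return {"is_tree": is_tree, "roots": roots, "multi_parent": multi_parent}


# Hypothetical skeletons: a clean tree, then the same tree with one extra parent link.
tree_edges = [("art", "painting"), ("art", "sculpture"), ("painting", "fresco")]
graph_edges = tree_edges + [("sculpture", "fresco")]
tree_report = classify_skeleton(tree_edges)
graph_report = classify_skeleton(graph_edges)
```

The second sample is “semi-arboreal” in the sense of the text: it keeps a single root, but "fresco" has two parents, so it is reported as a directed graph rather than a tree.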
Step 10 – Evaluate the whole process and results: When building complex avatars we may arrive at this step many times,
as a checkpoint of an iterative process that runs from step 1 to here.
Step 11 – Intelligent Report for Humans; first raw avatar synthesis: This is a human task. It resembles
editing an essay or a book with its corresponding Prologue, Epilogue, Introduction, Abstract, Index and
Bibliography and, very importantly, its metadata structured as [avatar definition, Authorities and their profiles,
semantic and Web references, images, videos and audio, selected quotations, tags and memes].
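The metadata list of Step 11 maps naturally onto a record type. The following sketch simply mirrors the bracketed list above as Python dataclasses; the class and field names and the sample values are assumptions, since the text does not define a concrete format.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Authority:
    name: str
    profile: str = ""


@dataclass
class AvatarReport:
    # Field order follows the metadata list in Step 11.
    avatar_definition: str
    authorities: List[Authority] = field(default_factory=list)
    semantic_references: List[str] = field(default_factory=list)
    web_references: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    audio: List[str] = field(default_factory=list)
    selected_quotations: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    memes: List[str] = field(default_factory=list)


# Hypothetical report for an "Art" avatar.
report = AvatarReport(
    avatar_definition="Hypothetical example: the Art avatar",
    authorities=[Authority("Example Museum", "hypothetical authority profile")],
    tags=["art", "painting"],
)
record = asdict(report)  # plain dict, ready to serialize
```

Keeping the metadata in one typed record makes each iteration of the buildup easy to compare with the previous one.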
Step 12 – Avatar buildup: this is a continuous task, because avatars evolve and at the same time we evolve in a
sort of exponential e-learning process. Over circa 8,000 years human beings have built something like a
world library of avatars, somehow ruled by a world plutocracy in a strange pairing between the wisdom of an
insignificant minority and the disproportionate power held by another insignificant minority. By and large the
Established Knowledge, the best truths, were those issued by geniuses, enlightened and powerful people and
entities. However, the best truths should take into account the in mind images of information, knowledge and
opinions and, why not, the wisdom of ALL of us humans as a collective of unique individualities. Now, after more
than 80 centuries, this is perfectly possible!
Epilogue
The Darwin Ontology defines the life and interactions of Web Cyber creatures as a dual interacting scenario:
the “K Side”, or World of Established Knowledge, versus the “K’ Side”, or World of the People.
Daily-life avatars are usually hosted on the K’ Side, whereas formal, long-lasting avatars are generally hosted on the K
Side.
Avatar popularity: according to Google, we may distinguish “avatars” as a single-word concept (pointing to
419,000,000 references) from "the avatars of life" as the core of an expression (a closed search within
quotation marks, pointing to 54,000 references). Curiously, the Spanish expression “Los avatares de la vida”
collects 233,000 references, perhaps because in Spanish literature the term is misleadingly used as a
synonym of circumstances.
Some definitions of avatar, from the “learning a second language” avatar draft written by students of USC, US:
o The incarnation of a Hindu deity, especially Vishnu, in human or animal form.
o An embodiment or manifestation, as of a quality or concept.
o An icon, graphic, or other image by which a person represents himself or herself.
o A digital construct (often an image file) that represents the online user in a virtual world.
We invite you to see a pre-avatar buildup around the subject “How the world sees us”, restricted to Spanish
students of a second language, probably English, at an American university. As you may easily appreciate, it is
incomplete and rather biased: the authors “take sides” openly and too frequently.
What’s life? What we present as avatars sounds a little disruptive, a strange combination of knowledge and
intuition, because these “virtual” creatures, avatars, could be all and nothing and, for some cosmologists like
Hawking, more alive and transcendental than humans. We invite you to imagine what is real about them in
physical terms, namely matter, space and mass within the whole known universe: almost nothing, on any
cosmic scale close to absolute zero.
Note 01: Let’s try to imagine all forms of life distributed and diluted, on average (for example in a ratio of 1:1000), over the layers of the
biosphere, from the upper atmosphere down to a few hundred meters below the surface, as compared to the Earth's radius of
6,378,000 meters. This gives almost zero mass relative to our planet, probably the only one with suspected life within our galaxy!
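The order of magnitude in Note 01 can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: the note gives only the Earth's radius and the 1:1000 dilution, so the 10,000 m height and the 300 m depth of the biosphere shell are illustrative guesses, and the result is a volume fraction rather than a mass fraction.

```python
EARTH_RADIUS_M = 6_378_000       # from the note
DILUTION = 1 / 1000              # the 1:1000 ratio from the note
HEIGHT_ABOVE_SURFACE_M = 10_000  # assumption: the note gives no figure
DEPTH_BELOW_SURFACE_M = 300      # assumption for "a few hundred meters"


def shell_volume_fraction(radius, height, depth):
    """Volume of the shell from (radius - depth) to (radius + height), relative to the sphere."""
    outer = (radius + height) ** 3
    inner = (radius - depth) ** 3
    # The common factor 4/3 * pi cancels in the ratio.
    return (outer - inner) / radius ** 3


shell_fraction = shell_volume_fraction(
    EARTH_RADIUS_M, HEIGHT_ABOVE_SURFACE_M, DEPTH_BELOW_SURFACE_M
)
life_fraction = shell_fraction * DILUTION
print(f"shell: {shell_fraction:.5f} of Earth's volume; diluted life: {life_fraction:.2e}")
```

Under these assumptions the shell is about half a percent of the planet's volume and the diluted life within it a few millionths, which is consistent with the “almost zero” of the note.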
In Hawking's words: “… This has meant that we have entered a new phase of evolution. At first, evolution
proceeded by natural selection, from random mutations. This Darwinian phase, lasted about three and a half
billion years, and produced us, beings who developed language, to exchange information. But in the last ten
thousand years or so, we have been in what might be called, an external transmission phase. In this, the
internal record of information, handed down to succeeding generations in DNA, has not changed
significantly. But the external record, in books, and other long lasting forms of storage, has grown
enormously. Some people would use the term, evolution, only for the internally transmitted genetic material,
and would object to it being applied to information handed down externally. But I think that is too narrow a
view. We are more than just our genes. We may be no stronger, or inherently more intelligent, than our cave
man ancestors. But what distinguishes us from them, is the knowledge that we have accumulated over the
last ten thousand years, and particularly, over the last three hundred. I think it is legitimate to take a broader
view, and include externally transmitted information, as well as DNA, in the evolution of the human race …”
Bibliography
1. Life in the Universe, by Stephen Hawking, suggests the following evolution scheme: Energy => elementary particles => pre-RNA
“accidents” => RNA => DNA => seeds of life => language => written language => “External” Evolution;
2. The Anthropic Principle, from Wikipedia: ….”The anthropic principle (from Greek anthropos, meaning "human") is the philosophical
consideration that observations of the universe must be compatible with the conscious and sapient life that observes it. Some
proponents of the anthropic principle reason that it explains why the universe has the age and the fundamental physical constants
necessary to accommodate conscious life. As a result, they believe it is unremarkable that the universe's fundamental constants happen
to fall within the narrow range thought to be compatible with life”……
3. CHON and CHNOPS: CHON is a mnemonic acronym for the four most common elements in living organisms: carbon, hydrogen,
oxygen, and nitrogen. The acronym CHNOPS, which stands for carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, represents the
six most important chemical elements whose covalent combinations make up most biological molecules on Earth. Sulfur is used in the
amino acids cysteine and methionine. Phosphorus is an essential element in the formation of phospholipids, a class of lipids that are a
major component of all cell membranes, as they can form lipid bilayers, which keep ions, proteins, and other molecules where they are
needed for cell function, and prevent them from diffusing into areas where they should not be. Phosphate groups are also an essential
component of the backbone of nucleic acids and are required to form ATP – the main molecule used as energy powering the cell in all
living creatures. Carbonaceous asteroids are rich in CHON elements. These asteroids are the most common type, and frequently collide
with Earth as meteorites. Such collisions were especially common early in Earth's history, and these impacts may have been crucial in
the formation of the planet's oceans.
Darwin Methodology - To “see” the Web more and better
Building an avatar seed
As seen by a Zen master - AI builder
By Juan Chamero, from Spain, as of February 27th, 2016
Galileo Galilei looking at the sky; Wikipedia and many other sources
“To see more and better” avatar
The Darwin Ontology states that the Web is a big Cyber Ocean that hosts Cyber Creatures named
“avatars”, which record the Avatars of the Human Being, that is to say our vicissitudes: what
happens to us and what we think about everything and anything. The units of record are Home
Pages, so each of them may contain avatars and/or pieces of avatars. This vision enables us to
see the whole Web as a dual scenario in which we humans live continuously “emitting” messages,
aware of it or not, that are continuously recorded by a sort of multimedia Cyber Ocean.
This document intends to describe how to devise an avatar seed about the Darwin Methodology based
on a sample of well-known quotations and inspirations related to human visions of the ALL,
namely:
Galileo Galilei, Claude Shannon, Alan Turing, Roger Penrose, Albert Einstein, Stephen Hawking,
Plato, René Descartes, John von Neumann, Nikola Tesla, Jaime Balmes, the Tao Te Ching, Zen
writings about Bodhidharma, the Bible (Genesis and the Apocalypse) and, why not, something about
Pope Francis, Saint Augustine of Hippo and Teilhard de Chardin: an atypical intellectual
cocktail, isn’t it?
Note: Umberto Eco was recently added to our list of inspirers as a posthumous homage. We recommend
reading his book “How to write a Doctoral Thesis”, whose method is similar to our Darwin avatar unveiling process performed
manually.
Galileo Galilei
Eppur si muove!, and yet it moves!
Attributed to Galileo in 1633, after he was forced to recant his claim that the Earth moves around the Sun
Galileo Galilei's work inspired the Darwin Ontology “to see the Web more and better”, just as he used his
telescope “to see the Sky more and better”.
Claude Shannon
I just wondered how things were put together
Information is the resolution of uncertainty
Two Claude Shannon quotes from brainyquote.com. We humans are in his debt for his apparently
simple, astonishing and disruptive Theory of Information. We wrongly claim that we are in the Era
of Knowledge; however, we still have to do our homework to go a little beyond Shannon within
the Information Era. Darwin does its own homework along that line.
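Shannon's quote “Information is the resolution of uncertainty” can be made concrete with his entropy measure, H = -Σ p·log2(p), which gives the average uncertainty of a source in bits. The short sketch below is a standard textbook computation added for illustration; the function name `entropy_bits` and the sample distributions are our own choices.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = entropy_bits([0.5, 0.5])    # one yes/no answer resolves it
four_way = entropy_bits([0.25] * 4)     # two yes/no answers resolve it
biased_coin = entropy_bits([0.9, 0.1])  # less uncertain than a fair coin
```

A fair coin carries exactly one bit of uncertainty and four equally likely outcomes carry two, while the biased coin carries less than one bit: the less uncertain the source, the less information its outcome provides.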
Alan Turing
Science is a differential equation; Religion is a boundary condition
Alan Turing could be considered the father of the whole Computing Science “avatar” as of today: a
real genius, gifted in almost everything, and also a pioneer of the thinking-machines utopia.
He suggested that machines may think, a crucial and long-lasting controversy: a computer
would deserve to be called intelligent if it could deceive a human into believing that it was
human.
Roger Penrose
There are two other words I do not understand — awareness and intelligence.
Roger Penrose argues that present-day computers cannot have intelligence because they are
algorithmically deterministic systems, against the viewpoint that the rational processes of the mind
are completely algorithmic and can thus be duplicated by a sufficiently complex computer. See his
controversy with Marvin Minsky, who says exactly the opposite: that humans are, in fact,
machines whose functioning, although complex, is fully explainable by current physics. See also
GoogleTechTalks.
Albert Einstein
Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning.
Albert Einstein: what could we meaningfully add to our avatar about science, knowledge, wisdom and
consciousness? We only dare to select some of his quotes:
o It has become appallingly obvious that our technology has exceeded our humanity.
o The true sign of intelligence is not knowledge but imagination.
o Logic will get you from A to B. Imagination will take you everywhere.
o Science without religion is lame, religion without science is blind.
Coexistence of dualities: wave–particle duality is the fact that every elementary particle or
quantum entity exhibits the properties not only of particles but also of waves. It addresses the
inability of the classical concepts "particle" or "wave" to fully describe the behavior of quantum-scale
objects. As Einstein wrote: "It seems as though we must use sometimes the one theory and
sometimes the other, while at times we may use either. We are faced with a new kind of difficulty.
We have two contradictory pictures of reality; separately neither of them fully explains the
phenomena of light, but together they do".
Stephen Hawking
We are all now connected by the Internet, like neurons in a giant brain.
The Web Ocean hosting all Human avatars: in Hawking's words: “… This has meant that we have entered a
new phase of evolution. At first, evolution proceeded by natural selection, from random mutations. This
Darwinian phase, lasted about three and a half billion years, and produced us, beings who developed
language, to exchange information. But in the last ten thousand years or so, we have been in what might be
called, an external transmission phase. In this, the internal record of information, handed down to
succeeding generations in DNA, has not changed significantly. But the external record, in books, and other
long lasting forms of storage, has grown enormously. Some people would use the term, evolution, only for
the internally transmitted genetic material, and would object to it being applied to information handed down
externally. But I think that is too narrow a view. We are more than just our genes. We may be no stronger, or
inherently more intelligent, than our cave man ancestors. But what distinguishes us from them, is the
knowledge that we have accumulated over the last ten thousand years, and particularly, over the last three
hundred. I think it is legitimate to take a broader view, and include externally transmitted information, as
well as DNA, in the evolution of the human race …”
Plato
Wise men speak because they have something to say; Fools because they have to say something
This quote from Plato is a brief and ancient example of semantic subtlety: two opposite “in mind
images” (wise and fool) expressed in a given language (English in this case) with misleadingly similar wording.
The theory of Forms (or theory of Ideas) typically refers to the belief that the material world as it
seems to us is not the real world, but only an "image" or "copy" of the real world. In some of
Plato's dialogues, this is expressed by Socrates, who spoke of forms in formulating a solution to
the problem of universals. The forms, according to Socrates, are archetypes or abstract
representations of the many types of things, and properties we feel and see around us, that can
only be perceived by reason (Greek: λογική).
René Descartes
Cogito ergo sum; Je pense, donc je suis; I think, therefore I am; Pienso luego existo
Descartes may be considered the father of modern Western philosophy and, for many, also of
17th-century continental rationalism, later advocated by Baruch Spinoza and Gottfried Leibniz.
See his Discourse on the Method and its four rules:
• "The first was never to accept anything for true which I did not clearly know to be such; that is to say, carefully to avoid
precipitancy and prejudice, and to comprise nothing more in my judgment than what was presented to my mind so clearly
and distinctly as to exclude all ground of doubt.
• The second, to divide each of the difficulties under examination into as many parts as possible, and as might be necessary
for its adequate solution.
• The third, to conduct my thoughts in such order that, by commencing with objects the simplest and easiest to know, I might
ascend by little and little, and, as it were, step by step, to the knowledge of the more complex; assigning in thought a certain
order even to those objects which in their own nature do not stand in a relation of antecedence and sequence.
• And the last, in every case to make enumerations so complete, and reviews so general that I might be assured that nothing
was omitted."
John von Neumann
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk
There probably is a God. Many things are easier to explain if there is than if there isn't.
John von Neumann was the missing piece of the Cyber Era: a genius and a “doer” of
everything! The above quotes speak for themselves.
About the hidden sides of many scientific milestones: John von Neumann, for many the father of
Modern Computing, suggested to Claude Shannon a name for his new uncertainty function: “You
should call it entropy, for two reasons. In the first place your uncertainty function has been used
in statistical mechanics under that name, so it already has a name. In the second place, and more
important, no one really knows what entropy really is, so in a debate you will always have the
advantage.”
Nikola Tesla
Every living being is an engine geared to the wheelwork of the universe. Though seemingly affected only
by its immediate surroundings, the sphere of external influence extends to infinite distance.
Nikolas Tesla, perhaps the best modern avatar of the “inventor” and of the inventive was a Serbian
American electrical engineer, mechanical engineer, physicist, and futurist best known for his
contributions to the design of the modern alternating current electricity supply system. See some
quotes from its autobiography:
 Instinct is something which transcends knowledge. We have, undoubtedly, certain finer fibers that enable us to perceive
truths when logical deduction, or any other willful effort of the brain, is futile.
 I do not think there is any thrill that can go through the human heart like that felt by the inventor as he sees some creation of
the brain unfolding to success... such emotions make a man forget food, sleep, friends, love, everything.
 It seems that I have always been ahead of my time. I had to wait nineteen years before Niagara was harnessed by my
system, fifteen years before the basic inventions for wireless which I gave to the world in 1893 were applied universally.
Jaime Balmes
We understand more by intuition than by discourse: clear and vivid intuition is the mark of genius
Father Jaime Balmes y Urpiá (Catalan: Jaume Llucià Antoni Balmes i Urpià; 28 August 1810 – 9 July 1848) was
a Spanish Catholic priest known for his political and philosophical writing. To some extent he could be
considered a “Common Sense Philosopher”.
 Reading is like food; the benefit is not in proportion to what is eaten, but to what is digested.
 I became convinced that to doubt everything is to lack the most essential part of human reason, which is common sense.
 Error is terrible when it usurps the name of science.
Balmes distinguishes between the concept of truth and the concept of certainty. Truth is the
expression of the agreement of the ideal order with the thing. Certainty is the mental acceptance
of the truth. There are two kinds of certainty: general human certainty (acquired spontaneously
and instinctively), and philosophical certainty (the fruit of intellectual reflection).
Bodhidharma (Zen)
As long as you look for a Buddha somewhere else,
you'll never see that your own mind is the Buddha.
 If you use your mind to look for a Buddha, you won't see the Buddha.
 The mind is the root from which all things grow. If you can understand the mind, everything else is included.
Zen, quantum mechanics, Yin – Yang, Tao Te Ching, mind, awareness, consciousness, ontologies,
Tai Chi, Kung Fu and more….: Bodhidharma, the Zen creator, was a Buddhist monk who lived
during the 5th or 6th century. He is traditionally credited as the transmitter of Chan (Zen)
Buddhism to China, and regarded as its first Chinese patriarch. According to Chinese legend, he
also began the physical training of the monks of Shaolin Monastery that led to the creation of
Shaolin Kung Fu. Darwin Ontology has something of Zen, which states that the underlying base of
reality is change, process and impermanence, relatively in slow motion, and that the observer is
part of the system. The strange interactions of fundamental particles with the mind of the
observer ('quantum weirdness') have long been of interest to philosophers. There are two
opposing views: (i) Quantum weirdness produces the mind, versus (ii) The mind produces
quantum weirdness. See log about Buddhism, Quantum Physics and Mind.
Genesis – The Tower of Babel
The word is the Verb and the verb is God (Victor Hugo)
A Semantic Enigma: Is it an enigma or a warning light? Why so many and such different languages?
Is the Verb really the creator of Everything? The Tower of Babel (/ˈbæbəl/ or /ˈbeɪbəl/;
Hebrew: מִגְדַּל בָּבֶל, Migdal Bāḇēl) is an etiological myth in the Book of Genesis of the Tanakh (also
referred to as the Hebrew Bible or the Old Testament) meant to explain the origin of different
languages. According to the story, a united humanity of the generations following the Great Flood,
speaking a single language and migrating from the east, came to the land of Shinar (Hebrew:
שנער). There they agreed to build a city and tower; seeing this, God confounded their speech so
that they could no longer understand each other and scattered them around the world.
Apocalypses (Revelation)
We are just an advanced breed of monkeys on a minor planet of a very average star. But we can
understand the Universe. That makes us something very special (Stephen Hawking)
Revelation 19:11-21 And I saw heaven opened, and behold, a white horse, and He who sat on it is
called Faithful and True, and in righteousness He judges and wages war. His eyes are a flame of
fire, and on His head are many diadems; and He has a name written on Him which no one knows
except Himself. He is clothed with a robe dipped in blood, and His name is called The Word of God.
Pope Francis
Oh, how I would like a poor Church, and for the poor.
A leading exponent of the word sacralization and vulgarization at the same time
 Oh, how I would like a poor Church, and for the poor.
 We must restore hope to young people, help the old, be open to the future, and spread love. Be poor among the poor.
We need to include the excluded and preach peace.
 I am always wary of decisions made hastily. I am always wary of the first decision, that is, the first thing that comes to my
mind if I have to make a decision. This is usually the wrong thing. I have to wait and assess, looking deep into myself,
taking the necessary time.
 Sometimes negative news does come out, but it is often exaggerated and manipulated to spread scandal. Journalists
sometimes risk becoming ill from coprophilia and thus fomenting coprophagia: which is a sin that taints all men and
women, that is, the tendency to focus on the negative rather than the positive aspects.
“The internet …,” writes Pope Francis today, “offers immense possibilities for encounter and
solidarity. This is something truly good, a gift from God.” See “Communication at the Service of an
Authentic Culture of Encounter”: Pope's Message for World Communications Day.
Saint Augustine
The world is a book, and those who do not travel read only a page
Men go abroad to wonder at the heights of mountains, at the huge waves of the sea, at the long courses of the rivers, at the vast
compass of the ocean, at the circular motions of the stars, and they pass by themselves without wondering.
Augustine of Hippo (/ɔːˈɡʌstᵻn/ or /ˈɔːɡəstɪn/; Latin: Aurelius Augustinus Hipponensis; 13
November 354 – 28 August 430), also known as Saint Augustine, Saint Austin, or Blessed
Augustine, was an early Christian theologian and philosopher whose writings influenced the
development of Western Christianity and Western philosophy.
Sayings about Quantum Physics, time and Saint Augustine: In the Confessions of St. Augustine,
Book IX, Chapter X (chapter 9, section 10) there is a philosophical analysis of time. Though
Bertrand Russell was an atheist and says that he has a different philosophy of time than Augustine,
in his History of Philosophy, Russell nevertheless says that Augustine's philosophy of time is
deeply profound. Among other conclusions, Augustine states that both the past and the future
exist simultaneously, and yet only the now exists. And that in God there is no time.
Teilhard de Chardin
The universe as we know it is a joint product of the observer and the observed.
Relevant concepts concerning Darwin Ontology: noosphere, prolegomena about The All, and
Omega Point. Pierre Teilhard de Chardin SJ (French: [pjɛʁ tejaʁ də ʃaʁdɛ̃]; 1 May 1881 – 10 April
1955) was a French philosopher and Jesuit priest who trained as a paleontologist and geologist
and took part in the discovery of Peking Man. He conceived the idea of the Omega Point (a
maximum level of complexity and consciousness towards which he believed the universe was
evolving) and developed Vladimir Vernadsky's concept of noosphere.
Darwin Ontology (I)
In any piece of the ALL you may see the ALL
This e-book depicts a long Semantic Web scouting over 11 years from two points of view: from
Digital to Mind along W3C standards, and from Mind to Digital, for many the "Common Sense
Way". The outcome of this journey is Darwin, a semantic ontology to "see the Web as semantically
structured" through a sort of "Semantic Eyeglasses". These virtual eyeglasses, like Galileo
Galilei's telescope, enable us to map the whole Web as_is and to build Semantic Super Search
Engines that work in YGWYN mode with only one query.
Darwin Ontology (II)
Everything connected with everything
The Web is a layer on top of the Internet that for many belongs to the people. In my humble opinion
this was not planned but an accident, the consequence of the appearance of a revolutionary
technology, as happens throughout evolution. Before the arrival of the Internet, communications media
(newspapers, radio and TV) worked unidirectionally, from a de facto “Established Order” side to the
“People’s” side, “broadcasting” programmed pieces of information and knowledge: from sellers to
buyers, from rulers to ruled, from teachers to students, from truth holders to truth seekers. The
People’s side is explored via Darwin, an AI Ontology that enables us to see the Web more and
better, focusing on Social Networks and the Deep Web, for many the hidden Web. As a demo, a
Darwin agent performs a “tomography” of the Established side for the theme art history, from
the Altamira Caves to Nanoart.
Umberto Eco
Umberto Eco quotes by relatably.com
Last but not least! I include Umberto Eco, the brilliant Italian author and semiologist
recently deceased, as a representative of the many hidden influencers of Darwin Ontology.
Read “Come si fa una tesi di laurea”, on how to write a thesis.
o But now I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by
our own mad attempt to interpret it as though it had an underlying truth.
o Captain Cook discovered Australia looking for the Terra Incognita. Christopher Columbus thought he was
finding India but discovered America. History is full of events that happened because of an imaginary tale.
Conclusions
We need a semantic ontology to see the Web more and better: what are the four crucial questions
we must ask ourselves?
 What do we have at hand?
o Data, Information, Knowledge, Wisdom
o Intelligence
 What do we have to unveil?
o The ALL
o The Everything Connectedness
o The Man Machine utopias
 What do we have to consider in the ontology?
o To unveil all type of Avatars;
o To unveil Semantic Logical Trees structures;
o To unveil all type of Directed Graphs;
o To unveil and manage K versus K’ namely: Websites versus Users dialogues;
o To unveil K Thesaurus, namely: formal Knowledge Thesaurus;
o To unveil K’ Thesaurus, namely: Users Knowledge Thesauruses;
o To take into account and continuously check that HK is bounded;
o To take into account and continuously check that HK’ is bounded;
o To take into account and continuously check the semantic weight of WWDs, Well Written Documents;
o To take into account and continuously check the Semantic Resonance of accomplishment unveiled concepts;
o To take into account and continuously check the In mind images uniqueness for all languages;
o To take into account and continuously check the Concepts Uniqueness for all languages;
o To take into account and continuously check the Concepts Specificity accomplishment and uniqueness for all languages;
o To unveil the best sets of Authorities for any subject;
o To check continuously that the ontology is fully accomplished by the Semantic seeds;
o To check continuously that Semantic fingerprints are appropriately computed for any subject;
o To check continuously that Semantic metadata are appropriately computed for any subject;
o To check continuously all type of intrusions warning and estimating their pollution effects;
World Authorities Influence Logic Matrix
Having taken our major Darwin Authorities and Influencers as given, we proceed to depict briefly
the semiotic or semantic aspect, or facet, through which each of them influences our ontology. This
matrix serves as philosophical and scientific support for a complex problem: how to see the
Web more and better.
For instance, one of the Darwin Conjectures states that the Web space is structured as a dual
system of continuously interacting worlds: K Side or Established Knowledge (Website owners,
authors and administrators) versus K’ Side or People Side (we, humans as Internet users). Albert
Einstein, Stephen Hawking, and Bodhidharma are notable influencers here.
Another example concerns what we defined as “Semantic Resonance”: Our Darwin Ontology also
states that for a given pair “language – culture” we may unveil millions of “in mind images”. These
images are recognized by their “names”. It also states that they could be unveiled by the
phenomenon of “Semantic Resonance”: any in mind image could be retrieved via Conventional
Search Engines, by many different “names”, however only one of these names will be the semantic
winner in terms of quantity and quality of references.
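The idea of a single "semantic winner" among several candidate names can be illustrated with a minimal sketch. It is not the Darwin algorithm: the scoring rule (reference count multiplied by a quality score), the names `NameEvidence`, `resonance_score` and `semantic_winner`, and all sample figures are assumptions invented for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NameEvidence:
    """Evidence gathered for one candidate name of an in mind image (hypothetical)."""
    name: str
    reference_count: int   # quantity of references returned by a search engine
    mean_quality: float    # assumed quality score of those references, in [0, 1]


def resonance_score(evidence: NameEvidence) -> float:
    """Assumed scoring rule: quantity weighted by quality."""
    return evidence.reference_count * evidence.mean_quality


def semantic_winner(candidates: List[NameEvidence]) -> NameEvidence:
    """Return the candidate name with the highest resonance score."""
    if not candidates:
        raise ValueError("at least one candidate name is required")
    return max(candidates, key=resonance_score)


# Invented figures, for illustration only.
candidates = [
    NameEvidence("heart attack", 1_200_000, 0.6),
    NameEvidence("myocardial infarction", 900_000, 0.9),
    NameEvidence("cardiac infarction", 150_000, 0.7),
]
winner = semantic_winner(candidates)
print(winner.name)
```

In this sketch the most frequent name does not win automatically: a less frequent name backed by better references can obtain the higher score, which is one possible reading of "quantity and quality of references".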
 Galileo: the meaning of “To see more and better” and tools to perform it;
 Shannon: The actual Web is not yet semantic; intuitive ideas trying to understand
“knowledge” meaning by enriching and extending the meaning of ”information”;
 Turing: limits of Web unveiling by huge and exhaustive procedures – Big Data -;
 Penrose: Human Intelligence is more than an algorithmic, deterministic system;
 Einstein: coexistence of dualities; intelligence is much closer to imagination than to
knowledge;
 Hawking: cyber avatars; coexistence of dualities; externally transmitted information as
important as internally transmitted via DNA and genes to evolve;
 Plato: in mind image, forms and avatars; the essence of concepts as semantically specific
universals;
 Descartes: The Discourse of the Method in its full and ample sense;
 Von Neumann: entropy, negentropy, and Big Data within ontology concerns;
 Tesla: inventive avatars axes; ideas about Darwin Intelligence Reports buildup;
 Balmes: Logic versus Common Sense (concerning trees, arboreal logic, semantic
resonance…);
 Bodhidharma: quantum physics and mind; coexistence of dualities; avatars; awareness;
 Genesis: The Power of Semantics; The Power of Words;
 Apocalypses: The ALL, The EVERYTHING and the END; black holes;
 The Pope: as a typical Darwin avatar archetype;
 Saint Augustine: only the NOW exists; time inexistence utopias;
 Teilhard: The Noosphere;
 Darwin: The Semantic Ocean <=> Semantic Web <=> Noosphere;
 Eco: ideas about how to build trustable IR’s;
Darwin Ontology - Conjectures
By Juan Chamero, from Spain, as of March 30th, 2016
Conjectures Subjects Overview
Darwin Ontology enables humans to “see more and better” the Web through the ontological
and computational guidance of its Conjectures. This vision involves seeing the Web as totally indexed
by meanings, approaching the Semantic Web utopia as closely as possible, and the detection and
retrieval of all types of data, information and knowledge dispersed across it. The Conjectures’ subjects follow:
1 (a) A world of human “in mind images”;
2 (b) A world of “words”;
3 (c) A quantifiable and bounded World;
4 (d) A world of probabilistic nature;
5 (e) A nominative world: all its creatures have a name;
6 (f) A world of semantic vibrations;
7 (g) A world of avatars;
8 (h) Avatars library;
9 (i) Retrieval of hidden intelligence;
10 (j) Knowledge DIKW;
11 (k) A world of Arboreal Structures;
12 (l) Avatars unveiling and IR’s, IdeI’s;
13 (m) K versus K’ worlds;
14 (n) K versus K’ Semantic interchange;
15 (o) Two types of Semantic Particles;
16 (p) e-membranes and unveiling without perturbing;
17 (q) Disciplines of Knowledge, WWD’s and WFF’s;
18 (r) Subjects and Concepts;
19 (s) Knowledge Authorities;
20 (t) Human Knowledge, Thesauruses and Derived Concepts;
21 (u) Thesauruses for K and K’;
22 (v) Semantic Web;
Avatars Conjectures: The Internet, and particularly the Web, enables humans to keep a huge, open and extremely detailed virtual logbook of
our lives, of our occurrences and activities, and even of our thoughts and “in mind” processes over time. The entries of this log
can be likened to “avatars” in their different senses, from graphic representations of all types of things and entities,
including personalities and investitures, to incarnations of deities or facets of them and of ideal creatures. Avatars can also be imagined
as “meaningful in mind images” that need to be “explained” to be understood. Conjectures in emerald deal with avatars.
Semantic Web Conjectures: We may also see the Web as a huge multimedia reservoir structured as a dual and continuous interacting
world: one we name as “K side” assigned to formal creatures registering the “Established Knowledge” at a given moment and the other
we name as K’ side assigned to the people as users and at the same time proprietors of the “Knowledge in Formation”. Conjectures in
blue deal with Semantic Web.
Hinge Conjectures: In order to operate such a huge and complex semantic system we need a few “hinge” Conjectures (in grey)
connecting the old Darwin Conjectures (in blue) to the 10 newest Conjectures (in emerald), which enable us to “see” the Web as an
Ocean of Avatars, a more advanced vision than the Semantic Web.
The First 12 Conjectures Synopsis
The Web can be seen as a world of human in mind images; humans agree on their meaning
through the specific and appropriate use of words; this universe can be quantified, however only
probabilistically; it can also be imagined as a huge Cyber Ocean where these nominative
creatures, bearing personal names, live, and where they behave like wavelets, enabling their
detection and recognition by semantic resonance via search engines; it is in fact a world of
avatars, virtual creatures that “literally” register the avatars of our past, present and probable
future lives; these avatars are documented and hosted as in conventional libraries, but as pieces
of information and knowledge dispersed by the billions here and there; the patterns of this
dispersion suggest the existence of a hidden intelligence that can be unveiled. Up to here we have
presented the set of 10 Conjectures necessary to operate with avatars. However, conventional
informatics works de facto under a sort of Cyber pre-agreement: the DIKW Pyramid. From here we
may state that formal knowledge tends to structure itself as a wood of “Semantic Trees” and that
knowledge in formation tends to structure itself as more primitive and disordered arboreal forms.
Knowledge in formation, informal forms of knowledge and complex forms of information are the
basic components of Intelligence Reports managed as avatars.
Darwin Ontology Conjectures
a) A world of human “in mind images”: Human beings transmit their cognitive legacy thru
“in mind images” as “concepts” only “seen” thru our minds;
b) A world of “words”: In mind images identify specific pairs “language- culture” for instance
“American - English” and “International – Spanish” meaning that they can be explained
and understood thru the pair language – culture of their belonging;
c) World sizes: The total number of these in mind images, as Web space creatures, is
estimated at present from 12 to 20 million per pair language – culture;
d) A world of probabilistic nature: For each pair language – culture in mind images are
“unique”, however with a “unicity” spectrum of probabilistic nature, that is to say that all
of them, as well as their corresponding explanations, may differ slightly from person to person
and even from situation to situation and from moment to moment;
e) The names of its creatures: For each pair language – culture in mind images are identified
by their “unique” names expressed as precise chains of words namely: “running”,
“meditate”, “son of a single mother”, “EU young people unemployment rate”, “Pope
Francis”, “Barack Obama”, etc.;
f) A world of semantic vibrations: Names are unique in probabilistic terms, “probabilistically
talking” for a determined place and time, for example the Web as_is at a given moment
associated to a sort of “Semantic Resonance”;
g) A world of avatars: These in mind images, which we name “avatars”, can be mentally
seen, perceived and/or represented through text, visual images, sounds, and multimedia of
any type; they are Cyberspace “virtual creatures” defining our civilization at ANYTIME and
ANYWHERE;
h) Avatars library: Over time our civilization has agreed on these avatars and has also
recorded those agreements in books, essays, comments and, very recently, in semantic
documents such as Web pages;
i) Retrieval of hidden intelligence: Avatars appear structured along patterns similar to
our way of thinking, for example as more or less important and more or less complex,
yet always interrelated hierarchically and by affinity as
pertaining to a unique “The All”;
j) Knowledge DIKW: Hypothetical Conjecture: DIKW Pyramid. This paradox of the parallel lives
of we humans and of our avatars appears embedded within a common sense model,
agreed upon and evolving over time, represented by the hierarchical pyramid Data =>
Information => Knowledge => Wisdom, refined and structured through a growing intelligence;
k) Arboreal structures: Knowledge, defined as the hierarchical triad Facts (Data) =>
Information => Skills and Talents, and serving as our evolutionary guide as well, would structure
itself in arboreal forms, ideally as a wood of “Semantic Trees” of unique roots and
thematic ancestry;
l) Avatars unveiling and IR’s, IdeI’s: Avatars are “seen” by our minds in a diversity of
forms and ways proportional to their complexity and to the cultural differences of the
observers, whether humans, groups, and/or collectives (see Conjecture d)). This feature enables us
to unveil objective and untainted IR’s, “Intelligence Reports”, semantically depicting
as many facets as exist in the avatar at a given moment, for instance
Vision 1 of the “ism - 1”, Vision 2 of the “ism - 2”, …, Vision n of the “ism - n”;
m) K versus K’ worlds: General man-machine interaction could be imagined as a continuous
dialog and dynamic equilibrium between two sides: the Established Knowledge - Realm K
versus the People’s Knowledge – Realm K’;
n) K versus K’ Semantic interchange: Through the subtle interface between K and K’
only two kinds of semantic particles flow in and out of each side:
“Established Concepts” (from K to K’) and “People’s Concepts” (from K’ to K). These
particles are “separated” by communications/instances, operators from K and K’
respectively, which are necessary to make the dialog meaningful;
o) Two types of Semantic Particles: Documents and messages, the elementary objects of
Realms K and K’ are only constituted by two kinds of semantic particles: “Common Words
and Expressions” and “Concepts”;
p) e-membranes and unveiling without perturbing: This digital dialog may also be imagined
as performed through “e-membranes”, resembling bio membranes with endoderm,
mesoderm and ectoderm, where the inflow and outflow traffic of semantic particles and
instances can be “seen” without perturbing the actors of the K and K’ Realms. Darwin took its
name from this Conjecture: Distributed Agents to Retrieve the Web INtelligence as a Darwin
network of e-membranes;
q) Disciplines of Knowledge, WWD’s and WFF’s: Documents on the K side tend to separate into
“disciplines” of the Established Human Knowledge. For each discipline there exists a
minority of documents that fit their “trees” “as much as possible” while being at the same
time “Well Written” in both literary and conceptual terms, and a majority of documents that do not. The
first ones are considered “authorities”. WWD’s, Well Written Documents, resemble WFF’s,
Well Formed Formulae of Formal Logic;
r) Subjects and Concepts: Subjects are those specific concepts associated with the nodes of
their respective discipline trees, identified by the “semantic paths” that lead to them from their
roots. Concepts “should be” the same for all pairs language - culture. For each node there
exists one and only one subject. Once the subjects are known, new and somehow derived
concepts appear for each of them; since these “belong” to the subject with strong specificity, they can be
defined as its “Associated Concepts”, namely those that “at large” define and make precise
its theme;
s) Knowledge Authorities: For each subject there exist at a given moment with a high level
of probability, within universal and huge reservoirs like the Web, a “Set of Authorities”
dealing with it with a well defined authoritativeness;
t) Human Knowledge, Thesauruses and Derived Concepts: From “Sets of Authorities” we
may develop a sort of industrial process to extract their “Associated Concepts” sets
establishing then the following correspondence: for each subject we may find its
representative authorities set and from it we may build its Associated Concepts set. All
discipline trees of the “Human Knowledge” that within their nodes have their respective
authorities’ sets and their respective Associated Concepts sets constitute the “Web
Thesaurus”;
u) Thesauruses for K and K’: A similar Thesaurus could be defined and unveiled in the K’
Realm as the “People’s Thesaurus”. Similarly to Subjects K, Authorities K, Associated
Concepts K could be defined Subjects K’, Authorities K’ and Associated Concepts K’;
v) Semantic Web: Once the K and K’ sides are known as_they_are, unveiled from retrievable
Web documents and messages, actors on each side are enabled to know as much as
possible of the other side. This event will accelerate the human learning process. K and K’
can then be considered fully mapped, and this mapping may be continuously updated and perfected
over time.
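Conjectures r) to t) describe a data structure: discipline trees whose nodes each carry one subject, a set of authorities and a set of Associated Concepts, with every subject identified by the semantic path that leads to it from the root. The following is a minimal sketch of that structure, assuming a simple in-memory tree. The names `SubjectNode`, `semantic_paths` and `build_thesaurus`, the path separator and the sample entries (including the example.org address) are hypothetical and are not taken from the Darwin implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class SubjectNode:
    """One node of a discipline tree: exactly one subject per node (Conjecture r)."""
    subject: str
    authorities: List[str] = field(default_factory=list)          # Conjecture s
    associated_concepts: List[str] = field(default_factory=list)  # Conjecture t
    children: List["SubjectNode"] = field(default_factory=list)


def semantic_paths(
    node: SubjectNode, prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], SubjectNode]]:
    """Yield (path from the root, node) for every node, depth first."""
    path = prefix + (node.subject,)
    yield path, node
    for child in node.children:
        yield from semantic_paths(child, path)


def build_thesaurus(roots: List[SubjectNode]) -> Dict[str, SubjectNode]:
    """Index every node of every discipline tree by its semantic path."""
    return {
        " > ".join(path): node
        for root in roots
        for path, node in semantic_paths(root)
    }


# Invented sample data, for illustration only.
cave_painting = SubjectNode(
    subject="cave painting",
    authorities=["https://example.org/authority-1"],
    associated_concepts=["Altamira", "pigment"],
)
art_history = SubjectNode(subject="art history", children=[cave_painting])
art = SubjectNode(subject="art", children=[art_history])

thesaurus = build_thesaurus([art])
for path in thesaurus:
    print(path)
```

Under the same assumptions, a People's Thesaurus for the K’ Realm (Conjecture u) could reuse the same node type with its own roots, authorities and Associated Concepts.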
DARWIN in a nutshell
Darwin Methodology Briefing
By Juan Chamero, Principal Architect, from Barcelona as of 2015-01-13
Upon INTAG proprietary document: darwin_brief_PDF.rar
Darwin Brief (darwin_brief.pdf): a brief Darwin index to accede to:
 WHAT is Darwin (darwin_methodology.pdf): it is a methodology to “see more and better”
the Web and comparable data reservoirs thru 11 applications;
 HOW to “see” the Web from Darwin (Darwin_Web_EN.pdf): for experts;
 BIBLIOGRAPHY (Darwin_bibliography.pdf): about Darwin and its creators;
Conclusions: Are we at the dawn of a new way of thinking?
And in turn Darwin Methodology (darwin_methodology.pdf) opens into:
 A CAROUSEL (darwin_carrousel.pdf): imagery about a possible way to explore the meaning of
Darwin as a manifestation of a new way of thinking and a new vision of the existing world
under the new technologies;
 A KNOWHOW sample (darwin_basicbuildup.pdf): something about HOW Darwin “sees” as
meaningfully connected what appears in the Web dispersed here and there and
unstructured;
 Even though we are not conscious of it, we humans are de facto and all of a sudden
submerged in BIG DATA (darwin_BigData.pdf) scenarios that leave us less and less time
to think: we have passed from our ancestral “many” of hundreds and thousands of events and
instances to trillions and more, while our meditation times shrink from months, days and
hours to fractions of a second. Darwin moves comfortably in these scenarios.
 As a complement, a brief explanation of the 11 Darwin Applications and of their respective
DARWIN ICONS (darwin_icons.pdf). Icons and avatars are commonplace in our cyber
culture, which has incorporated, besides text and image, the audible, the visible and shortly
the tactile.
Darwin Brief
Juan Chamero, from Buenos Aires, Argentina as of January 1st 2015
Index
 Darwin Methodology (2 pages)
o Darwin Carousel (5 pages)
o Darwin Maps Buildup (3 pages)
o Darwin Big Data (3 pages)
o Darwin Icons Meanings (5)
 The Web as seen by Darwin Methodology (3 pages)
 Darwin Bibliography (1 page)
Recommended reading order: if you are well acquainted with Semantics you may start by reading
The Web as seen by Darwin Methodology. If you are not, the suggested order is to read Darwin
Methodology first, an introductory document that explains how to see the Web as semantic. This
document is complemented by three appendices: Darwin Carousel, a document that tries to
explain that perhaps we are at the dawn of a new way of thinking; Darwin Maps Buildup, a
document that explains the basic knowhow to unveil unstructured information and knowledge
from the Web; and Darwin Big Data, a document explaining how Big Data has challenged us
and is perhaps changing our way of thinking computationally.
We also recommend reading the Darwin Icons Meanings document, which briefly describes each of
the current 11 applications, and finally Darwin Bibliography, an index of links.
Darwin Methodology
Distributed Agents to Retrieve the Web Intelligence
By Juan Chamero, Principal Architect of Darwin Methodology, as of January 6th, 2015
We present Darwin, a methodology to “see more and better” huge reservoirs of data like the Web as if their
contents were semantically structured. It implies the detection, retrieval, ordering and synthesis of all pieces of
information and knowledge about any subject dispersed here and there within the reservoirs. The synthesis
may take any of the following forms (see Darwin Icons Meanings):
o Thesauruses;
o Maps of Knowledge;
o Non intrusive e-membranes to communicate among different environments;
o OOC, Only One Click Semantic Search Engines;
o Encyclopedias;
o Non intrusive massive Surveys and Polls about any subject;
o Intelligence Reports about any subject;
o Avatars, AI creatures that represent and/or emulate primary powers and trends;
o Intelligent Web Portals;
o Big Data synthesis;
o Autonomous Artificial Communities;
Are we around the end of Conventional Thinking?
Behind Web Semantic: a global cultural discontinuity?
Apology from Legal Dictionary
This is an apology and a sample of recent digital history: at the end of our long journey, which lasted 15 years
trying to unveil the Web, we failed to document a sufficiently comprehensive synthesis of the work performed
and its findings. Why? Although a Zen master, albeit one specialized in Artificial Intelligence, I continued with my
Western habit of writing books and essays via “papers”, white and classified ones, trying to explain to others the
whole history as a “linear” logical sequence, from prologues to epilogues through abstracts, antecedents, of
course the textual cores, appendices and bibliographies. The investigated theme, Web Semantics, was
perhaps too big and vague and, seen from a distance, seems to be highly disruptive as well.
The Web explodes, registering almost everything that happens in our world openly and freely. Darwin enables
us to build tools and methodologies to see the Web more and better, in extreme detail, resembling a
super ultra semantic telescope. However, we have to pay a toll to use those tools and methodologies: we
enter de facto into Big Data scenarios, embedded “all of a sudden” in a world that forces us to behave in “real
time” without enough time to “think”. Now the Zen non-linear way of thinking comes to my aid.
Zen exploration? Are we heading toward a “To be aware of everything” way of thinking? One of the trivial things
we “discovered” by exploring the Web as our main source of information and knowledge is that ALL is
related to ALL and EVERYBODY to EVERYBODY, those everybody being physical and juridical creatures and
avatars, where everything that occurs to them makes sense. It is like taking a census by interviewing people
as persons, asking whether they consider something bad or good, liked or disliked. From the
beginning, by dialoguing, questioning and why not answering, we may easily detect factors that were
initially ignored and that, in order to be honest and precise, should be taken into account. We continue
thinking about how to face this new challenge, where nothing is really environment but an interactive part of a unique
and always changing ALL, while in the interim reporting our state of awareness along our Web explorations, and
reporting not only the rare things we may find but also our agents’ findings under our guidance.
The bonus of being aware, an example: Darwin processes raw cognitive units named “textons”,
huge vectors of 10,000 documents or more per theme; for a cognitive universe of 1,000,000
themes we should, in theory, process 10,000,000,000 documents! This is an a priori discouraging scenario,
isn’t it? However, the first exploration of textons presented to the “eyes” of our agents signs of rareness that enable
us to find shortcuts of two and three zeros (orders of magnitude). To understand Darwin better we invite you to follow its
logbook.
Darwin Logbook
Darwin Carousel – A Technology Paradox
Why do we do what we did?
Carousel from Google images
Basic building knowhow
Web Thesauruses and trustable Intelligence Reports buildup
A Big Data Challenge
How to retrieve all concepts of a given language culture
Ways to make sense of Big Data, from Phys.org
Darwin HKM Carousel
A technology paradox
Juan Chamero, as of January 5th, 2015
Encyclopedias: Paradoxically, within the Digital Revolution, within the Information Society and, very recently,
within the Social Networks, conventional Encyclopedias are dying! One of the last, the Encyclopædia
Britannica, announced that its 2010 printed edition would be its last; it dealt with approximately 80,000 subjects,
by the way about 10% of the available knowledge disperse in the Web. The last systemic index of Human
Knowledge before these “conventional” ones was the Diderot Encyclopedia (1751 – 1772), edited in French.
The Diderot Encyclopedia
Wow! Knowledge is within our minds!
And not only knowledge but all types of information and even wisdom! Since they lie deep in our minds, we only
become acquainted with these substances indirectly, through registers, documents, works and gestures! So if the
Web currently hosts 40,000,000,000 documents we may say that we “have at hand” 40,000,000,000 documented
expressions of those ideas! For the first time we humans have “at hand” a global and meaningful sample of all
types of ideas, from strokes of genius to stupidities, online and almost in real time besides!
The power of gestures: if you query Google Images by “demand for explanation” it will render you hundreds
of versions of this image. Do we need some extra explanation to understand this universal baby gesture?
And we also have at hand billions of images about the Web_as_is “everything”!
Demand for Explanation
How many types of “in mind” ideas do we have? At least three types: pieces of information, pieces of
knowledge, and needs. Pieces of information are a continuous need to guide our lives; pieces of knowledge
are a crucial need to evolve positively along our lives; needs in general to live and for a living. We acquire
knowledge by studying, information by questioning; general needs are acts that become “experience”.
From dreamstime.com
In order to study [1] you need “libraries”, “Thesauruses” and “Encyclopedias”; in order to question
efficiently [2] you need “semantic search engines”; and in order to optimize your experience [3] you need
to associate with similar people. Darwin enables the Web to be used as a study home, turns conventional
search engines into semantic ones, and facilitates the open and free organization of people with similar areas of
interest.
A little about some necessary “boring” things:
First HKM, Human Knowledge Map, about ICT (2002): we created the first Web Thesaurus about
Information, Computing and Telecommunications. We started by joining the latest ACM (Association for
Computing Machinery, 2001) semantic index with the IFIP UNESCO Informatics standardization for the RW,
Rest of the World: an ICT Thesaurus of 2,300 subjects and 54,000 concepts. We see below its upper 5
hierarchical levels.
Are arboreal structures natural forms of our thinking? It seems that they are. And going a little further,
what about our abstract thinking? It seems that we are used to them there as well. In the figure below, at left, asking
Google for “fractal tree” we get a sample image of primitive abstract trees. At right, asking Google
Images for phylogenetic trees and then for Life Trees we get a full sample of arboreal trees (real bio trees).
Abstract trees Bio trees
Google Homage to Ramón y Cajal: Nobel Prize in Medicine (1906), neuroscientist, perhaps the father of
modern neuroscience, he was considered a disastrous student with an extremely revolutionary, antiauthoritarian
attitude and, even by his father, a “little short” of brain. One of his metaphors was: cortical pyramidal cells
may become more elaborate with time, as a tree grows and extends its branches. I believe that this
metaphor suggests that we may go far walking with small steps but always expanding our mind
awareness by exploring the unknown. Ramón y Cajal was, like the Great Leonardo, a brilliant draftsman: he
began drawing more and more complex neural networks and learning from them at exponential pace.
The beginning should be neat and easy for all: semantic seeds. As explained above, Ramón y Cajal started
his adventure “to see more and better” the brain by trying to unveil and draft on paper a single neuron. We, as
Darwin Project, experimented with different types of seeds and growing strategies. Our option for
The Art map building was the last schema, at bottom right.
Of course we did not imagine the seed as having 7,570 nodes like an “adult tree” but as a tiny skeleton of a
root initially opening into up to seven clusters, as seen in the image below, namely: performing arts, visual arts,
culinary arts, literature, arts history, physical arts, and arts infrastructure.
The Art Thesaurus: the figure below depicts one of the visions of The Art Thesaurus resembling a “map” to
facilitate the human comprehension and exploring the existing knowledge as is in the Web at a given (any)
moment. It could also be seen as the Upper Levels of an Art Tree index.
Another technological paradox: we have seen the Ramón y Cajal paradox, revolutionary and/or disruptive
innovations as a sort of unsought premium bonus of long scientific exploration efforts. Another one is the
Global Warming phenomenon “unveiled” by Wallace Smith Broecker, geophysicist and climate authority, as
a “byproduct” of his decades of research on the CO2 concentration in our atmosphere. From
this discovery hundreds of derived and interrelated research efforts have proliferated, for instance the creation of
“Artificial Trees” (see below in thebreakthrough.org).
Artificial Trees, from thebreakthrough.org
Darwin Methodology
HKM, Human Knowledge Map basic building knowhow
Juan Chamero, as of January 5th, 2015
Darwin HKM buildup could be logically imagined along a sequence of four mega steps, namely: a) MS’s data
discovery; b) HKM Logical Skeleton buildup; c) HKM unstructured data sample; d) HKM structured.
Step 1: By MS’s, Major Subjects data discovery we mean scouting the Web to retrieve all “modal names” of
the knowledge branch under study: only their names, not their conceptual meaning nor their relative
ordering within the Logical Tree of the knowledge branch. In our example we mean discovering the 7,570
names of The Art MS’s, Major Subjects.
Step 2: By HKM Logical Skeleton buildup we mean unveiling from the Web the semantic ordering of modal
names, which become the “node names” of the Knowledge Tree, finding the unique correspondence between a tree
node and its modal name: 7,570 unique nodes ↔ 7,570 unique modal names for our example.
Step 3: By HKM unstructured data sample we mean a huge conceptual but still unstructured data sample
of the branch of knowledge under study. This Big Data scenario is not easy to imagine: For each node we
need a meaningful sample (for instance 10,000 Web Pages) that enables us to discriminate how structured,
noisy, disperse, misleading and diluted from its semantic point of view a given MS is.
Step 4: By HKM structured we mean a synthesis of the above mentioned data sample: for each node we
should synthesize its “semantic fingerprint”, a set or sets of specific concepts that statistically and ideally
are used in the Web_as_is at a given moment and for a given pair language culture to describe the subject
inspected.
Step 3 Performing: Let’s suppose that somehow we have already successfully performed steps 1 and 2
and are starting step 3. Using the names list obtained in Step 1 we proceed to build up textons, one for each name:
in our example, 7,570 textons of about 10,000 Web Pages each. Once the textons are stripped of
code, leaving only meaningful text and images, we inspect their content document by document,
retrieving their potential concepts, on average 50 per document, totaling about 500,000 suspected
potential concepts per subject!
Note 01: take into account that a whole HKM has about 800,000 MS’s, Major Subjects.
Step 4 performing: we proceed now to synthesize that huge mass of 500,000 potential concepts per subject
by finding the “modal” ones, semantically the best, 50 on average.
Note 02: The sample expanding to around 500,000 suspected potential concepts was designed in order to
study how structured, noisy, disperse, misleading and diluted from its semantic point of view a given MS is.
We now have our first version of a HKM, in our example for The Art: 7,570 nodes, going from “root” to
“leaves” along 13 levels and having on average 50 specific concepts per major subject/node, that is,
about 378,500 concepts, plus the 7,570 skeleton subjects that are in fact the leading concepts of the
discipline under study.
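The sizes quoted for steps 3 and 4 can be checked with a few lines of arithmetic. The sketch below is only a worked calculation using the figures given in the text (7,570 nodes, 10,000 Web Pages per texton, 50 candidates per page, 50 modal concepts per node); the function and constant names are ours, chosen for illustration.

```python
# Figures quoted in the text for The Art thesaurus.
NODES = 7_570                 # Major Subjects (tree nodes)
PAGES_PER_TEXTON = 10_000     # Web pages sampled per Major Subject
CANDIDATES_PER_PAGE = 50      # suspected potential concepts per page
MODAL_CONCEPTS_PER_NODE = 50  # concepts kept per node after synthesis


def thesaurus_size(nodes, pages, candidates_per_page, modal_per_node):
    """Return (candidates per subject, total candidates, synthesized concepts)."""
    candidates_per_subject = pages * candidates_per_page
    total_candidates = nodes * candidates_per_subject
    synthesized = nodes * modal_per_node
    return candidates_per_subject, total_candidates, synthesized


per_subject, total, synthesized = thesaurus_size(
    NODES, PAGES_PER_TEXTON, CANDIDATES_PER_PAGE, MODAL_CONCEPTS_PER_NODE
)
print(per_subject)   # 500000 suspected concepts per subject (step 3)
print(total)         # 3785000000 suspected concepts for the whole branch
print(synthesized)   # 378500 specific concepts after synthesis (step 4)
```

Step 4 therefore reduces the raw sample by a factor of 10,000: from 500,000 candidates per subject to about 50.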
Steps 1 and 2 performing: these steps could be seen as a coupled semantic convolution of names and
hierarchies, a sort of e-learning process of rapid convergence that started with a “semantic seed” for each
branch of the HK. In an extreme we may also explore the Web without the aid of seeds starting from zero
knowledge. Along our 15 years of work, four prototypes and dozens of semantic seeds buildup for third
parties, we have thoroughly checked that any Web expert may identify the “authoritative” core for any
branch of the HK in no more than one day effort.
Specifically, The Art authoritative core of about 200 authorities was identified by a human. Each of them
semantically covered more than 50% of the suspected MS’s, and the top 2% of them covered more than 95% of the
suspected MS’s. Backed by this starting knowledge we initiated a sort of “anthropic algorithm”: a
human expert in multi-agent programming, learning as much as possible by himself and guiding and/or
adjusting the agents’ work, in a sort of man-machine cooperative that gathers and selects pieces of information and
knowledge, jumping from link to link within the base cluster and from it to semantically neighboring clusters,
expanding the initial base.
How to check that something goes wrong along the exploration: This anthropic process evolves very fast,
suggesting that logic probably attracts logic, tending to empower it and to weaken the illogical. We also
have a mechanism for easily checking what is poorly structured from a semantic point of view: the ontologies.
Effectively, any ontology enables us to check something that already exists and to answer questions such as:
was it created under the ontology conjectures or not? Nevertheless, ontologies give us no help with the
“art” of creating things. For instance, an ontology tells us nothing about how to unveil, or in an extreme to invent, a
logical skeleton of art; on the contrary, it only warns us if something goes wrong.
HKM buildup Schema
The figure above illustrates a core part of the intertwined process of 4 steps. At left we depict, from 1 to 4,
the four big steps as embedded loops. In the middle we depict a KT, Knowledge Tree, of 15 nodes from root
to leaves. A HKM, Human Knowledge Map, resembles a logical forest of 200 trees, one per branch of
knowledge, each tree having on average 4,000 nodes (Major Subjects), totaling a sort of World
Encyclopedia of 800,000 Major Subjects for a given pair language culture. At right we show a huge matrix
used to check whether or not semantic hierarchies ideally correspond to logical trees: nodes 1 and 2 derive
from unique ancestor 0 (root), nodes 3, 4 and 5 derive from unique ancestor 1, and so on and so forth.
Now you have to imagine that “within” any node is hosted metadata and sets of concepts and images
specifically related to its subject. So the mass of concepts of a given branch of knowledge, usually more than
95%, is hosted in the nodes! The 5% left corresponds to the names of the MS’s (main subjects, main themes
and topics of the discipline under study).
At right we depict the MS’s versus themselves, nodes versus nodes. If these MS’s are structured like a Logical Tree
the matrix is practically empty: their unique ancestry enforces the “x” marks as shown. However, in the
real Web all types of abnormalities proliferate, like the ones in yellow. Detecting them is one of the hardest Darwin
tasks: HAZ, Hierarchy Abnormalities Zoning.
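The unique ancestry check described for the nodes versus nodes matrix can be sketched as follows. This is a minimal illustration, not Darwin's actual HAZ procedure: we assume a square 0/1 matrix in which entry [child][parent] is 1 when the Web suggests that "parent" is a direct ancestor of "child", and the function names are hypothetical. The clean example reuses the ancestry given in the text (nodes 1 and 2 under root 0; nodes 3, 4 and 5 under node 1).

```python
def build_matrix(n, edges):
    """Square 0/1 matrix; matrix[child][parent] == 1 marks a direct ancestor."""
    matrix = [[0] * n for _ in range(n)]
    for child, parent in edges:
        matrix[child][parent] = 1
    return matrix


def abnormal_nodes(matrix, root=0):
    """Map each node violating unique ancestry to its number of ancestors.

    In a logical tree the root has no ancestor and every other node has
    exactly one. Cycles are not checked in this simplified sketch.
    """
    flagged = {}
    for node, row in enumerate(matrix):
        n_parents = sum(row)
        expected = 0 if node == root else 1
        if n_parents != expected:
            flagged[node] = n_parents
    return flagged


clean_edges = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
clean = build_matrix(6, clean_edges)
# Hypothetical abnormality: some Web sources also place node 5 under node 2.
noisy = build_matrix(6, clean_edges + [(5, 2)])

print(abnormal_nodes(clean))  # {}
print(abnormal_nodes(noisy))  # {5: 2}
```

A row with more than one mark plays the role of the yellow cells in the figure: a subject that the Web places under several ancestors at once.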
Some Darwin Challenges
A Big Data Challenge as well
Juan Chamero, as of January 5th, 2015
Challenge I: Darwin Concepts Unveiling from Textons
Darwin retrieves suspected potential concepts out of textons, documents however semantically “noisy”.
Talking of a 10,000 Web Pages textons sample we mean extracting their 10,000 corresponding semantic
profiles (Darwin “fingerprints”).
Texton: a large enough string of meaningful documents supposedly dealing with the same MS, Major
Subject, for instance “modern theatre” within “The Art” branch of knowledge. The location of a document within
the string, for instance from its beginning to its end and from left to right, is directly related to its semantic
significance. Textons usually have from 1,000 to 10,000 documents (Web Pages ↔ URL’s).
Note 01: this is a strong supposition that must be checked along the Darwin concepts unveiling process. The “raw data” is
provided by conventional search engines like Google that “rank” URL’s as per their own criteria. This primary ordering,
which Darwin takes into account, is continuously checked and enriched, because Darwin performs an exhaustive analysis
of how all textons deal with subjects and how they rank conceptually among themselves! In plain words, Darwin detects that
some sources (URL’s) may provide more and better information than their Google rank would suggest, that is
to say, they behave as specialized ones!
Texton corpuses must be stripped of “no content” information, such as all types of code, before
processing. Textons are the raw data of Knowledge Maps: at large, a semantic sample of them, however not yet
structured, hierarchically “flat”. Darwin, as a process following its own Ontology Conjectures, unveils in
turn the potential concepts hosted here and there within textons as a function of their intrinsic statistical
“rareness”.
Textons unveiling
Texton [pair LC; SMS; n; W]: [pair Language Culture, Suspected Main Subject, amount of documents, amount
of words], for instance
Tx231 [EN USA; “modern theatre”; 12,324; 15,355,409];
Read as: Texton 231 for the pair EN USA (English – American), dealing with “modern theatre”, having a string
of 12,324 Web Page corpuses and 15,355,409 words.
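The texton descriptor above maps naturally onto a small record type. The sketch below is one possible representation, not the one used by Darwin: the class and field names are our own, and only the four descriptor fields and the Tx231 values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Texton:
    """Descriptor [pair LC; SMS; n; W] of a texton (field names are illustrative)."""
    ident: int                    # e.g. 231 for Tx231
    language_culture: str         # pair LC, e.g. "EN USA"
    suspected_main_subject: str   # SMS
    n_documents: int              # n, amount of documents (Web pages)
    n_words: int                  # W, amount of words

    def words_per_document(self) -> float:
        """Average length of a document in the texton."""
        return self.n_words / self.n_documents


tx231 = Texton(231, "EN USA", "modern theatre", 12_324, 15_355_409)
print(tx231.suspected_main_subject)        # modern theatre
print(round(tx231.words_per_document()))   # 1246
```

For Tx231 this gives an average of roughly 1,246 words per Web page.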
The key is to detect potential concepts or “cepts” as a function of their rareness along the following steps:
1. Jargon confirmation and coherence tests performed on it, f.i.: EN - USA Art Jargon of ~4,000 terms;
2. SJD, Semantic Jargon Distribution: Statistical Jargon words’ presence within textons;
3. SJD rareness;
4. First potential single word concepts and/or “cepts” (c’s) List;
5. n-ads potential c’s Frequencies Database creation;
6. 1-ad presence and from 2-ad to 6-ad potential c’s presence distribution within texton: f.i.: the four
words {the backwoodsman of Kentucky} in allusion to Abraham Lincoln generates from 1-ad to a
4-ad: [the; the backwoodsman; the backwoodsman of; the backwoodsman of Kentucky];
7. HUMAN defines and adjusts rareness thresholds;
8. Lists of potential c’s with their justification parameters;
9. Semantic checking of potential c’s “names”: searching their “modal names”;
10. Semantic checking “ex-post” of MS versus the checked potential c’s: do these c’s represent
semantically the initially supposed MS?;
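Steps 5 and 6 above can be illustrated with a short sketch. It generates the n-ads that begin at a given word, as in the “backwoodsman of Kentucky” example, and accumulates their frequencies over a set of documents. Sliding the starting position over every word of a document, lowercasing, and splitting on whitespace are our simplifying assumptions; the function names are hypothetical.

```python
from collections import Counter


def n_ads_from(words, start, max_n=6):
    """Return the 1-ad up to the max_n-ad that begin at position `start`."""
    end = min(len(words), start + max_n)
    return [" ".join(words[start:stop]) for stop in range(start + 1, end + 1)]


def n_ad_frequencies(documents, max_n=6):
    """Count every n-ad (n <= max_n) over all documents of a texton."""
    counts = Counter()
    for text in documents:
        words = text.lower().split()  # naive tokenization, an assumption
        for start in range(len(words)):
            counts.update(n_ads_from(words, start, max_n))
    return counts


phrase = "the backwoodsman of Kentucky".split()
print(n_ads_from(phrase, 0))
# ['the', 'the backwoodsman', 'the backwoodsman of', 'the backwoodsman of Kentucky']

docs = ["the backwoodsman of kentucky spoke", "the backwoodsman of kentucky wrote"]
freq = n_ad_frequencies(docs)
print(freq["the backwoodsman of kentucky"])  # 2
print(freq["spoke"])                         # 1
```

The resulting counter plays the role of the “n-ads potential c’s Frequencies Database” of step 5; the rareness thresholds of step 7 would then be applied to it.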
Once this process step is finished for a given Major Branch of Human Knowledge, for instance “The Art”,
we may say that we have unveiled it completely but still unstructured, flat, as a huge logical tree of only one
level! For The Art thesaurus this means 7,570 nodes, each one corresponding to a single Major Subject of
the Art, discriminated by about 400,000 c’s. Once structured, in the last global Darwin step, these 400,000 c’s
show as structured in 13 levels!
Challenge II: Semantic Synthesis via Textons Processing
Given a cluster of documents supposedly dealing with the same subject, unveil from it the best fit to its
specific set of concepts (ideally its “subject semantic fingerprint”). This rests on one of the strongest Darwin
conjectures, which globally states: humans tend to register their ideas statistically following secular rules
(see Darwin Ontology), generating de facto “WWD, Well Written Documents”. So Web pages dealing
with the same subject spin around these ideals like semantic vortexes, the innermost being the best
documented and the outermost the worst.
We humans are specially suited to unveil those specific concepts (see Darwin history) as a function of how
good a document is concerning the ideal: from our experiences with hundreds of advanced students of
Informatics and Systems Engineering a human in a couple of hours could be trained to detect specific
concepts within documents chosen at random about any subject. This “methodological talent” could also be
easily transferred to an agent (see How Darwin unveils potential specific concepts out of a document).
For each Th, Semantic Threshold Level of “rareness” (Th03 in the figure above), Darwin algorithms unveil a
specific set of 46 potential/suspected concepts supposedly pointing to the MS, Major Subject, of the texton
analyzed. If the texton has 10,000 text corpuses, one for each of its 10,000 Web pages, we
would unveil for instance 500,000 potential concept names, for an average of 50 per page. This is in fact a Big Data
scenario where, for each MS and for each Threshold, we should define the “best fit” to the “specific
concepts set” used statistically worldwide for a given MS and for a given pair “language culture”.
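How a threshold level selects suspected concepts can be sketched as below. The text does not define the rareness measure, so the score used here is purely our assumption: the relative frequency of a term within the texton divided by its relative frequency in a general-language background sample. The function names, word counts and threshold values are all hypothetical.

```python
def rareness(term_count, texton_total, background_count, background_total):
    """Hypothetical score: over-representation of a term in the texton
    relative to a general-language background (add-one smoothing)."""
    texton_rate = term_count / texton_total
    background_rate = (background_count + 1) / (background_total + 1)
    return texton_rate / background_rate


def select_candidates(texton_counts, background_counts, threshold):
    """Return, sorted by name, the terms whose rareness reaches the threshold."""
    texton_total = sum(texton_counts.values())
    background_total = sum(background_counts.values())
    return sorted(
        term
        for term, count in texton_counts.items()
        if rareness(count, texton_total,
                    background_counts.get(term, 0), background_total) >= threshold
    )


# Invented counts for illustration only.
texton_counts = {"the": 500, "stage": 40, "proscenium": 10}
background_counts = {"the": 60_000, "stage": 300, "proscenium": 2}

print(select_candidates(texton_counts, background_counts, threshold=10))
# ['proscenium', 'stage']
print(select_candidates(texton_counts, background_counts, threshold=100))
# ['proscenium']
```

Raising the threshold shrinks the candidate set, which is the behavior the different Th levels are meant to expose to the human who adjusts them.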
Going a little deeper into the details: from these 500,000 [URL, concepts] pairs, only for a given MS, we must
find the best fit to a sort of “modal” “specific concepts set” of it. To perform this task we may need the
following structured data: [MS, URL, {code, name}, frequency], where MS stands for Major Subject, URL for
the Web address, {code, name} for the set of pairs (code, potential specific concept name), and frequency for the
number of appearances of the potential specific concept within the page.
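The structured data [MS, URL, {code, name}, frequency] can be sketched as flat records, one per (page, concept) pair, from which a “modal” set is extracted. The ranking rule below (number of distinct pages using the concept, then total occurrences) is our assumption of what a “best fit” could look like, not Darwin's documented criterion; record values and URLs are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    ms: str         # Major Subject
    url: str        # Web address of the page
    code: int       # concept code
    name: str       # potential specific concept name
    frequency: int  # appearances of the concept within the page


def modal_concept_set(records, ms, k=50):
    """Rank the concepts of one MS by page coverage, then by total frequency."""
    pages = defaultdict(set)
    occurrences = defaultdict(int)
    for r in records:
        if r.ms != ms:
            continue
        pages[r.name].add(r.url)
        occurrences[r.name] += r.frequency
    ranked = sorted(pages, key=lambda n: (-len(pages[n]), -occurrences[n], n))
    return ranked[:k]


records = [
    Record("modern theatre", "http://a.example", 1, "absurdism", 4),
    Record("modern theatre", "http://b.example", 1, "absurdism", 2),
    Record("modern theatre", "http://a.example", 2, "fourth wall", 1),
    Record("modern theatre", "http://b.example", 2, "fourth wall", 7),
    Record("modern theatre", "http://b.example", 3, "ticket price", 9),
    Record("opera", "http://c.example", 4, "libretto", 5),
]

print(modal_concept_set(records, "modern theatre", k=2))
# ['fourth wall', 'absurdism']
```

Ranking by page coverage first favors concepts shared by many documents over concepts repeated many times in a single one, in the spirit of the “used statistically worldwide” criterion.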
Given a MS of a branch of the HK the challenge is to unveil out of the Web the best fit for the “specific set”
of concepts semantically related to it, namely the set of concepts specifically used – probabilistically - in
WWD, Well Written Documents. In numbers for a branch of knowledge, for instance “The Art” (without
considering frequency):
4,000,000,000 names =
[8,000 subject names per Branch of the HK
x 10,000 Web Pages per subject name within each branch
x 50 potential specific concepts per Web Page]
Note 02: This Big Data briefing is meant to give an idea of the order of magnitude of the computing needs. It
provides the upper threshold level, almost a “brute force” reductionist approach. However, as occurs in most Big Data
processes, we humans learn fast. In the examples above we may pass from a first trial processing 4,000,000,000 names
for a branch of the HK to no more than a few million as we go from one MS to another, for instance
from 4,000,000,000 to less than 4,000,000, with an average of 40,000,000. What really happens is that from the designed
capture of 10,000 Web Pages per MS we may “discover” among those 10,000 URL’s a small sample of a few hundred
“authoritative” URL’s concerning the MS under analysis.
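The orders of magnitude in Note 02 can be reproduced by varying only the number of pages processed per subject. In the sketch below the 8,000 subjects and 50 concepts per page come from the text; the reduced page counts of 100 and 10 are our own illustrative assumptions, chosen because they reproduce the 40,000,000 and 4,000,000 figures quoted in the note.

```python
def names_volume(subjects, pages_per_subject, concepts_per_page):
    """Number of potential concept names to process for one branch."""
    return subjects * pages_per_subject * concepts_per_page


SUBJECTS = 8_000
CONCEPTS_PER_PAGE = 50

brute_force = names_volume(SUBJECTS, 10_000, CONCEPTS_PER_PAGE)
for pages in (10_000, 100, 10):
    volume = names_volume(SUBJECTS, pages, CONCEPTS_PER_PAGE)
    print(pages, volume, brute_force // volume)
# 10000 4000000000 1
# 100 40000000 100
# 10 4000000 1000
```

Concentrating on a small authoritative subset of each texton is thus what yields the shortcuts of two to three orders of magnitude.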
What’s then missing?
Only two things: iii) how to unveil all the subject names of a given HK; iv) how to structure those
subject names along a unique logical tree.
Darwin Methodology Applications
By Juan Chamero, Principal Architect of Darwin Methodology, as of January 11th, 2015
It is a dreamstime.com free use image. Search thru Google Images via query [teacher
clip dino] as open search.
A Thesaurus is a sort of reference book of concepts, and in Darwin Ontology of “in mind ideas”,
represented by one or more words of a given pair language culture, usually with synonyms and
sometimes with antonyms. Thesauruses may suggest the best suited synonym for a given moment
(the present), named the “modal name” of the referenced “in mind idea”. Web imagery tends to
associate thesauruses with dinosaurs.
It is a vision of the upper levels of a HKM, Human Knowledge Map of The Art. See a
sample extraction of it sheets 1, 2 and 3, (Theatre Mapping) as of September 2008 from Spain.
HKM stands for mapping the whole knowledge or a branch of it, semantically, by meaning,
resembling Logical Inverted Trees from their “roots” down to their derived “nodes” thru a unique
ancestry. Its content and the “intelligence” behind it is detected and unveiled from the Web by
Darwin Methodology under the guide of Darwin Ontology, a set of “strong” semantic conjectures
about how we, humans, document our “in mind” ideas.
Non intrusive e-membranes like a sort of intelligent interface among two or more
autonomous applications. Each one provides some type of service to the others under a non
intrusive operation scenario. See BBC Science as of July 2003.
Non intrusive and non perturbing e-membranes are necessary interfaces to connect two
different “worlds” so that both continue working autonomously, within their own hierarchy
and rules and without perturbing each other. The e-membrane designers must take into account
that the connected systems may differ not only in objectives, times and rhythms, but also in their
semantics. In the example published by the BBC of London the figure refers to an e-membrane
between a conventional procurement SAP system (for an international oil corporation) and a
“pilot” e-procurement system working in parallel with the conventional one. The purpose of the
e-membrane was to learn as much as possible in the least time about real e-procurement instances.
SSSE, Super Semantic Search Engines, Semantic Direct Search Engines, also YGWYN – IOOC
Search Engines, You Get What You Need In only One Click Search Engines. See also “Súper
Buscadores Semánticos (i) y (II)”.
Initially the main purpose of Darwin was the creation of a Semantic Search Engine that enables
users to find in the Web the best information about something they need, in terms of information
and/or knowledge, if possible in only one click. The icon for this Darwin application was
selected from “How Darwin unveils concepts”, see below.
Conventional Search Engines like Google are like a non semantic library where all existing Web
documents are classified by their words. Google tells you nothing about the meaning of Web
documents. Darwin Methodology unveils the meaning of documents at a given moment by structuring
them semantically and building the Web Thesaurus, resembling a World Wide Library, depicted in
the figure as a hypercube of as many “floors” as semantic levels the Human Knowledge has.
Encyclopedias building should be one of the first semantic areas of necessary Web
applications but unfortunately its development is almost frozen. There are exceptions like
Wikipedia, and projects about specific thematic subjects like Europeana and Wolfram Alpha. (See
Videos Google: World in hand).
For the first time we, humans, have at hand all the records of our living, our past and now our present,
practically in real time! Encyclopedically speaking we have at hand (of course with the appropriate
technology) all the pieces of information and knowledge about anything. With applications like
knowledge mapping (the second of our list) we may locate “directly” the best authoritative
sources practically about anything. The only remaining task (by now the human touch of direct
knowledge retrieval) is to “synthesize” and “edit” the content of those sources meaningfully and
automatically, namely via a conventional computing process. Darwin Methodology is pursuing this
goal. In the interim Darwin may deliver to humans all the content they need specially suited for
editing.
Surveys and Polls: This icon corresponds to one of the classic applications: stats focused
in Surveys and Polls. See all types of visualizations in Google Images as: [surveys & polls results].
See image as applied to Health Science Strategies.
Darwin approaches them in a singular way, because it has at hand information about anything without
the addressed people, entities and avatars needing to be aware of it. We may obtain meaningful
answers by querying through all types of merely “observing”, non intrusive procedures. We may also have
at hand “massive” information at any time and over any interval of time about any specific aspect of the
research, as well as the possibility of filtering observations through causal cultural behavior models.
Intelligence Reports: This icon represents information and knowledge unveiling and we
used it as an avatar of “intelligence”. Darwin stands for Distributed Agents to Retrieve the Web
Intelligence supposedly disperse and hidden.
Intelligence: take a look at Intelligence in Google Images and appreciate the dominant imagery we
humans have about this subtle concept: light, luminosity, tending to be blue and expanding. In the
data evolution from chaos to information to knowledge to wisdom the thing or “thing” that
enforces evolution from chaos to wisdom is the intelligence. Intelligence Reports are documents
that enable us to “infer” results and consequences, trivial and sophisticated ones, explicit or
hidden out of a structured and as much detail as needed description of something existing.
Darwin enables us to create trustable Intelligence reports about ANYTHING as long as we have at
hand enough trustable information and knowledge about that ANYTHING. As in the case of Stats
Surveys and Polls, these reports could be performed at non intrusive mode and without disturbing
the Web.
It is an icon avatar from the film Avatar. Avatars are creatures and/or entities that
represent existing entities, persons, beliefs, truths, etc. See avatares (Spanish) Pope Francis demo,
avatar(computing).
An Avatar is an abstract entity and/or a digital or Web creature that intends to represent a given
entity or person, physical or juridical, in the best scientific way, taking into account all possible
meaning axes of its “character” and facts, even those considered good or bad, sayings of all types
from beliefs to conspiracies, and research performed on it; for instance: Pope Francis or Barack
Obama, or Organized Crime in the World, etc.
I-Webs, Intelligent Web Portals: icon to represent how Web connections evolve along
time for the pair information people. See Explaining the Semantic Web and Making sense of the
semantic web. Be cautious! Most of these projections that intend to see the Web extrapolated to
year 2020 should be considered possible trends, among many, of a phenomenon that evolves
exponentially, so fast and unpredictably that it may seriously mislead our forecasts.
An I-Web is a Website (or Web Portal) intelligently designed taking into account the available
technologies and the available resources of the Website owners and administrators. We may
find today excellent Web 1.0 Websites and from poor to awful Web 3.0 Websites. For instance
there are thousands of “top ranked” Web 3.0 Websites with their own proprietary data still
semantically unstructured. In any buildup the beginning is the beginning, and the beginning in all
Web projects should always be its “semantics”, namely the “semantization” of its data and
vocabulary.
The Web creatures live in different habitats and under different technologies concerning their
abilities to communicate with others meanwhile performing their daily tasks in order to survive
either tagged as Web 1.0, Web 2.0, Web 3.0 and now very recently as Web 4.0. Concerning Web
development we have as options a paraphernalia of software panaceas and tools that are usually
applied to Top Websites and Portals and that have strong prerequisites to work successfully such
as Distributed Search, Cloud Services, Mobile Interfaces, Social Networks interaction, Privacy
Protection, Top Down and Bottom up design, Big Data Apps. Implementing advanced applications
without satisfying those prerequisites is suicidal.
Nevertheless, we may transform any Website into an I-Website without trying to make it Web
3.0 or Web 4.0. First of all, as commented above, we must semantize data, vocabulary and naming
as much as possible. Then comes updating and/or replacing the programming platform and languages and
reviewing database structures, with a planning horizon of not less than a decade.
Big Data Synthesis: the Semantic Web needs Big Data; it is intrinsic to its nature. Its
market is enormous and growing at a pace of 10 percent a year, without yet taking into consideration
video and audio content. De facto, Darwin works within Big Data scenarios and has
extensive experience in them, with proprietary procedures and algorithms. The image icon has been
selected from one of the typical and oldest Big Data applications. See the original image of this
impressive Big Data application at CERN Server, Switzerland (LHC Large Hadron Collider).
As Professor Mark Whitehorn says, “Big Data may be misunderstood and overhyped - but the
promise of data growth enabling a goldmine of insight is compelling.” Professor Mark Whitehorn,
the eminent data scientist, author and occasional Register columnist, explains what big data is and
why it is important, and adds: “Data is not large and it is not small. It does not live and it does not
die. It does not offer truth and neither does it lie.”
In our humble opinion BD has always existed. We do not believe that it is a new data dimension but
something that at a given moment of our knowledge presents itself as rare, too big and complex. Let’s
clarify within our Darwin Methodology: 40 years ago matrix processing was bound to a volume of
data of 100x100, something that fits in an Excel sheet of 100 columns by 100 rows. Why? Because
of the propagation of rounding errors. Imagine now volumes of data generated by some social networks
of the order of 1,000,000x100,000, for example applied to a behavior study of 1,000,000
persons related to their opinions about 100,000 themes! In order not to oversize the
resources and tools to face these BD challenges we need fundamentally experience and common
sense (see our document about Big Data Challenge).
Autonomous Artificial Communities: as of today it is easy and not too expensive to create
this type of community. They could be a virtual and idealized emulation of a real one, whose
evolution is to be tested (see artificial islands in Lonely Planet and Laulasi Islands as per Wikipedia).
These communities could be populated by persons, avatars, agents and a combination of all them.
Searching in Google by [artificial communities] as open search you may find Synthetic Microbial
Communities, Google AI Communities, Artificial Reefs Communities, Planned Communities,
Artificial Foraging Ant Communities, Artificial Plant Communities, Artificial e-Learning
Communities, Artificial Fresh Water Protozoan Communities, etc.
The Web as “seen” by Darwin Methodology
By Juan Chamero, from Buenos Aires, as of 5th of November 2014
The Word of People
Humans have millions of ideas in their brains
Hidden and elusive except by proper use of language
Figures below illustrate very simplified how to build via agents a HKM, Human Knowledge Map, from a
“semantic seed”. Darwin states that in the Web is “always” hosted the sum of the knowledge even though
unstructured and disperse in approximately 35,000 millions of Web pages as of today. Darwin Ontology
states that we humans keep (save) in our brains a finite universe of “in mind” ideas estimated in 12 to 20
million per pair “language - culture”.
This cognitive asset has been coined over millions of years and documented since the invention of writing.
However, this asset is hidden and elusive. The only way we have at hand to retrieve those ideas is
through the “correct use” of language within the right context. For instance, the word Rigoletto points “correctly”
to the opera “Rigoletto” by Giuseppe Verdi within the “Performing Arts” context and within the Opera context,
staying semantically differentiated from hundreds of other meanings and/or frequent uses of the same word, such as
a commercial brand or a restaurant.
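The role of context can be illustrated with a minimal sketch. The tiny lookup table below is hypothetical (it is not Darwin data, and the function name is ours): it only shows that the same word resolves to different ideas depending on the context in which it is used.

```python
from typing import Optional

# Hypothetical sample of (word, context) -> idea pairs. Not real Darwin data.
IDEAS = {
    ("rigoletto", "opera"): "Opera Rigoletto, by Giuseppe Verdi",
    ("rigoletto", "restaurants"): "A restaurant named Rigoletto",
    ("rigoletto", "brands"): "A commercial brand named Rigoletto",
}

def resolve(word: str, context: str) -> Optional[str]:
    """Return the idea a word points to within a context, or None if unknown."""
    return IDEAS.get((word.lower(), context.lower()))

print(resolve("Rigoletto", "Opera"))
print(resolve("Rigoletto", "Restaurants"))
```

Without the context argument the word alone would be ambiguous, which is the situation of a Web indexed only by words.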
Darwin Ontology was conceived to retrieve information and intelligence from big data reservoirs
that are somehow semantically structured (probabilistically). However, the Web content is only indexed by “words”,
not by ideas or concepts, and is therefore considered “semantically unstructured”. In order to “see” its content as if it
were semantically structured, Darwin queries it as if it were! As a valid stratagem it uses the following set of
suppositions:
1. ALL: Web Completeness: The All is present in the Web, although dispersed and hidden;
2. STRUCTURE: Logical and Probabilistic Completeness: The All is “probabilistically structured” under
logic algebraic forms;
3. CREATURES: Web creatures: from this structure arise dominant ideas (in mind ideas) that humans use
to communicate among themselves;
4. NAMING: Modal names: dominant ideas have specific names, “unique” and dominant for each pair
language culture;
5. DOCUMENT: “The Word of People”: all Web documents are expressed around these ideas just by
using their specific names, their synonyms and/or their distortions;
6. TREE: “The Word of People” structure: dominant ideas are hierarchically structured tending to
evolve and conform as inverted logical trees;
7. FUZZINESS: the nature of this structure is probabilistic and so is its math logic;
8. EVOLUTION: This structure evolves: it evolves fast along time from seeds and/or graphs of words
and concepts pertaining to diverse disciplines. New disciplines are continuously created and some
others disappear. New branches and new concepts are cognitively detected and assimilated;
9. ANTHROPIC: Content and Structure could be retrieved: even though the Web space is open,
practically unbounded and continuous, it could be precisely mapped as HKM, Human Knowledge
Maps, notwithstanding its dispersion and invisible “structuration”. This feature enables humans to
continue thinking and evolving as “always”: freely, not too structured, tied neither to
forms nor to extra-semantic inflexibility;
10. ANTHROPIC: Web Thesauruses: These maps will catalog the unstructured Web probabilistically,
even though it is open, multilingual, free and in continuous evolution; moreover, they will enable
us to “see” any Web page conceptually and to compute an approximation of what its
“metadata” would be.
Darwin versus a Conventional Search Engine like Google
SE model | ALL | Structure | Creatures | Naming | Document | Tree | Fuzziness | Evolution | Anthropic (9) | Anthropic (10)
Google | Agree | E | Words | Apathy | Apathy | Apathy | Apathy | Agree | Apathy | Apathy
Darwin | Agree | E | Ideas | Specific | Specific | Specific | Specific | Agree | Crucial | Crucial
This table highlights how Darwin and Google “see” the “Web Ocean” from a semantic point of view,
focusing on its structure and its “creatures”: Darwin sees it as semantically structured (or may see
it as if it were, through a map) whereas Google sees it as unstructured; Darwin sees the Web as a
huge ocean where “ideas” live whereas Google sees it as a huge ocean of “words” instead.
In the figure above we see an idealization of “The Art” semantic seed, built in 2009 to be the nucleus
demo corpus of the projected European Union Search Engine (Theseus). This seed, once its
semantic coherence was checked, was grown by a Darwin anthropic algorithm where the massive “Big Data” type
computation was performed by agents and the crucial decisions were made by humans: from status [40, 0], that is 40
branches and 0 specific concepts (almost a zero-knowledge condition), to status [7,571, 370,000], that is 7,571 branches
or Major Subjects of “The Art” complemented by 370,000 concepts, about 50 specific concepts and
dominant names per branch on average. Updated as of today, October 21st 2014, when Google renders
497,000,000 Web pages for the exact query “the art”, the same seed would grow approximately to status [8,000,
400,000].
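The “about 50 concepts per branch” figure can be checked directly from the [branches, concepts] status pairs quoted in this paragraph. The short sketch below is only a worked arithmetic example; the function and variable names are ours.

```python
def concepts_per_branch(status):
    """status is a [branches, concepts] pair; returns the average concepts per branch."""
    branches, concepts = status
    return concepts / branches

grown_2009 = [7_571, 370_000]     # status reached by the 2009 seed
estimate_2014 = [8_000, 400_000]  # approximate status estimated for October 2014

print(round(concepts_per_branch(grown_2009), 1))
print(round(concepts_per_branch(estimate_2014), 1))
```

Both ratios stay close to 50, so the estimated growth keeps the same density of concepts per branch.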
Darwin checked that the 10 Web premises hold, as well as their 10 associated Darwin Ontology
computation conjectures, and started a mega process of 82 steps, semantically reviewing hundreds of
millions of Web pages to finally synthesize The Art Map. These maps may evolve either by themselves or
under human control, tending to their structural equilibrium, generally a perfect inverted “logical tree”.
The Web metalinguistic premises could be described as follows:
The Web is a complete entity, logically structured (1) and (2). Its “creatures” are abstract entities hosted in
an immense and practically unbounded “Web Ocean”: what classic philosophy considers “concepts”. Web
pages are human documents, concrete entities (3) thematically dealing with ideas identified by their
“dominant names” (4) for each “language culture” pair via a literary knowhow coined throughout millennia.
This millennia-old knowhow (5) consists of describing objects, ideas and topics, and justifying beliefs and truths, by
using minor and/or derived preexisting ones, creating new ideas with a minimum of effort. This
derivation evolves into an arboreal structure (6) of hierarchical meanings: from mother ideas derive sister
ideas, and like-minded ideas tend to group under a common ancestry.
The nature of this unconscious and collective structuration is probabilistic (7) and it evolves continuously (8).
Under these premises the best we can do to “see more and better” the knowledge hosted in the Web is to
map it at a predetermined level of resolution, for instance a Web Thesaurus certifying that its names are
modal with a probability of 99% for the “English American” pair.
In a second step, techniques like Darwin may go deeper and try to make the Web more semantic by
building suspected metadata for each of its pages, and might even dare to make descriptions and issue opinions, but
that would contradict its premises. In this spirit Darwin does not suggest content (9) but
strives to facilitate the search for information and knowledge and make it efficient, telling humans: here,
within this immense Web Ocean, we guide you to find the best Web pages (a few, most times retrieved via
only one click) dealing authoritatively and semantically with your “in mind” ideas.
Note: From the beginning of the Web the spirit of this guidance has been “up to you”: Darwin agents select
what they consider the best pages, but inspecting and/or analyzing them or, on the contrary, rejecting them is up to you.
Finally Darwin enables us to build and retrieve Web Thesauruses and Human Knowledge Maps (10) that, like
aged wine, improve and distill by themselves. And what is perhaps more important: it permits us to build
trustworthy, high-quality Intelligence Reports about almost any subject.
Global Vision: An open Web, created by and for humans, free, with maximum diversity, reasonably
structured, slightly imperfect even though striving to excel, with extensive use of form but not reducible to it.
Meaningfulness Appendix (Google exact search)
“The all”, 46,000,000
Completeness, 20,300,000
“The Word of Man”, 14,600,000
“Modal names”, 9,510,000
“The Word of Humans”, 299,000
"Dominant ideas", 101,000
“Logical completeness”, 20,000
"In mind ideas", 18,100
Darwin Bibliography
By Juan Chamero, January 2015
E-books
o Semantic Web - The Human Knowledge Map, The Web as seen by Darwin Methodology, document
published in PDF and EPUB of 500 and 690 pages respectively
o The Web of the People, document published in PDF and EPUB of 250 pages.
o Darwin Human Knowledge Maps, a 75 slides series published by Google books.
Some Websites and Darwin Links
o Intag.org, http://www.intag.org, experimental Website that since year 2002 hosts the first Darwin
Web Thesaurus. Its use was shared by our Web development Darwin Team working conjunctly with
some Latin American universities and associated Labs.
o Juan Chamero Curriculum Vitae of the Darwin Methodology Founder and Principal Architect,
http://www.intag.org/downloads/.
o Darwin Blog, http://juanchamero.com.ar, a primer about Darwin in Spanish and English.
o Intag, Intelligent Agents Internet Corp, http://www.intagsolutions.com, Corporative Darwin
Support and Rights Proprietary.
o Darwin Ontology, http://darwin-ontology.org, the first e-book draft as of year 2008 about our
ontology and its conjectures. It documents most Darwin conceptual changes along its 15 years of
life (2001 – 2010).
o Aiware, http://aiware.com.ar, experimental i-Web, Intelligent Website where Darwin Applications
are promoted and tested (updated January 1st 2015). See: Big Data Pills, an experimental Big Data
series (27).
A picture is worth a thousand words
Old adage deserving 36,000,000 references in Google, July 3rd 2014
Our knowledge is based on “concepts”, the “in mind” ideas that we
create and make evolve while they remain invisible to others! The main way
to make them knowable to others is by “naming” them via words of a given
language within a given culture.
The same idea may have many names, most times similar and equivalent,
however only one is dominant at a given moment. These dominant names or
“modal names” are detected and retrieved from the Web by Darwin agents.
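As a minimal sketch of what “dominant” could mean, the example below takes the modal name to be simply the most frequent name observed for one idea. This is an assumption made for illustration: the text does not describe how Darwin agents actually detect modal names, and the sample observations are invented.

```python
from collections import Counter

def modal_name(occurrences):
    """Return (name, share): the most frequent name and its fraction of all occurrences."""
    counts = Counter(occurrences)
    name, count = counts.most_common(1)[0]
    return name, count / sum(counts.values())

# Hypothetical observations of the names used for a single idea (illustrative only).
observed = ["lyric soprano"] * 6 + ["lyrical soprano"] * 3 + ["soprano lirico"] * 1

name, share = modal_name(observed)
print(name, share)
```

Because usage changes over time, repeating the count on a later sample could yield a different modal name, which matches the remark that only one name is dominant “at a given moment”.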
The other way to communicate ideas, since ancient times, is through attitudes,
gestures and images. However, we are not used to managing these forms of
communication with the precision of words. Instead we use them
to transmit “meta messages” that help others understand and better see the
context where the idea exists.
We are at the beginning of an era of communication via images in our daily
lives. Darwin is making its first experiments with this peculiar way of
communication, finding that big enough samples of similar documents share
not only a specific set of concepts but also a specific set of image types and
traits!
Another experimental fact is that the “associated” image sets help users to
open their minds in order to discover, for each of them, more semantic axes.
The images in the left column were selected by a human trying to explain to
others what he understands by “images worth more than a thousand
words”. See how chaotic, subjective and indiscriminate the selection looks. Darwin
agents are continuously trained and tuned to retrieve significant samples
of images for each thematic node.
Note: This demo only includes Darwin image sampling for the main art
themes, the ones that head changes in color tonalities of this Vision 1
mapping, namely: 0.1.1.1.1. Painting, 0.1.1.1.2. Drawing, 0.1.1.1.3.
Sculpture, ... Users may either search in Google by one of the
images of the gallery or query Google Images first to retrieve the top
related images. See The Art Tree Darwin demo.
Darwin Methodology
A secular way to build the Semantic Web
And Semantic Search Engines
Intro to Darwin Art Map
Juan Chamero, as of January 1st 2014
Darwin Art Map, Vision 1, go to demo
Darwin Art Map, Vision 2, go to demo
Intro to Darwin Map: Please read carefully the intro that follows, postponing the understanding of
the two images above, which will guide you in navigating “The Art” region of the Web through a Darwin
demo. You have to be aware of and understand two things first: a) that the Semantic Web does
not exist yet; and b) why we need maps. As you go deeper into this intro, glance from time to time
at the figures. A tip: try to find an analogy with the bonobo keyboard story below.
The Semantic Web: For 13 years since the end of the “dot com Era”, more precisely since the
“dot com bubble” collapse, we (as Darwin Team) have been trying to find a way to reach the Semantic
Web. Of course most of you believe that we are immersed in it, but that’s absolutely wrong: The
Semantic Web does not exist yet! Most Web data are semantically unstructured and only indexed
by words, not by “meaning”, and you all know the difference: meanings are not words but ideas,
expressed in each language by precise words or strings of words and/or symbols, culturally
agreed over time, resembling ideograms.
The Semantic Web Illusion: However, current conventional search engines are so powerful that
they awaken in us the illusion that the Web is semantic. A real Semantic Web will enable humans to
communicate among ourselves (accurately, meaningfully and fast) directly in any language and to retrieve
from the Web, also directly (in only one click), any piece of information, knowledge and
intelligence.
Kanzi: a great ape (bonobo, a pygmy chimpanzee): a semantic bridge towards the Semantic Web
See also “Apes with Apps”, by the IEEE Spectrum
Darwin Methodology enables humans to “see” the Web as structured by meanings, based on an
anthropic ontology (Darwin Ontology) that takes into account how we humans express our ideas
through text, images, sounds, tactile sensations, or ideograms like Chinese and Japanese ones.
Google Hummingbird disclosed: Google, the most advanced current conventional search engine,
has recently launched its new search algorithm, named Hummingbird because it behaves precisely
and fast, applying the best of what we humans know about Artificial Intelligence and
Computing to the Web. However, it is not well suited to communicating ideas because it is
still based on words, an inferior semantic category to ideas and concepts.
Kanzi the bonobo: We are going to introduce astonishing research on communicating in
human style, via ideas, with Kanzi, a bonobo (see figure on top). This research and Darwin
applications imply in fact the use of more advanced technologies than the ones used by
conventional search engines like Google and Yahoo. They only need time to broaden their
reach and extend their use because they add a new dimension: meaning instead of pieces of it.
The paradox: the best ways are sometimes hidden in humble scenarios: The differences between
Darwin and Yerkish, an artificial language developed by Georgia State University for use by non-human
primates through a keyboard with “lexigrams” representing ideas, are basically quantitative:
Darwin manages about 15 million ideas per language whereas Yerkish has no more than 600,
but both work on ideas. Take into account that the current urban glossary of our children has no
more than 1,000 ideograms (expressed by words). As we are going to see in our Art
Map demo, users at large communicate through a sort of virtual e-membrane keyboard of 7,571
ideograms. Of course, in both cases, bonobo and human, a basic skill must be acquired.
Experimental Kanzi 128-lexigram keyboard, see Art for bonobo hope
See also The Darwin Art Map of 7,571 “keys” (ask for a login)
What do the SEO actors say? Successful SEO (Search Engine Optimization) professionals and
companies agree that pushing Websites to the top of search engine results based fundamentally
on popularity (link-wise) is an old model that proved to be a solution when there were not many
answers about a theme. Now it is different: answers proliferate by the millions and at the same time
users know that somewhere “up there” exists at least one document that satisfies their queries. Efforts
to locate the right answers are not only justified but from now on become a necessary condition.
Source: The future of search: A Window to knowledge
The search by keywords is collapsing: The Peter Principle: The figures below express a synthesis
of many visions of the Web's future about this possibility, produced since 2008, when the problem of Big
Data was still unperceived and unimagined. Only recently have essays and articles begun to appear
daring to forecast the end of keyword searching, a collapse that would affect a very significant
global market, perhaps the biggest one on the Internet. Half seriously and half humorously, the
Peter Principle asserts that physical and juridical persons tend to progress (or to be promoted)
until they reach their position of maximum incompetence. Many outstanding ICT corporations are
close to that critical point: a Web polluted by too many misleading, noisy and nonsensical words; a
Web of too much alchemy!
Source: Nova Spivack, a technology futurist (May 2008)
Source: Peter Principle, by futurepredictions.com
Other efforts along same or similar lines - at large how to build a Web Thesaurus -:
o Google trying to unveil what people say: global pseudo semantic approach;
o Yahoo trying to enter into semantic search: semi structured semantic experiment;
o Wolfram Alpha improving in direct search: on highly structured data;
o WWF trying to structure their communications: semantic approach, only experimental;
Hummingbird, “precise and fast”, the new Google Search Algorithm
Google unveils (Nov 6 2013) its Hummingbird algorithm
Yahoo Lab teaching Semantic Search (July 23rd 2013) in Bangalore;
Yahoo Semantic Search Tutorial
Wolfram Alpha extends to mobile
A multilingual semantic experiment performed (2008-2010) by CORDIS EU
Darwin Methodology
Mapping: To see more and better the Web
By Juan Chamero, Darwin Principal Architect, as of 21st of February 2014
A map is a visual representation of an area – symbolic depiction highlighting relationships
between elements of that space such as objects, regions and themes.
For centuries we humans have made use of maps and mapping to see the world more and better.
Unfortunately we have not started to use them to see more and better the knowledge still hidden in the
Web, simply because the Web has not been semantically mapped yet! Darwin may do that, building
precise maps of knowledge to perform three basic and crucial activities for surviving and evolving: 1)
To learn; 2) To navigate through knowledge for fun; 3) To search for what we need.
This brief report intends to make us aware of this peculiar loss of historical continuity via
a sequence of images from Late Neolithic times to the present, taking into account that as of today the
Web is only indexed (mapped) by words, so that we search for what we need by “guessing” (activity 3) and
exercise activities 1 and 2 only in a very limited way.
1. Late Neolithic: from Late Neolithic humans used maps and mapping depicting them in stones,
caves and trees.
Neolithic mapping, as of Eupedia
2. Altamira Paintings: These maps were the “knowledge maps” of those times
Altamira Paintings, as per Wikipedia
3. Medieval Times: “T-O Maps” and “Mappa Mundi”: as of circa 1300, depicting the world as seen at
that time: the Earth (T for Terrarum) within an incommensurable circle (O for Orbis). The image below
depicts a tripartite Hereford T-O version with Asia at the top, Europe at the bottom left, Africa at the bottom right and
Jerusalem in the center. It measures 158 cm by 133 cm, some 52" in diameter, and is the largest
medieval map known to still exist. Looking into the details, some well-known places like the British Isles are
reasonably well described.
The Hereford Mappa Mundi (~1300), depicting Jerusalem in the center
4. Today - DNA Mapping: the image below shows two views of DNA. On the left, a gene map of the
human leukocyte antigen (HLA) region. The major histocompatibility complex (MHC) gene map
corresponds to the genomic coordinates 29 677 984 (GABBR1) to 33 485 635 (KIFC1) in
human genome build 36.3 of the National Center for Biotechnology Information (NCBI) map
viewer. On the right, DNA replication, the process of producing two identical replicas from one original
DNA molecule. This biological process occurs in all living organisms and is the basis for biological
inheritance. DNA is made up of two strands, and each strand of the original DNA molecule serves
as a template for the production of the complementary strand, a process referred to as
semiconservative replication.
A tiny HLA region of the Human Genome (left) and DNA Replication (right)
5. Today - Brain Mapping: The figure below depicts a mapping of the main brain functions of both
sides, by HiddenTalents.org. These maps are clickable: if you click on “grammar” (left side) you
obtain the following information:
Grammar
Grammar is the spatial sense of vocabulary. This is especially true of English, which developed a relatively
simple grammar system that depends upon spatial order much more than endings or gender. In English, we
have grammar in our left brain that knows "Boy chases kangaroo" is different than "Kangaroo chases boy."
We could also draw pictures in our right brain to symbolically say the same thing:
Left brain words = "Boy chases kangaroo" "Kangaroo chases boy."
Right brain images =
As a child grows, the brain soaks in whatever sounds it hears which we call vocabulary and grammar. After
age 10, the vocabulary and grammar parts of the brain are mostly finished growing and the thinking parts of
the brain in the frontal lobe continues growing, building upon the foundation of grammar and vocabulary
learned in childhood.
Vocabulary => Grammar => Concepts => Creative thinking
Brain functions mapping, as per HiddenTalents.org
6. Today - Dashboards (To take decisions): all kind of organizations, either public or private make
extensive and intensive use of “dashboards” that depict the state of the main variables that
govern a given activity and/or equilibrium conditions, namely: businesses, crisis, competitive
scenarios, Web traffic and uses, people behavioral trends, and conflicts, at large forms of
Intelligence reports to guide and optimize our decisions.
Example 1: SquizLabs
Example 2, a Healthcare dashboard
Example 3, as per Microsoft aids
Example 4: a dashboard to locate d-experts
7. Today - Defense: in a Web where EVERYTHING is connected with EVERYTHING and EVERYBODY
is connected with EVERYBODY in extreme detail (opinions, emotions, impressions, and even our
gestures), it is perfectly possible to scientifically unravel possible futures of any scenario, no matter
its complexity. Among the maps of greatest reach and complexity are “Defense Mappings”. The image below
depicts China's military power as of 2009, as per the prestigious Map Collection of the
Perry-Castañeda Library at the University of Texas.
Defense Mapping Example 1: China Military Power as of 2009, UofTexas
8. Today - Crime Mapping: Maps of crime for any type, place or situation, for example in the figure
below a demo for Brent Borough, London UK, by Instantatlas.com.
Example of Crime Heat Mapping
9. Today - Thematic - Nat GEO: combining the power of mapping and interactive control we may
see below a “Nat GEO” (National Geographic) educational Website, in this example learning about
“Physical Systems” - Water - Ocean Surface Currents and Light at Night onto clickable maps to
focus our interest. We invite you to use these “open and free” outstanding educational facilities.
10. Today - Google Maps: The use of geo maps has extended universally, to all ages, genders and
socioeconomic conditions, thanks to the Web and specifically to search engines like Google and
Yahoo. Google maps are open and public, and as such even their best approximation should be
considered “medium resolution”, where a land square of 15 by 15 meters is represented by one
pixel. Taking into account that satellite networks could “inspect” the earth's surface at 0.6 square
meter resolution per pixel (and finer than that, but this data is not publicly known), we may
imagine a world structured in levels of something equivalent to “security”, where possibly some
organizations like NASA are able to monitor the world at “level one”, some
others like Nat GEO and Google at “level two”, and you, we and all “common people” at “level
three”. In the specific case of geo mapping there are level one providers that probably manage a
superior “level zero”, though limited to very specific themes, for instance TruEarth with its
TerraMetric vision and derived products.
TruEarth® 15-meter imagery is the baseline for global, natural-color, Earth imagery. A complete,
3.6 Terabyte, global mid-resolution dataset, TruEarth® 15-meter imagery is ideal for web mapping,
simulation, visualizations and GIS applications. TruEarth® 15-meter imagery provides complete,
best-available, substantially cloud-free, global coverage (except Antarctica) of the Earth at 15-
meters-per-pixel resolution.
New Trends in Global Earth Mapping, as per TruEarth
Epilogue: Darwin may map all these examples, unveiling available data from the Web. As
always, the beginning is the beginning: in order to perform 1) learning; 2) knowledge exploring;
3) semantic search, we first need to map the Web. Darwin does it!
Semantic Web
Darwin Semantic Search
Instructive demo
by Juan Chamero as of 1st of January 2014
Semantic Web: Before being introduced to Semantic Search you first need to know the meaning of Semantic
Web and of Web Semantics as well. The only precisely defined term is the first, coined by the Web's creator,
Tim Berners-Lee, who also uses The Web of Data as a synonym of Semantic Web, as opposed to the current Web of
Documents (Web pages). Web Semantics is rather broad and ambiguous: methodologies, tools, agents and
programs to “see more and better” the Web and to structure it meaningfully. The Web as of today is then a
Web of documents, semantically unstructured, only indexed by “words”. As such we may maintain that the
Semantic Web doesn’t exist yet!
Regarding these crucial themes, the Web of Data was idealized as a huge Web reservoir of “structured by
meaning” data, still a utopia. However, it is possible to build a sort of “semantic glasses” to see the Web
as it is today but meaningfully structured! Darwin Methodology enables us not only to build glasses to “see
more and better” the Web but also to structure it gradually. These glasses have a built-in map of the whole Web semantic
structure, enabling us to locate pieces of information and knowledge directly and at the same time to use the
whole Web as a World Web Encyclopedia thousands of times greater than conventional ones, with comparable
authoritativeness.
Visualization of the upper piece of the Art Map
Towards Map Visualization: This demo intends to present a small but meaningful piece of the above
mentioned Web semantic structure, focusing on a single discipline of Human Knowledge: The Art of the
World in English. Any human solution has pros and cons: the Web of today was easy to implement because it is
unstructured. As a result we have today more than 35,000 million Web documents classified only by words,
not by meaning. Semantically, this state of chaos is partially alleviated with the aid of powerful but invasive and
costly conventional search engines like Google. Another solution is the one we are proposing: a sort of
e-membrane to see the Web more and better, as if it were structured, via Knowledge Maps and semantic
glasses that guide us to locate pieces of knowledge instead of words. Of course this second alternative
requires a minimal knowhow of interpreting semantic maps!
Art Map: The figure above shows a rectangular matrix schema of a piece of our Darwin Art Map. You may
imagine it as displayed within an Excel sheet of 23 columns by 326 rows. This “Cartesian coordinates”
display is arbitrary, chosen only to depict Art trees meaningfully to the human eye, classified as a
structure of hierarchically embedded clusters. Each tonality change delimits clusters, for instance painting
from drawing and fashion from cinematography. After all, we humans are used to “seeing” the world through
windows!
Let’s make a first real-size exploration of our demo. What we observe is the Darwin Art Map Skeleton,
namely a rectangular matrix of 23 columns by 326 rows where each cell points to one of its 7,571 subjects or
themes. Most HKMs (Human Knowledge Maps) are represented by Inverted Logical Trees, from their roots
to their leaves through derived branches and nodes. We as users may use these maps for two main purposes: a) to
learn as much as possible directly from them; or b) as a precise pointer to obtain via conventional search
engines, for instance Google and Yahoo, the best information and related knowledge existing in the
Web, also directly, in Only One Click. The first purpose is equivalent to studying via World Web Encyclopedias
and the second is equivalent to using conventional search engines as if they were SSSE, Super Semantic
Search Engines.
Warning: The visualization and use of these maps fall de facto within Big Data. As we are going to see, The Art in the World has
7,571 “big ideas” and each of these “big ideas” has, on average, 50 related “minor specific ideas”, totaling a micro semantic cosmos
of about 400,000 ideas or concepts. Let’s then focus our attention on how to present visually two “orders of magnitude”: 7,571
skeleton nodes and 400,000 concepts.
Direct and Naive Vision
The Art Map in full: the map as a rectangular “matrix” of 23 columns by 326 rows was our first visual
approach. Each “cell” represents a branch or a node indexed “by path”, from the root in the upper left corner
to the leaves, going down and from left to right. Let’s visualize the upper part in blue, which corresponds to the first
Visual Arts sub tree, Painting, which starts at “Painting” and ends at “Painting market”, with 211 nodes.
Step 1: Mouse over any node: the figure below shows the window that pops up on mousing over any node,
in this case over Painting, depicting its neighborhood: its ancestor node (Classic), 11 son/daughter nodes
(descendants) and 14 brother/sister or collateral nodes (same level).
Step 2: Clicking on the node name: In the image below we may see the window that pops up when
this link is activated over the “music” cell. Could you locate it? Be a little patient. See Step 2 guide.
Warning: The node content for this vision is incomplete. The Semantic Skeleton needs to be complemented with the 400,000 associated
concepts, which are not yet activated. Darwin agents have retrieved them but the process is still under revision by a team of human
experts. Nevertheless, as we will see throughout this brief, the semantic search operates near ideal efficiency.
Step 1 guide: In the figure above the node Painting, one of the main Visual Arts sub trees, has as its “father”
the node “Classic”, referring to the Classic Visual Arts (to differentiate them from the Non Classical ones),
and 11 “sons”, as you may easily check. If you go a little deeper the window will show you up to 14 “collateral”
or “brother” sub trees belonging to the same Painting Tree. We may also depict in these windows the whole
“neighborhood” of any node, including “uncles” and “grandfathers” (this is implemented in our Second Vision).
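The neighborhood of a node (father, sons, brothers, uncles, grandfather) can be derived from a simple child-to-father table. The sketch below is a minimal illustration on an invented fragment: “Classic”, “Painting”, “Drawing” and “Sculpture” appear in the text, whereas the remaining names, the placement of “Classic” under “Visual Arts” and the functions themselves are assumptions, not the actual Darwin implementation.

```python
# Hypothetical fragment of an Art Map: child -> father.
# The hierarchy above "Classic" and the sons of "Painting" are invented.
FATHER = {
    "Classic": "Visual Arts",
    "Non Classical": "Visual Arts",
    "Painting": "Classic",
    "Drawing": "Classic",
    "Sculpture": "Classic",
    "Oil painting": "Painting",
    "Watercolor": "Painting",
}

def sons(node):
    """Direct descendants of a node, in alphabetical order."""
    return sorted(child for child, father in FATHER.items() if father == node)

def neighborhood(node):
    """Father, grandfather, sons, brothers and uncles of a node."""
    father = FATHER.get(node)
    grandfather = FATHER.get(father)
    return {
        "father": father,
        "grandfather": grandfather,
        "sons": sons(node),
        "brothers": [n for n in sons(father) if n != node],
        "uncles": [n for n in sons(grandfather) if n != father],
    }

print(neighborhood("Painting"))
```

Storing only the father of each node keeps the table small; every other relation in the window is computed on demand from it.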
The figure below corresponds to “music”, a little hard to locate among 7,571 options, isn’t it? Imagine among 15
million!
Step 2 guide:
i) Fine tune-up: As a second step users may select advanced options by clicking on the name: we have said
that each node of the Art Map Skeleton may have an associated average of 50 “minor ideas”,
generally from 10 to 200. These ideas or concepts are classified in two clusters: Generic Concepts, of the
same nature as the node subject, and Object Concepts, very specific ones such as names, acronyms, dates,
codes, etc. This option is currently under revision and activation. However, with the skeleton node
information alone it is perfectly possible to obtain through conventional search engines a meaningful query outcome,
without thematic ambiguity and with minimal noise, in the FIRST CLICK.
ii) i-URL’s: Learning a little from the ART MAP itself: when building the Art Map, Darwin agents and
algorithms edit for each node a “Semantic Sample”, extracted from “suspected” relevant and authoritative
sources (URL’s) on the node's subject matter. These “briefs” (or i-URL’s, Intelligent URL’s) are pieces of knowledge
that should be reviewed and approved by humans. Any Darwin Map, once finished, has from 1 to 5 i-URL’s.
Users may acquire a basic grounding by reading these briefs and browsing their corresponding URL’s.
Warning: For your information, we have included in this demo and for this vision a sample of four nodes with
all the tune-up and i-URL information complete. Tip: This time try to locate “Lyric soprano”, the target
neighborhood, via the search box.
I the figure above, making mouse over “Lyric Soprano” node (violet) as a first step prompts a window with
the following information (see image below)
That tell us that Lyric soprano has Soprano as father and two sons: Light Lyric soprano and Full lyric soprano.
These four nodes have their second step information complete as you may check it and it is shown below.
Lyric soprano: images 1, 2 and 3
Soprano 1, 2, and 3
Light Lyric 1 and 2, no 3
Full Lyric 1
iii) Search by images: For any node of a map (in this demo, the 103 nodes of the upper Art Map levels, Vision 2)
Darwin agents select 5 images supposedly strongly related to the node theme. These images can be used to add
meaning and reliability to the search via a sort of “reverse engineering process”. You as a user may access
complementary semantic information by clicking on any of the image sources. Let us explain why:
Darwin Methodology is based on Darwin Ontology, which tells us that when we humans document
something, for instance an intentionally written Web Page, we follow by and large, statistically, certain forms
and protocols of an “Established Order” coined along centuries. In contrast, or better said complementarily,
when someone queries the Web directly through images via a conventional search engine, outcomes guided by
“people’s preferences” are obtained instead. We have checked our demo gallery (515 images) using Google
Images, with 95% successful matches. It seems that Google even recognizes our reduced-size image
versions (300 pixels wide).
iv) The ART Map in full search box (see the image below): the search box in the upper corner may be used to
locate nodes that have certain words within their corpus: for instance “music” is present in 74 nodes, “lyric
soprano” in 1, and “China” in 26, in each case forming part of different concepts. The matching nodes are
highlighted in red. This box will reach its maximum utility when the whole Art Thesaurus is implemented. Take
into account that a Web Thesaurus is a list (Controlled Vocabulary) of all concepts belonging to a given
discipline. The Art Map in full will have about 400,000 concepts structured along the Art Map Skeleton
discussed here.
Warning: once the skeletons for all knowledge branches are extracted from the Web as it is, the retrieval of the
whole Web Thesaurus, about 15 million concepts per language, is reduced to a Big Data computational
problem: in terms of effort, no more than a few weeks of parallel processing. The semantic skeletons are the
intelligence that maps information into knowledge!
Node neighborhood: we have defined the neighborhood as a “family”, in this case a semantic family of
entities that share a culture and a language. Locating them in our representation scheme is not direct; we
need a script. Our rectangular representation permits two classical forms of classification: by track and by
level. By track, nodes along the same “track” are listed contiguously until a leaf is found. As we are forced
to somehow pack all the tracks (or levels) within a rectangle, the neighborhood order is lost. For this
reason we offer in our two visions access both to the sub tree that hangs from a node and to its family.
A given node neighborhood: Let’s explore again the neighborhood of the target node “Lyric Soprano”. It is
deep within the green region corresponding to “Performing Arts”. The figure below shows its corresponding sub
region, where “Soprano” is the ancestor node and “Light Lyric” and “Full Lyric” are the two sons.
Let’s begin with the target node “Lyric Soprano”, then. Passing the mouse over it gives access to its
neighborhood, namely (see below): ancestor: “Soprano”; brothers (5): “Coloratura Soprano”, “Soubrette”,
“Spinto Soprano”, “Dramatic Soprano”, and “Wagnerian Soprano”; sons (2): “Light Lyric”, “Full Lyric”.
Some Trials
Trial 1: Lyric Soprano: the “Wizard” agents of this demo query Google with the following string:
Darwin => ["arts" "performing arts" theater opera "vocal classification" female soprano "lyric soprano"]
obtaining 952 references, most of them authorities and semantically valid. In contrast, without our
semantic interface, querying Google directly by:
Google => [lyric soprano] as an open search => 817,000 references
Google => ["lyric soprano"] in exact mode => 217,000 references
Trial 2: Michelangelo:
Darwin => ["arts" "visual arts" sculpture history "ancient rome" renaissance Michelangelo]
Obtaining 606,000 references.
Google => [Michelangelo] as an open/exact search => 10,400,000 references
Trial 3: “Music philosophy”:
Darwin => ["arts" "performing arts" music "music education" "music philosophy"]
Obtaining 55,200 references.
Google => [music philosophy] => 230,000,000 references
Google => ["music philosophy"] => 675,000 references
Trial 1 bis: “Lyric Soprano” (another day, adding closely related concepts one at a time):
Darwin => adding Kiri Te Kanawa => 3 references
Darwin => adding Mirella Freni => 3 references
Darwin => adding Victoria de los Ángeles => 1 reference
Warning: for each node the user may experiment with the influence of the related concepts. Generally, generic concepts do not contribute
much to focusing the search. In contrast, most objective concepts do.
Simplified Global Vision (see “Calligraphy” on Art Map)
Art Map Simplified Vision: you may see the seven main clusters and, within each, their derived
descendants (sub trees), totaling 103 clusters, equivalent to an equal number of books of a World Web
Encyclopedia.
Calligraphy sub tree (clickable)
Calligraphy Neighborhood (clickable), pointing to Chirography
Node knowledge synthesis (not fully activated yet), check “image searching”
Trial 4: Calligraphy and Calligraphy Chirography as an open search
Darwin => [calligraphy] as ["arts" "visual arts" calligraphy] => 457,000 references
Darwin => [Calligraphy Chirography] as ["arts" "visual arts" calligraphy "chirography"] => 776 references
Google => [calligraphy] => 15,700,000 references
Google => [calligraphy chirography] => 272,000 references
Darwin/Google (via an image retrieved by a Darwin agent) => 114 references
Some conclusions: In our experience, using only the semantic skeleton of any knowledge discipline
(art, medicine, computing, sports), we access, without semantic ambiguity and without semantic
pollution, a more precise and focused universe that is from a dozen to thousands of times smaller. Reinforcing
our search with the concepts of the Web Thesaurus, we may locate what we are looking for in only one click
within a very specific, authoritative and tiny set of references. Finally, we may take advantage of the image
search facilities provided by some conventional search engines like Google to enrich our knowledge with what
people see and think about what we are looking for!
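As a rough check of the “from a dozen to thousands of times” claim, the reference counts reported in Trials 1 to 4 can be divided pairwise. The sketch below only redoes that arithmetic; the labels and the helper name reduction_factor are ours, not part of Darwin.

```python
# Reference counts copied from Trials 1 to 4 above:
# (label, references via the Darwin string, references via the plain Google query).
trials = [
    ("lyric soprano (open)", 952, 817_000),
    ("lyric soprano (exact)", 952, 217_000),
    ("Michelangelo", 606_000, 10_400_000),
    ("music philosophy (open)", 55_200, 230_000_000),
    ("music philosophy (exact)", 55_200, 675_000),
    ("calligraphy", 457_000, 15_700_000),
    ("calligraphy chirography", 776, 272_000),
]


def reduction_factor(darwin_refs, google_refs):
    """How many times smaller the Darwin outcome is than the plain one."""
    return google_refs / darwin_refs


factors = {label: reduction_factor(d, g) for label, d, g in trials}

for label, factor in factors.items():
    print(f"{label:26s} {factor:10.1f} times smaller")
```

The smallest factor in this sample is a little above 12 and the largest a little above 4,000, which is consistent with the range stated in the conclusions.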
Life cycle of HKM, Human Knowledge Maps: all maps, their skeletons and their conceptual content are living
creatures that age and become obsolete as time passes. Everything, from their arboreal logic to their
meanings, changes constantly. For these reasons Darwin Maps may evolve by themselves and, more than
that, strive for excellence. Darwin Map versions, backed by statistics and reviewed by human experts, are
generated from “semantic seeds” created by humans and grown by Darwin agents and algorithms.
Although seeds and growth are controlled by humans, both humans and agents suggest
changes to improve the species.
Something of Semantic History and Use
As per Darwin Team chronicles from prologue to epilogue (2000 - 2013)
by Juan Chamero, as of January 1st, 2014
Something of History
Semantic Search: How does a Darwin Semantic Search work? It proceeds rationally, as imagined before the
Web search process came to be universally controlled by present-day Search Engines like Google and Yahoo.
Let’s go back to the 90’s of the last century: the Web, from its beginning in 1992, was growing so fast that all
imaginable efforts to let it grow under control failed, that is, to let it grow structured from the point of
view of “Knowledge”. Let’s see what that means: books in a library are organized and “filed” by
“theme”. In order to implement a basic order, for each “book” or similar piece of documentation a “book
filing card” must be filled in by librarians.
Something of history: In the beginning the Web’s creators intended to implement something similar, making it
possible to order “Human Knowledge” by “theme”. However, Web documents are not books but “pages”,
generally unstructured: written without adjusting them to some basic protocols and/or something like a “Well
Written Document Formula”, with sizes that go from a few ordinary pages to thousands of them, many
times bigger than many big books, and as of today in a number that astonishes us: more than 35,000 million
Web Pages, growing at a rate of 20% annually!
The present Web is unstructured, non semantic: In brief, the Web as of today looks as unstructured as in the
90’s, and perhaps more so taking into consideration the “dynamic pages” written by the millions each second
via social networks. Web Pages are only indexed by “words” and all types of content are permitted, well or
badly written, openly and freely, using any jargon, acronyms, multimedia objects, with or without expressed
purposes; nobody controls anything and that’s GOOD! We are not criticizing it; on the contrary, we
admire it! Currently Google robots are thematically blind: they can know neither the subject of a given
document nor even a meaningful dominant theme dealt with within it.
The invisible order is up there: Nevertheless, an invisible, indelible, basic and dominant order is “up
there” in the “Web Space”. To “see it” you need a Wizard like the one we are introducing here and a built-in
“Web Thesaurus”, a sort of structured index of the whole Human Knowledge hosted in the Web Space at a
given moment.
OK, the Web is unstructured, and so what? Sometimes visual schemes and procedures are better than a
thousand words. We have said that the present Web is unstructured and only indexed by words, and you may
ask yourself: and so what? Probably you spend many hours a day exploring the Web, most times successfully
and relatively fast thanks to present-day Conventional Search Engines.
The misleading magic of opioid keywords: In order to succeed you have to be smart enough to query by
adequate “keywords”. Keywords are words or a precise sequence of words used to query the Web diligently via
Conventional Search Engines. Web users have learnt to get what they need in a few clicks, but many times
they feel disappointed because the search process took longer than expected and misled them: for
example, they set out to buy a theater ticket and ended up buying a T-shirt or playing a War Game.
Towards Semantic Search in Only One click: Darwin guides users to find what they are looking for in a
Unique Click without misleading them, guiding them to unveil what most times they have deep inside and
blurred in their minds, via a friendly and intelligent “man machine” dialog, instead of “passing the ball to
you” and “washing its hands” by telling you something like: here you have millions of references, probably
many belonging to thousands of themes you are not interested in at all; please choose one out of the suggested
top. If you are used to exploring the Web guided by these “by words” search engines you will appreciate the
difference.
Wizard Demo
Knowledge Mapping: This demo operates as a semantic search engine for “The Art” in the world, in English.
This is a small but meaningful piece of the Human Knowledge Map, which would map about 200
disciplines as of today, a discipline being a Major Subject of Human Knowledge, for instance Medicine,
Religion, Tourism, Security, Mathematics, The Art, Entertainment, etc.
The whole map will then cover 200 disciplines that semantically span the Web as a “forest” of
“Inverted Logical Trees” encompassing about 500,000 “Subjects” or “branches” of Human Knowledge,
dealing with an estimated “Knowledge Asset” of about 15 million “concepts” per language. The Art Map is
an ancient, robust and meaningful tree within this forest, encompassing 7,570 branches and dealing with a
knowledge asset of about 400,000 concepts per language. Finally, concepts correspond to the “in mind” ideas
we humans have, with idea understood as per Plato. See Semantic Web.
There exists an ideal query: Ideally (sorry for the forced redundancy), users should search by either knowing
or guessing the “name” of the idea for which they are seeking Web references. In conventional search users
try to describe their in mind ideas via keywords. What is generally ignored, even by Google experts, is the fact
that search engines like Google behave as if they were virtually semantic when users, whether by knowledge,
by intuition or guided by a Wizard like the one we are presenting here, query by the right name.
The mechanics of retrieving the right names: Our Darwin Wizard guides users to get those “magic names”
along a smart man-machine dialogue, acting as a “super librarian” of a Virtual World Library hosted in the
Web. In this way users may obtain what they are looking for in Only One Click. This magic is not trivial
because many in mind ideas have the same magic name: it is like saying that within the knowledge
universe of 15 million concepts only 5 million names are distinct, meaning that on average any magic
name points to three different concepts. For example “walnut oil” within the context “oil”, in turn within
the context “painting media”, makes precise reference to a specific painting medium, defining the sequence
[painting media => oil => walnut oil] as a concept of painting within visual arts, and these in turn within the
tree root “The Art”. In fact concepts are “semantic paths” of links built from specific words and/or keywords.
The beginning is the beginning: Users guided by the Darwin Wizard explore the Web as if they were using a
conventional search engine. By default they are invited to explore the Web via a Human Knowledge Map,
and in this demo via the Art Map. In this mode users may point directly to any locus, as on a
map: for example, the Art Map may be displayed as a rectangular matrix where The Art themes are laid out
like a DNA sequence within a Genome, from top to bottom and from left to right. Each point of this map can be
focused and magnified via a “mouse over” mechanism.
In the figure above a particular view of The Art Map skeleton is depicted from its “root” in the upper left
corner, with some subjects highlighted, like Rigoletto, Light Lyric, Fantasy Novel, and Paella, running from top
to bottom and from left to right.
First demo search: A common mode of search is “by guessing” via a word or a keyword. The Darwin Wizard
first tries to match the guess against the map skeleton. Let’s suppose that a user asks for “Greece”. The
Wizard will find the following 9 matches:
0.1.1.1.1.1.3.1., Greece: “History of Western painting”;
0.1.1.2.1.1.1., ancient Greece: “History of the Graffiti”;
0.1.1.4.10.1.1., ancient Greece: “History of Cosmetics”;
0.1.2.1.1., ancient Greece: “History of Performing Arts”;
0.1.2.2.1.1.1.5., ancient Greece: “Ancient History of Music”;
0.1.3.1.7., Greece: “History, Cultures”;
0.1.4.2.1.1.2.1., Greece: “History of the Martial Arts of Europe”;
0.1.5.13.2.5.1.3., Greece: “History of the “Pasta” of Greece”;
0.2.3.5.2., ancient Greece: “History of Writing Systems”;
These correspond to 9 different concepts, that is to say 9 different “in mind” images. In this example the Art
Map Skeleton will show 9 highlighted points.
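The matching of a guess against the skeleton can be sketched as a simple scan over node names. This is only an illustration under our own assumptions: the dictionary SKELETON is a hypothetical in-memory slice holding the nine nodes listed above plus one unrelated node, and match_guess is a name we introduce here, not the Wizard's actual routine.

```python
# Hypothetical slice of the Art Map skeleton: code -> (node name, context).
SKELETON = {
    "0.1.1.1.1.1.3.1": ("Greece", "History of Western painting"),
    "0.1.1.2.1.1.1": ("ancient Greece", "History of the Graffiti"),
    "0.1.1.4.10.1.1": ("ancient Greece", "History of Cosmetics"),
    "0.1.2.1.1": ("ancient Greece", "History of Performing Arts"),
    "0.1.2.2.1.1.1.5": ("ancient Greece", "Ancient History of Music"),
    "0.1.3.1.7": ("Greece", "History, Cultures"),
    "0.1.4.2.1.1.2.1": ("Greece", "History of the Martial Arts of Europe"),
    "0.1.5.13.2.5.1.3": ("Greece", 'History of the "Pasta" of Greece'),
    "0.2.3.5.2": ("ancient Greece", "History of Writing Systems"),
    "0.1.2.2.2.2.14.2.2.2.3.3": ("Lyric Soprano", "Soprano"),
}


def match_guess(guess, skeleton=SKELETON):
    """Return (code, name, context) for every node whose name contains the guess."""
    needle = guess.casefold()
    return sorted(
        (code, name, context)
        for code, (name, context) in skeleton.items()
        if needle in name.casefold()
    )


matches = match_guess("Greece")
for code, name, context in matches:
    print(f"{code}, {name}: {context}")
```

Each returned code is one point to highlight on the map; the context string is what lets the user tell the nine “Greece” concepts apart.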
Lyric soprano neighborhood
Neighborhood: As a second and complementary step the Darwin Wizard invites users to focus more specifically
on their “in mind” ideas via not only the “main subjects” (7,570) of The Art Map skeleton but the whole
conceptual knowledge asset (400,000 concepts). Darwin invites them to explore the target neighborhood; in the
example above, pointing to “lyric soprano” implies defining the following sub tree:
[Soprano => Lyric Soprano => Light Lyric Soprano, Full Lyric Soprano]
Target Node Presentation: within the sub tree hanging from “female”, “Lyric Soprano” (yellow on grey
background) is the head of the following track from “The Art” root:
0.1.2.2.2.2.14.2.2.2.3.3.
In the figure we have eliminated the seven-node prefix. This node has as its ancestor the “Soprano” node,
…2.2.2.3, and as “sons”/“daughters” the nodes “Light Lyric” and “Full Lyric”, …2.2.2.3.3.1 and …2.2.2.3.3.2
respectively, leaves of the Art Tree. Eventually the search could extend to the enlarged family including
“uncles”/“aunts”, “Mezzo Soprano” and “Contralto”, …2.2.2.2 and …2.2.2.1 respectively, and to
“brothers”/“sisters” or “collaterals”, “Soubrette” and “Coloratura Soprano”, …2.2.2.3.2 and …2.2.2.3.1
respectively.
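Because each code spells out the whole track, the family of a node can be derived from the codes alone. The following is a minimal sketch, assuming the shortened codes quoted above (prefix dropped); the label “Female” for …2.2.2 is inferred from the track, and the function names are ours.

```python
# Shortened codes as quoted in the text (the common prefix is dropped).
NODES = {
    "2.2.2": "Female",
    "2.2.2.1": "Contralto",
    "2.2.2.2": "Mezzo Soprano",
    "2.2.2.3": "Soprano",
    "2.2.2.3.1": "Coloratura Soprano",
    "2.2.2.3.2": "Soubrette",
    "2.2.2.3.3": "Lyric Soprano",
    "2.2.2.3.3.1": "Light Lyric",
    "2.2.2.3.3.2": "Full Lyric",
}


def parent(code):
    """Code of the ancestor node: the same track minus its last component."""
    head, _, _ = code.rpartition(".")
    return head or None


def family(code, nodes):
    """Names of the ancestor, sons, brothers and uncles of a node."""
    up = parent(code)
    grand = parent(up) if up else None

    def names(codes):
        # Plain string sort; enough for this sample, not for multi-digit components.
        return [nodes[c] for c in sorted(codes)]

    return {
        "ancestor": nodes.get(up),
        "sons": names(c for c in nodes if parent(c) == code),
        "brothers": names(c for c in nodes if c != code and parent(c) == up),
        "uncles": names(c for c in nodes if c != up and grand and parent(c) == grand),
    }


lyric_family = family("2.2.2.3.3", NODES)
print(lyric_family)
```

Only the nodes named in this passage are loaded, so the brothers list is shorter than the five shown by the demo window.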
On user demand, the Wizard must show for each node its information content,
fundamentally the concepts closely related to the node theme (yellow core in the figure), in the order:
target node, ancestor, and sons. Users are also invited to mark those concepts that, in their own
judgment, are closely related to the “in mind” image they have, while using common
sense so as not to collapse the query outcome.
Node content: In this demo we have only uploaded the Semantic Art Map Skeleton, not the content of
its 7,571 nodes. We have only uploaded the content of a sample of nodes sufficient to appreciate the search
mechanics in full. The full HKM, Human Knowledge Map, with all the components of this “Knowledge Forest”,
will have about 200 Human Knowledge Disciplines and 500,000 subject nodes filled (yellow cores in the figures
seen up to now) with 15 million concepts per language and 500,000 “node metadata” records, a sort of
“semantic fingerprint” of the nodes. The figure below depicts the basic structure of this
metadata: a) the definition of the “node name” as per one or more “Authorities” (URL’s); b) from one to
three semantic core clouds, namely: b1) a set of Generic Concepts closely related to the central node
theme, b2) a set of Objective Concepts such as names, locations, dates, codes, acronyms, etc., closely
related to the central node theme, b3) a set or gallery of images also closely related to the node theme and
considered relevant by a special family of Darwin agents.
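The metadata structure just described can be pictured as a small record. The sketch below is an assumption of ours about how such a record might look, not Darwin's actual schema: the class and field names are hypothetical, the URL is a placeholder, and the split of the sample concepts between the two clouds is only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NodeMetadata:
    """Hypothetical 'semantic fingerprint' of one skeleton node."""
    code: str                      # track from the root, e.g. "0.1.2..."
    name: str                      # a) node name ...
    authorities: List[str]         # ... as defined by one or more Authorities (URL's)
    generic_concepts: Set[str] = field(default_factory=set)    # b1
    objective_concepts: Set[str] = field(default_factory=set)  # b2
    images: List[str] = field(default_factory=list)            # b3

    def core_clouds(self) -> Dict[str, object]:
        """Return only the semantic core clouds that are filled (one to three)."""
        clouds = {
            "generic": self.generic_concepts,
            "objective": self.objective_concepts,
            "images": self.images,
        }
        return {label: cloud for label, cloud in clouds.items() if cloud}


node = NodeMetadata(
    code="0.1.2.2.2.2.14.2.2.2.3.3",
    name="Lyric Soprano",
    authorities=["https://example.org/lyric-soprano"],  # placeholder URL
    generic_concepts={"light lyric soprano", "soubrette"},
    objective_concepts={"Victoria de los Ángeles", "Kiri Te Kanawa"},
)
print(sorted(node.core_clouds()))
```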
Going a little deeper
Simply by accessing the node the Wizard knows its semantic track, which is information good enough to make
an efficient and meaningful query. However, it has options to improve the efficiency and richness of its
queries. Let’s go back to the Lyric Soprano example:
The Art
Performing Arts
Main Performing Arts
Theater
Genres
Opera
Vocal Classification
Range
Female
Soprano
Lyric Soprano
which corresponds to the code 0.1.2.2.2.2.14.2.2.2.3.3. If the user wants to query, the Darwin Wizard encodes
that track in the following string:
[“the art” “performing arts” theater opera “lyric soprano”]
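Turning a track into a query string is mostly a matter of quoting. The sketch below assumes the terms to keep have already been chosen (the text does not state why nodes such as “Genres” or “Range” are left out of the string), and to_query is a helper name of ours.

```python
def to_query(terms):
    """Join terms into one query, quoting multi-word terms as exact phrases."""
    return " ".join(f'"{term}"' if " " in term else term for term in terms)


# Terms of the Lyric Soprano track that appear in the string above.
track = ["the art", "performing arts", "theater", "opera", "lyric soprano"]

base_query = to_query(track)
refined_query = to_query(track + ["victoria de los angeles"])

print(base_query)
print(refined_query)
```

Appending one related concept to the same list gives the refined queries used to narrow the semantic vicinity of the node.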
Querying Google, the Darwin Wizard checked the sequence from Buenos Aires as of November 24th, rendering
37,100 Web pages. Darwin computes for us the following series of uncertainty reduction:
[“The art”, 103,000,000
+ “performing arts”, 11,500,000
+ theater, 8,190,000
+ opera, 5,140,000
+ “lyric soprano”, 37,100]
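The series can be read as a chain of shrinking outcomes. As a small worked example, the sketch below recomputes how much each added term narrows the result, using only the counts listed above; expressing the overall reduction in bits (base-2 logarithm) is our own choice of measure.

```python
import math

# Cumulative counts from the series above: (term added, pages remaining).
series = [
    ("the art", 103_000_000),
    ("performing arts", 11_500_000),
    ("theater", 8_190_000),
    ("opera", 5_140_000),
    ("lyric soprano", 37_100),
]


def reduction_steps(series):
    """For each added term, the factor by which the outcome shrinks."""
    steps = []
    for (_, before), (term, after) in zip(series, series[1:]):
        steps.append((term, before / after))
    return steps


steps = reduction_steps(series)
overall = series[0][1] / series[-1][1]
bits = math.log2(overall)  # overall reduction expressed in bits

for term, factor in steps:
    print(f"+ {term:16s} shrinks the outcome {factor:8.2f} times")
print(f"overall: {overall:.0f} times, about {bits:.1f} bits")
```

Most of the narrowing comes from the last, most specific term, which is the behaviour the trials above also show.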
Before the user attempts his/her first click, the Darwin Wizard will invite him/her to adjust as much as
possible the semantic vicinity of “Lyric soprano”, for instance:
[“the art” “performing arts” theater opera “lyric soprano” “Giselle Allen”] => 5
[“the art” “performing arts” theater opera “lyric soprano” "light lyric soprano"] => 7,600
[“the art” “performing arts” theater opera “lyric soprano” "christina haldane"] => 1
[“the art” “performing arts” theater opera “lyric soprano” soubrette] => 15,100
[“the art” “performing arts” theater opera “lyric soprano” "non classical"] => 758
[“the art” “performing arts” theater opera “lyric soprano” "valerie masterson"] => 154
[“the art” “performing arts” theater opera “lyric soprano” "Sarah Brightman"] => 4,160
[“the art” “performing arts” theater opera “lyric soprano” "Joan Baez"] => 2,020
[“the art” “performing arts” theater opera “lyric soprano” "victoria de los angeles"] => 814
Appendixes
Nodes Physiology
Within a node Darwin hosts its Semantic Metadata, namely:
1. i-URL, intelligent URL: a sort of semantic brief of the node theme pointing to a small set of Authorities, at
least one, considered statistically “modal”, that “talk” authoritatively about the theme at hand. It
also hosts search parameters and markers.
2. Generic Concepts: Darwin extracts them from large “textons”, thematic vectors formed by adding significant
samples of the text corpuses of “all” Web documents that deal with a specific “theme”, for example “Lyric
Soprano”, normally in the order of a thousand to hundreds of thousands.
3. Object concepts: more than “in mind ideas”, these are specific objects, for instance physical and juridical
persons, special events, places, etc., that contribute to identifying “in mind” ideas properly. Optionally the
concepts of any node could be “weighted” within their respective subjects by the Search Engine popularity of
the query [node path “concept”]. For example the “victoria de los ángeles” weight in the example above would be 814.
Art Tree metric (by level)
Level 1, 1 to 17 (17)
Level 2, 18 to 175 (158)
Level 3, 176 to 387 (212)
Level 4, 388 to 995 (608)
Level 5, 996 to 2107 (1112)
Level 6, 2108 to 4309 (2202)
Level 7, 4310 to 5960 (1651)
Level 8, 5961 to 7023 (1063)
Level 9, 7024 to 7437 (414)
Level 10, 7438 to 7542 (105)
Level 11, 7543 to 7566 (24)
Level 12, 7567 to 7570 (4)
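The metric above can be checked mechanically: each count should equal the width of its node range, and the counts should add up to the 7,570 branches of the Art Map. A minimal sketch, assuming Level 1 spans nodes 1 to 17; the names LEVELS and level_of are ours.

```python
from bisect import bisect_left

# (first node, last node) per level, as listed in the metric above.
LEVELS = {
    1: (1, 17), 2: (18, 175), 3: (176, 387), 4: (388, 995),
    5: (996, 2107), 6: (2108, 4309), 7: (4310, 5960), 8: (5961, 7023),
    9: (7024, 7437), 10: (7438, 7542), 11: (7543, 7566), 12: (7567, 7570),
}

counts = {level: last - first + 1 for level, (first, last) in LEVELS.items()}
total = sum(counts.values())
widest = max(counts, key=counts.get)

_last_nodes = [last for _, last in LEVELS.values()]


def level_of(node_number):
    """Level of a node given its sequential number in the by-level listing."""
    if not 1 <= node_number <= _last_nodes[-1]:
        raise ValueError("node number outside the Art Tree")
    return bisect_left(_last_nodes, node_number) + 1


print(counts)
print("total nodes:", total, "| widest level:", widest)
```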
i-URL Layout: we show below a typical i_URL built for the Computing Map
Logic of Web Search
Under Darwin Ontology
Semantic versus non semantic
By Juan Chamero, from Buenos Aires, as of March 13th, 2014
Questions and Answers from Plato and Aristotle to the Digital Era
Meaning of querying
Querying by (q): A search process consists of issuing a series of queries (q) via “meaningful pairs”
[s, q], where (q), the only visible component of the pair, is a sort of guess, the best conversion into
words that users can build to obtain information about a certain “in mind” image they have related to
subject (s): the component that paradoxically remains invisible! The ideal search, in fact the ideal
meaningful pair [s, q]* or “semantic meaningful pair”, is the one where (q) is the “modal name” of
the “in mind” image (s)!
Comment 1: in fact the q of [s, q]* becomes q*, the modal name of (s); to avoid confusion we may
add that the (s) of [s, q] becomes s*.
Comment 2: Take into account that in Darwin Ontology a “modal name” is the right name that univocally
determines a given “in mind” image or “concept”. In fact, statistically, the s* inferred from (s) when querying
by q* matches the user’s “in mind” idea, the reverse also being true: users may infer s* if and only if they
query by q*.
Query outcome: Basically the search outcomes take the form of lists of References, each one with
its “Title” or Header selected by the search engine under its particular criterion, the URL (Uniform
Resource Locator), or address where the Web Page is located, and a pse, “search engine piece of
information”, attached to it in the form of a paragraph extracted from the content and
explicitly related to (q), trying to emulate human highlighting as closely as possible: how much
of the retrieved content matches (q).
So the outcome of the “visible” guess (q) is a list of (n) references {R}: {R1, R2, R3, …, Rn}, where (n)
usually may go from a few thousand to hundreds of millions. These references have somehow
within their “text corpus” words and strings of words in any order, isolated or as t-tuples
forming specific meanings, that match partially or totally the sequence of (q) words one or
many times: in fact just words and sequences of words here and there that somehow match (q)
components partially or totally. Some search engines like Google offer the possibility of searching
by (q) exactly as it is written, by enclosing the string within quotation marks.
Are modal names easy to find? No, they are not. Let’s suppose that the same given subject (s) is in
the minds of a million people who think, write and speak in the same native language. And let’s
imagine a sort of championship: all of them are invited to “find” their “in mind” idea with a given
search engine, for instance Google, with only one guess, no matter the number and ordering of
the words they use to materialize (q). And as we are talking of a championship, some “entity” should
qualify and “rank” the search engine “outcomes”.
For this purpose we may create a special “semantic rank algorithm” that, for example, takes into
account: a) the Top 50 References, matching their content against a unique list of keywords
allegedly related to and fully covering the tournament “in mind” idea; b) the size of the outcomes; c)
the thematic homogeneity of the outcomes; etc. At the end of the computation a winner (q) (or a
set of q’s similar from a semantic point of view) will define the “modal name”: this name, becoming
q*, will identify “statistically” and univocally the subject s*.
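To make the three criteria concrete, here is a toy sketch of such a rank. Everything in it is an assumption of ours: the weights, the idea that a smaller outcome scores higher, the crude homogeneity measure and the invented sample data. It illustrates the shape of the computation, not Darwin's actual algorithm.

```python
import math


def keyword_coverage(top_texts, keywords):
    """Criterion (a): fraction of the reference keywords found in the Top 50."""
    corpus = " ".join(top_texts[:50]).casefold()
    return sum(k.casefold() in corpus for k in keywords) / len(keywords)


def homogeneity(top_texts, keywords):
    """Criterion (c), crudely: share of top references mentioning any keyword."""
    top = top_texts[:50]
    if not top:
        return 0.0
    hits = sum(any(k.casefold() in t.casefold() for k in keywords) for t in top)
    return hits / len(top)


def semantic_rank(outcome, keywords):
    """Weighted mix of (a), (b) and (c); the weights are arbitrary assumptions."""
    size_score = 1.0 / (1.0 + math.log10(max(outcome["n"], 1)))  # (b): smaller is better
    return (0.5 * keyword_coverage(outcome["top"], keywords)
            + 0.2 * size_score
            + 0.3 * homogeneity(outcome["top"], keywords))


def modal_name(outcomes, keywords):
    """The guess (q) whose outcome ranks highest."""
    return max(outcomes, key=lambda q: semantic_rank(outcomes[q], keywords))


# Invented toy data: two competing guesses for the same "in mind" idea.
keywords = ["opera", "voice", "vocal classification", "roles"]
outcomes = {
    "soprano": {"n": 50_000_000, "top": [
        "Soprano saxophone for sale", "The Sopranos TV series",
        "Soprano voice type in opera"]},
    "lyric soprano": {"n": 200_000, "top": [
        "Lyric soprano voice type in opera", "Famous lyric soprano roles in opera",
        "Opera vocal classification: soprano"]},
}
winner = modal_name(outcomes, keywords)
print(winner)
```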
Semantic resonance: As most present-day conventional search engines only index by words, search
efficiency depends too much on the luck, intuition and, why not, talent of users to point to the
invisible (s) by guessing the right (q). Experience tells us that search today is a heavy,
disappointing and uncertain task. We are talking about professional search: people who need
precise information and knowledge about the most diverse subjects and who frequently devote hours
and even weeks to finding something valuable. Why? What happens falls under another
phenomenon: the Web “semantic resonance”.
We may imagine conventional search engines as radio devices that inspect the Web space through a
dial. This dial is huge: it tunes along a spectrum of nearly 15 million “stations” per language, each
station being one “in mind” idea of humans. Tuning proceeds by (q) queries, but the Web space is
terrifically sensitive from the point of view of semantics: minor and apparently insignificant
alterations of (q), in its components or within its grammatical structure (a singular instead
of a plural, a comma, a verb tense for one of the words of (q)), may mean passing from millions
of references to zero. Let’s see something about Pope Francis and Barack Obama:
[q => “A poor church for the poor”]: 321,000 references as per Google exact search;
“A poor church for the poors”: 1 reference, perhaps because of a possible grammar error;
[q => “Hungry for change”]: 10,400,000 references as per Google exact search;
“Hungry for a change”: 723,000 references, because this saying points to a very different concept;
Comment 3: when we talk of 15 million concepts (“in mind” ideas) in English we are by default covering
all “instances” of those concepts, namely examples, particular cases, specific occurrences, etc. Surely the
total number of “instances” will be far larger than 15 million! To make the Web semantic, a first step
will be to structure it by concepts and a second step to structure it by all possible instances.
How does a conventional search proceed? Given a pair [s, q] we obtain a series of links {R}, from
which we inspect only a few, generally within the top of the outcome iceberg, symbolized by {{R}},
for example the second, the fifth and the tenth: {{R}} = {R2, R5, R10}. From these we may improve
our (q) => (q1) => (q2) => … and so on until we get what we need, or at least gather
enough pieces of information and knowledge to satisfy our needs. This process is not as
convergent as expected and, to make things worse, the (s) itself gets blurred, because too much
and sometimes misleading information stuns users. Only persistent and well trained users may
unveil modal names, the q*’s that may satisfy, in theory, their needs in only one click!
Conventional Search
Remember that (s) in the pair [s, q] is subjective and invisible. As the Web is only indexed by words,
users must proceed by issuing a series of (m) guesses as follows:
[s, q] => a series of (m) 5-tuples {[s(i), q(i)], n(i){Rn(i)}, f(i), puser(i)}
IA {puser(i)} SHOULD converge to satisfy users’ needs and SHOULD represent users’ “in mind” needs
=> most times only partially and “blurred”
IA {q(i)} SHOULD converge to the Modal Name of subject (s) => almost never
Where:
o (m), stands for the number of guesses to find something valuable that matches properly
the user “in mind” image (s);
o (i), is the number of a specific guess or “Web exploration”: i = (1, 2, 3, …, m);
o n(i), number of documents ranging from a few hundreds to hundreds of millions that
match q in exploration i; {Rn(i)}: list of references from R(1) to R(n(i)); n(i){Rn(i)} is in fact a
single tuple, a “monad”: the outcome list (Rn(i)) of n(i) References;
o f, stands for “a few” references selected, generally from 1 to five of the Top 10;
o puser, stands for “user pieces of information and knowledge” extracted either by humans
or by agents as raw data to synthesize the whole exploration thru (m) steps in a consistent
Intelligence Report about the subject (s). Examples of pieces could be links, images,
multimedia objects, text corpuses, navigation instances, comments, etc.
o IA, stands for “Intelligent Analysis” gathering and classifying the most diverse pieces of
information and knowledge. IA {puser(i)}: Even in the case of semantic expertise the search
proceeds “spiraling and converging” within the Web from ignorance to truth approaching
to s* guided by a track of q’s approaching to q*: (q1, q2, q3, …, qm), along (m) steps,
probably not pointing to the best references that deal with subject s* but to its
neighborhood constituted by sub optimal documents.
o IA {q(i)}: This process of intelligence over q’s could be eventually performed by a Darwin
algorithm via a “literary and semantic synthesis” of tracking (q1, q2, q3, …, qm) => q*. IA
{q(i)} example: Along an exploration of (m) steps we used 10 words as follows:
q1: w1 w2 w8
q2: “w3 w4 w5” w6
q3: w1 w2 “w3 w7”
q4: “w9 w10”
………………..
qm: w2 w4 “w7 w5”
q* synthesis algorithm: A string like “w3 w4 w5” tells a search engine like Google to look for
documents that have within their text corpus the exact expression w3 w4 w5 as it is. It is not
difficult to create an anthropic algorithm that suggests well written queries to humans, based on the
components of the exploring track and their related keywords, weighted by their respective
“popularity”. The winner could be for instance q*: “w1 w7 w5”, which will point to s*.
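One very simple way to picture such a synthesis is to score every word of the exploring track by how often it was used, weighted by a popularity figure, and to propose the best-scoring words as one exact phrase. The sketch below does only that; the popularity weights are invented so as to reproduce the winner quoted above, the word order inside the phrase is simply the ranking order, and the function names are ours.

```python
import shlex
from collections import Counter

# The exploring track of the example above (the elided steps are left out).
track = ['w1 w2 w8', '"w3 w4 w5" w6', 'w1 w2 "w3 w7"', '"w9 w10"', 'w2 w4 "w7 w5"']


def word_scores(track, popularity=None):
    """Score each word by its uses along the track, weighted by popularity."""
    popularity = popularity or {}
    scores = Counter()
    for query in track:
        for term in shlex.split(query):   # a quoted phrase comes back as one term
            for word in term.split():
                scores[word] += popularity.get(word, 1.0)
    return scores


def suggest_query(track, popularity=None, size=3):
    """Propose the top-scoring words as an exact phrase (ties keep first-seen order)."""
    scores = word_scores(track, popularity)
    best = sorted(scores, key=scores.get, reverse=True)[:size]
    return '"' + " ".join(best) + '"'


unweighted = suggest_query(track)
# Invented popularity weights, chosen only to illustrate the mechanism.
weighted = suggest_query(track, popularity={"w1": 3.0, "w7": 2.5, "w5": 2.0})
print(unweighted, weighted)
```

A real synthesis would also have to decide the order of the words inside the phrase, since an exact search is sensitive to it.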
Semantic Search
[s, q]* => a unique 5-tuple {[s*, q*], n* {Rn*}, f*, puser}
IA of puser: SHOULD converge to satisfy users’ “in mind” needs => always
q = q* is the Modal Name of subject (s), by default!
n* << most n(i) of conventional search
How to obtain q*: The first thing a Semantic Search Engine should do is to help users identify
“the best” q*’s to fit their “in mind” images s*’s. In order to do that we need to have mapped, following a
hierarchic structure, all possible “in mind” images of humans, no matter how many they are,
assigning to each of them a “name” in each language, all of them “unique”. Let’s accept for a
while that these unique names exist and are perfectly identifiable and meaningful. We may ask
ourselves the following question: is it possible to guide users to identify their particular “in mind”
images? Yes, it is! The only identifiers we need are “words”. The “retrieval algorithm” is rather
complex but absolutely feasible: given a “guess” formed by one, two, three or more words
suspected to be “members” of the name as matched in our brain (for example [w1], or [w1, w2], or
[w1, w2, w3]), the algorithm may “tell us” something like:
“Dear user, the pair [w3, w1] of your guess is part of {s1, s2, …, sh}, a list of (h) subjects (themes,
topics) probably pertaining to more than one Branch of Knowledge. Please tell us which subject/s
you would like to inspect. You are also invited to add one or more keywords in order to
make your query more specific. If you prefer, you may either change your guess or
navigate in “from A to Z” mode any Branch of Knowledge or part of it!”
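The dialog above can be sketched in a few lines of Python. This is only a minimal illustration, not the Darwin retrieval algorithm itself: the tiny SUBJECTS map, its branch labels and the function names are assumptions made up for the example, and a real map would hold hundreds of thousands of unique subject names.

```python
from collections import defaultdict

# Hypothetical miniature "map": unique subject names and their Branch of Knowledge.
SUBJECTS = {
    "masters of drawing": "art",
    "drawing tools": "art",
    "technical drawing": "engineering",
    "lyric soprano": "music",
}

def build_word_index(subjects):
    """Map every word to the set of subject names that contain it."""
    index = defaultdict(set)
    for name in subjects:
        for word in name.split():
            index[word].add(name)
    return index

def candidate_subjects(guess, index):
    """Return the subjects whose names contain every word of the guess."""
    sets = [index.get(word.lower(), set()) for word in guess]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

index = build_word_index(SUBJECTS)
candidates = candidate_subjects(["drawing"], index)
# Branches touched by the guess; more than one means the user must choose.
branches = sorted({SUBJECTS[s] for s in candidates})
```

With the guess [drawing] the sketch finds three candidate subjects spread over two branches, which is the situation where the “Wizard” would ask the user to pick a subject or add a keyword; adding “masters” narrows the list to a single subject.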
Semantic 5-tuple: the s* of the pair [s*, q*] identified as above, provided it is possible to map the
whole Web, enables us to satisfy our information needs in only one click once q* is successfully
unveiled! In fact the retrieval is performed by a Darwin SSSE, Super Semantic Search Engine,
working under Darwin Ontology, which has a built-in Human Knowledge Map (Web Map) and a
“Wizard” that acts as a “super librarian” (in essence an Intelligence Retrieval Algorithm managed
by an agent).
Queried by q*, the search engine brings the monad n* {Rn*} with n* << n(i) (the n(i) being those of
conventional non semantic search). The user selects a few f* out of the search engine outcome, f* being of similar
magnitude to any of the f’s of conventional (non semantic) search. Lastly the user selects a unique set puser of pieces of
information and knowledge, without the “noise” usually carried by the multiple puser(i) sets.
IA of puser: The Intelligence Analysis performed on puser proved to be coherent and straightforward
because of the absence of noise and because almost all pieces belong to the same thematic!
Some Reflections
The Thinker of Rodin
Reflection 1
How conventional search engines SHOULD perform open queries
Any search expert that makes extensive and intensive use of this type of search will face strong
doubts about the possibility of structural failures and bugs. Let’s imagine for instance how we
should program an algorithm to retrieve pages that match a given (q).
In theory search engines should perform heavy computations to match large q’s: let’s see for
example what happens for a q of 4 words, q: [w1 w2 w3 w4]. The query being “open”, the search engine
robot must consider all types of non repeated valid sequences of those 4 components, for
instance [w3 w1], [w2 w4 w3], [w2], and in general monads, dyads, triads, …, n-ads (n=4 for this
example).
We say “non repeated” because sequences such as w2 w2 could be “prima facie” considered
nonsensical (or very specific…), whereas appearances of a component “here and there”,
separated by spaces, paragraphs, etc. (….w2………w2…………….w2…), found along a document tell
us that w2 matches three times, increasing its “density appearance”.
The algorithm should take into account that users are trying to guess s* or q* by imagining, via words,
facets of them. The number of k-ad permutations out of n elements is given by n!/(n-k)!, where
(!) stands for “factorial” rather than an exclamation mark. Applying it to this example:
4-ads (tetrads): 4!/(4-4)! = 4!/0! = 24
3-ads (triads): 4!/(4-3)! = 4!/1! = 24
2-ads (dyads): 4!/(4-2)! = 4!/2! = 12
1-ads (monads): 4!/(4-1)! = 4!/3! = 4
n=4, k=1, 2, 3, 4
Being the permutations as follows:
k=4: (24) 1234 1243 1324 1342 1423 1432 2134 2143 2314 2341 2413 2431 3124 3142 3214 3241 3412 3421 4123 4132 4213 4231 4312 4321
k=3: (24) 123 124 132 134 142 143 213 214 231 234 241 243 312 314 321 324 341 342 412 413 421 423 431 432
k=2: (12) 12 13 14 21 23 24 31 32 34 41 42 43
k=1: (4) 1 2 3 4
For example, 2341 => w2 w3 w4 w1
It means that, in order to be fair, matching and ranking algorithms should take into consideration all
possible k-ads and all of their permutations, and count how many times each given
match occurs.
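The counts above can be checked mechanically. The following is a small sketch using Python’s standard itertools.permutations; the word labels w1 to w4 come from the example, while the function name k_ads is only an illustrative choice.

```python
from itertools import permutations
from math import factorial

def k_ads(words, k):
    """All ordered, non repeated sequences of k words (k-permutations)."""
    return list(permutations(words, k))

words = ["w1", "w2", "w3", "w4"]
n = len(words)

# Number of k-ads for k = 1..n, to compare with n!/(n-k)!
counts = {k: len(k_ads(words, k)) for k in range(1, n + 1)}
expected = {k: factorial(n) // factorial(n - k) for k in range(1, n + 1)}

# Total number of sequences an "open" query of 4 words would have to consider.
total = sum(counts.values())
```

For n=4 this gives 4 monads, 12 dyads, 24 triads and 24 tetrads, that is 64 candidate sequences in total for a single open query of four words.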
Reflection 2
Some actual SE’s detected problems
Nobody knows exactly how actual search engines retrieve and “rank” Web pages. We have
detected many failures, such as not respecting formal logic, for example in the AND/OR
operations. When a user asks for w1 AND w2, it is supposed that he or she asks for those pages that have
both words within their text corpus, no matter their order and separation nor their “appearance
density”. However, “many times” some search engines drive users crazy by computing explicit
or implicit AND operators as pseudo OR’s instead!
AND: For example, querying Google (as of September 4th, 2013, from Buenos Aires) with q: “hungry
for a change” (exactly, embedded within quotation marks) gives 723,000 references. The
query [q: “hungry for a change” Obama] should give a significant reduction of
references, because we are disregarding those pages that have “hungry for a change” but do not
have Obama within their text corpuses; instead, Google brings us 5,860,000 references?!
OR: Let’s try now the open query [q: barack obama], which SHOULD give the same outcome as
the reverse [q: obama barack]. However, the numbers are different (as of September 4th, 2013,
from Buenos Aires):
[q: barack obama], 860,000,000
[q: obama barack], 1,040,000,000
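The behavior expected from formal logic can be stated as two simple properties: adding a term to an AND query can never enlarge the result set, and the order of the terms must not matter. The sketch below shows both on a toy corpus; the documents and function names are invented for illustration and say nothing about how any real search engine is implemented.

```python
def tokens(text):
    """Set of lowercase words of a document."""
    return set(text.lower().split())

def search_and(docs, terms):
    """Documents that contain ALL the terms, in any order and position."""
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items() if wanted <= tokens(text)}

def search_or(docs, terms):
    """Documents that contain AT LEAST ONE of the terms."""
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items() if wanted & tokens(text)}

DOCS = {  # hypothetical toy corpus
    1: "barack obama speech",
    2: "obama visits ohio",
    3: "hungry for a change",
    4: "barack the musical",
}

both = search_and(DOCS, ["barack", "obama"])
reverse = search_and(DOCS, ["obama", "barack"])
only_obama = search_and(DOCS, ["obama"])
either = search_or(DOCS, ["barack", "obama"])
```

Here the AND result is the same for both word orders and is contained in the result for “obama” alone, while the OR result is larger. A reference count that grows when a term is added, as in the example above, is what an OR-like computation would produce.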
Differences around the clock: most search engines show significant differences in their outcomes along a
day, for different regions of the world, and for some search operators, for instance Google
searching within specific domains via its “site:” operator.
Comment 4: These are bugs possibly related to an erroneous importance and neighborhood influence that
the Google rank algorithm assigns to high popularity keywords, like Obama in the example.
Reflection 3
What’s in a query?
The answer (a) to a query (q) implies effort, an expenditure of energy. A “person” (oracle, teacher,
assistant, helper, shaman, authority, …) spends further energy providing information and knowledge
in the form of explanations, documents, and sources of information and knowledge that need
analysis and synthesis. As we have seen above, (s) is behind any (q), q being the only visible part.
What is then the nature of any answer (a)? It was described as a 3-tuple:
[n{l(i)}, f(i), puser(i)]
Meaning volume, sample, and assimilation of the answer. In fact search engines provide users with a
torrent of unstructured and disordered information; users usually pick a tiny sample of this
torrent, and then dig into the pieces of the sample in a sort of individual ad-hoc “data mining”.
What is also “hidden” is the net worth of the query, namely the quantum of information and/or
knowledge and/or intelligence finally assimilated by the user. As the query was triggered by (s), we
may assume that this final net worth could be described in terms of (s) as well. So the full
“learning cycle” will be something of the form:
[s, q] => [d, d, s] => [s’]
Where:
s: the hidden s before the query;
q: the actual query;
d: a generalized torrent of “documents” d;
d: a generalized sample of the torrent of documents d;
s: a generalized ad-hoc “data mining” over raw data extracted from sample;
s’: the hidden s’ after the query defined as a sort of convolution (s+s);
Some sources and semantic axes of this Darwin exploration:
Active learning;
Active Learning by Querying;
Machine learning;
e-learning;
Junior search engine;
University of Twente, Netherlands;
Picture based querying;
Knowledge and Information;
Tools and Techniques for Gathering Marketing Data;
Examples of Explanations for kids;
Another example of Explanations for kids;
The Evolution of Mind Mapping;
El Súper Libro de Preguntas y Respuestas de Charlie Brown;
FUD, Miedo, Incertidumbre y Duda;
Munrudico Visual Images, from La Vanguardia;
[q: "querying and learning"], 3,250,000;
[q: "query to know"], 191,000;
[q: "learning by querying"], 26,800;
[q: "query for learning"], 32,900;
The Darwin 5-tuple for Kids
The query 2-tuple is the milestone of searching; users ask via a pair [s, q], where s is the “in mind”
image they have of what they “need” in terms of information to improve their knowledge. As
the figure below at left suggests, these images are invisible, whereas q, the second element of the pair,
points to the hidden idea, expressed the best way users can; see in the figure at right a common way: “by
explanations”. There are many ways to express them and the figures below show some: a) primitive
(the little child’s gesture suggests more than words, doesn’t it?); b) a more elaborated q via words
(a girl telling something to a friend); c) an “average” q expressed via formal audiovisual messages;
and d) an audiovisual explanation of something relatively abstract and complex such as the Yin-Yang.
i) The query 2-tuple: [s, q]
Examples of q’s (figures from Explanations for kids and Munrudico Visual Images): primitive (“blablabla”), average, and complex or abstract.
The query 3-tuple is the outcome of searching: within the triplet [d, d, s] the first term d stands
for the SERP, Search Engine Results Page, and the amount of references (URLs) that point to documents
dealing with query q. As most conventional search engines are not thematic and only index by
words, these amounts are generally too big and thematically ambiguous. Its second term d stands
for a sample of supposedly thematically specific and authoritative documents. Here lies the
main weakness of conventional search engines. Finally its third term s stands for an “intelligent
and intelligence data gathering” procedure that generally has to be performed by the user: in fact
an intelligent collection of pieces of text, multimedia objects, images, metadata, pointers, etc., to
work on as intellectual raw data.
ii) The query 3-tuple: [d, d, s]
SERP - Authorities - Info + Intelligence gathering
The comic below depicts the universal Q&A cycle of learning by questioning. Along
successive 5-tuples humans make their “in mind” ideas (s) evolve to become (s’).
iii) The Q&A learning cycle
[s, q] => [d, d, s] => [s’]
The Art Tree Darwin Demo
Presentation to the XXXXX Institute
By Juan Chamero, from Barcelona, as of January 2015
Street Art Utopia, by David Walker -Juan Tuazon
The Art Map, as a piece of the HKM, Human Knowledge Map, was built for the EU, European
Union, as a demo of the “semantic” talent of Darwin Methodology to retrieve out of the
Web as_is all the information and intelligence dispersed here and there, from the past to the present.
The knowledge creatures you are going to see, semantically structured by Darwin, were “up
there” in the Web Ocean, uploaded openly and at will by millions of artists and laymen. That
map was uploaded to the presentation laptop, acting as virtual “semantic glasses” for the Google
browser.
1. A three upper levels vision
Take a look at its seven main clusters. Within each cluster the main art subjects are depicted and
hyperlinked as in a geo map. The Art Tree is deployed from root to leaves in up to 13 levels.
2. Mouse over the node “drawing”
3. A detail of the upper level “drawing” sub tree
This is a sub tree of 80 derived nodes. For each node we may have access to a gallery of images (5
in this demo)
4. Knowing a little more via Google images thru Darwin Semantic Glasses
5. Knowing a little more via Google images thru Darwin Semantic Glasses
6. We are invited to search “pen and ink” works within “artist tools”
7. Knowing a little more querying by the concept “masters of drawing” via Google Web
8. Deepening a step querying by the concept “Leonardo da Vinci” via Google Web
9. Similar as above querying by Michelangelo
10. Let’s go now to the “drawing” neighborhood
11. Let’s see what a neighborhood is
By “semantic neighborhood” we mean the pertinence or membership to a “semantic family”, in
this example to “the drawing family”. So drawing is the second son of “classic arts” and brother of
“painting”. It has other “brothers” such as sculpture and architecture, and several “sons” or
subordinated subjects such as “history”, “artist tools”, “support media”, …. Trees and sub trees
may also offer access to arbitrarily agreed forms of extended families, including collateral subjects
at the level of “uncles” and “aunts” and even closely related and/or friendly and/or akin subjects.
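A semantic neighborhood of this kind is easy to express once the tree is stored as child-to-parent links. The fragment below is a minimal sketch built only from the names mentioned in this slide; the root label “art”, the dictionary layout and the function names are assumptions for illustration. In a full tree, node names such as “history” would need to be unique identifiers.

```python
# child -> parent links for a hypothetical fragment of The Art Tree
PARENT = {
    "classic arts": "art",
    "painting": "classic arts",
    "drawing": "classic arts",
    "sculpture": "classic arts",
    "architecture": "classic arts",
    "history": "drawing",
    "artist tools": "drawing",
    "support media": "drawing",
}

def children(node):
    """Direct 'sons' of a node, in the order they were declared."""
    return [child for child, parent in PARENT.items() if parent == node]

def neighborhood(node):
    """Parent, 'brothers' and 'sons' of a node: its close semantic family."""
    parent = PARENT.get(node)
    siblings = [c for c in children(parent) if c != node] if parent else []
    return {"parent": parent, "siblings": siblings, "children": children(node)}

drawing_family = neighborhood("drawing")
```

Extended families (“uncles”, “aunts”) could be obtained the same way by asking for the siblings of the parent.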
Next we are going to explore how data is structured in a sort of database. One of the problems we
face when dealing with “Big Data” applications (and this is one!) is how to offer friendly and
efficient interfaces to navigate and, at the same time, provide both overall visions and detail down to the
minimum, as in geo maps. As we will see, a HKM, Human Knowledge Map, in a given
language must map more than 15 million “ideas” along more than 600,000 subjects (themes or
topics), finally structured as a “knowledge forest” of about 200 disciplines. Take into account that
The Art, only a small piece of it although “complete”, has 7,571 subjects and about 500,000
“ideas”.
12. Let’s see a little deep inside
The Art Map content could be saved and deployed, resembling a DNA vector, along a two
dimensional matrix, in this case of 23 columns by 329 rows. As for the whole HKM, Human
Knowledge Map, it could be saved and deployed in 23 columns by approximately 30,000 rows
(~690,000 subjects, not too much in terms of “Big Data”!). Within each “semantic cell”, which has a
specific and unique name, is hosted the “semantic fingerprint” of the “subject” pointed to by the
name, a brief description of it and the “authoritative sources” from which the Darwin agents
retrieved the description.
13. Passing the mouse over “painting”
The deep level of browsing: Imagine yourself browsing The Art tree by track from root to leaves
and going from right to left, and in parallel creating the 7,571 cells from the upper left corner of the
matrix, going right and down row by row, unfolding the tree into a rectangular matrix. We are
now going to browse the map cell by cell, and even within each cell, reaching a semantic universe of
~500,000 concepts!
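The unfolding of a tree into a matrix of cells can be sketched as a traversal followed by a row-by-row fill. The code below is an assumption-laden illustration: it uses a plain depth-first, root-to-leaves order (the exact browsing order of the demo is not reproduced), a made-up five-node tree, and hypothetical function names; only the width of 23 columns comes from the text.

```python
COLUMNS = 23  # matrix width given in the text

def preorder(tree, root):
    """Yield node names from root to leaves, depth first."""
    yield root
    for child in tree.get(root, []):
        yield from preorder(tree, child)

def to_cells(tree, root, columns=COLUMNS):
    """Assign each node a (row, column) cell, filling each row left to right."""
    return {name: divmod(i, columns) for i, name in enumerate(preorder(tree, root))}

def rows_needed(n_cells, columns=COLUMNS):
    """Number of rows, complete or partial, needed to host n_cells."""
    return -(-n_cells // columns)  # ceiling division

# Hypothetical miniature tree, unfolded into a matrix 3 columns wide.
TREE = {
    "art": ["classic arts"],
    "classic arts": ["painting", "drawing"],
    "drawing": ["artist tools"],
}
cells = to_cells(TREE, "art", columns=3)
```

With 23 columns, 7,571 cells fill 329 complete rows plus a partial one, and 690,000 cells fill exactly 30,000 rows.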
14. Let’s inspect the “interior” of a given cell, for instance “lyric soprano”. The search engine tells
us that there is a node named “lyric soprano”
15. Let’s go to “lyric soprano” cell and its neighborhood
16. Mouse over Lyric Soprano again ….
Moving the mouse over the Lyric Soprano cell activates the same search features as in slide 2 and
subsequent slides: the sub tree of the node and its neighborhood, complemented by a gallery of images.
17. Node content: i-URL’s and Semantic Fingerprints
We are entering a new dimension of searching: This “feature” is not only a powerful tool to
make the search more direct and precise but also a tool to find whatever we need in only one click.
It is equivalent to being in a huge Web Library managed by expert and friendly librarians.
We have said that in each node is stored something like the demo of its subject. Let’s suppose that
for the subject “masters of drawing” there exist 20,000 Web pages dealing with this subject with a
high level of authoritativeness. Darwin Agents, under our Darwin Methodology and guided by our
Darwin Ontology, may unveil from these raw clusters of content a weighted set of dominant
concepts (modal concepts) that are considered the semantic synthesis of the cluster subject:
masters of drawing in this case. This set is the core of the above mentioned “semantic fingerprint”.
You may easily guess that adding one of these specific and unique concepts to your query
will focus it precisely on the semantic key you are looking for! We are close to the “find a needle in a
haystack” utopia.
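To give a feeling of what a “weighted set of dominant concepts” could look like, here is a deliberately naive sketch. It is not the Darwin procedure: it assumes the candidate concepts are already known and simply weights each one by the share of pages of the cluster that mention it. The pages, the concept list and the function name are hypothetical.

```python
from collections import Counter

def semantic_fingerprint(pages, concepts, top=3):
    """Weighted set of dominant concepts: share of pages mentioning each concept."""
    if not pages:
        return []
    hits = Counter()
    for text in pages:
        lowered = text.lower()
        for concept in concepts:
            if concept in lowered:
                hits[concept] += 1
    return [(concept, n / len(pages)) for concept, n in hits.most_common(top)]

PAGES = [  # hypothetical cluster about "masters of drawing"
    "Leonardo da Vinci used pen and ink and silverpoint.",
    "Michelangelo's red chalk studies; Leonardo da Vinci's notebooks.",
    "Pen and ink techniques of the masters.",
    "Leonardo da Vinci and the anatomy drawings in red chalk.",
]
CONCEPTS = ["leonardo da vinci", "pen and ink", "red chalk", "silverpoint"]

fingerprint = semantic_fingerprint(PAGES, CONCEPTS)
```

Adding the top concept of such a fingerprint to a query is the kind of focusing step described above.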
You will now be invited to see how this feature works. Darwin agents will also generate for each
subject its corresponding description, expressed as part of its metadata (i-URL). Concepts could be
of several types: generic, objective, functional, etc.
17 bis. List of concepts stored in the “Lyric Soprano” node.
You are invited to click, perhaps your first click along this demo: moving the mouse over will
provide you only with a semantic overview. In order to be specific and go right to the point you must
click: on average no more than once!
18. Examples of specific search in only one click
DARWIN (BRAIN) TEASER
THE WEB (micro brain teaser)
o The Web needs
o Darwin does
o A myriad of applications
Semantic Web: it does not exist yet! We are on the way to it; the
Semantic Web is coming…
The Web as a paradigm of two interacting worlds, K versus K’:
established order versus fuzzy, unstable and evolving orders;
governments versus governed; those who teach versus those who learn; sellers
versus buyers;
Just words: As of today Web documents are only indexed by words, not
by meanings;
Social Explosion: From a world of 50 million Websites versus 200
million people, to a world of 700 million Websites versus 2,900 million
people in only two decades;
The Web of today looks like a huge ocean of non structured creatures
(Web pages) where only a few are structured ones: in numbers, 2% structured
versus 98% non structured;
The beginning should be the beginning: to properly know the Web we
need to know K first, in as much detail as books and essays in a
conventional library;
K Role: The world power and resources management are in K;
K’ Role: Innovation and (sustainable) changes are in K’;
Big Data: mostly non structured and public, it lives in K’;
Human Knowledge as of today has four stages: information,
knowledge, intelligence, and wisdom; we are entering the
Knowledge Era;
THE WEB NEEDS
…to be structured, beginning with K. One way is to structure it, either “all of a sudden” or gradually. Another way would
be to “see it” as structured, like Galileo Galilei did with his telescope. Darwin, which stands for “Distributed
Agents to Retrieve the Web Intelligence”, may build a sort of “semantic glasses” to see the Web more and
better (as virtually structured). These glasses have two parts: a Map of the
Human Knowledge as retrieved from the Web as_is at a given moment and
a “Wizard” that dialogs with users like a “super librarian”. These maps, which
may evolve by themselves, have all conceivable themes we humans share at
a given stage of our civilization. If we imagine the Human Genome as a
knowledge database that tells intelligent beings how we are, the Human
Knowledge Map tells them how we think via how we document.
DARWIN DOES
o Darwin search guides any user to what he/she needs in Only One Click;
o Darwin Wizard dialogs concisely with users, assisting them to engineer the best question that points
directly to what they need, in 30 seconds on average;
o Darwin procedures, algorithms and agents are not intrusive. They “behave” only as smart
observers. To some extent they could be considered angels;
o Darwin as a predicting tool: Darwin may gather all pieces of information and knowledge dispersed on
the Web about any subject and suggest to humans how those pieces could be structured;
o Darwin as an unveiling tool: Darwin may inspect any cluster of data, suggesting to humans the best
statistical “metadata” about it, in fact enabling us to “see it” more and better as semantically
structured;
o Darwin procedures, algorithms and agents may retrieve from the Web as_is information and
intelligence dispersed at both sides, K and K’, building trustable, interdependent and synchronized
K Thesaurus and K’ Thesaurus respectively;
o Persons, either as Juridical Entities or People as proprietors or administrators in K, or as single users
in K’, leave indelible tracks of their behavior at both sides. With both sides semantically structured,
it is possible to make meaningful and trustable causal inferences about their behavior and all types
of activity trends;
A MYRIAD OF APPLICATIONS
o Encyclopedias,
o Meaningful translations,
o Trustable surveys and polls,
o IR, Intelligent Reports,
o SSSE, OOC, Only One Click, Super Semantic Search Engines,
o Knowledge Maps,
o Semantic e-Learning,
o Knowledge Creation,
o Better Truths,
o Free flow of Information and Intelligence between K and K’,
o Seeing more and better hidden demands,
o Full equalization of Offer versus Demand scenarios and vice versa,
o People Behavior Trends,
o e-membranes building as universal interfaces among multiple sources of knowledge,
o And much more…
SEMANTIC WEB
The Semantic Web, for its creator and actual W3C Director Tim Berners-Lee, does not exist yet; it
is another Web. Web Semantics is a discipline in a state of formation dealing with the meaning of things and
manifestations of everything that surrounds us, from concrete things and matters to any degree of
subtleness, encompassing the sensorial and the non sensorial world.
Following the track of managing and understanding textual and multimedia corpuses, we are learning to look
for information and knowledge thru images, and shortly we are going to enter the world of tactile,
olfactory, taste, sound and sixth sense semantics.
The figure above shows the cover of Semantic Web, an eBook by Juan Chamero, Principal Architect of the
Darwin Methodology.
THE WEB AS A DUAL WORLD PARADIGM
Internet is a technology and a network that enables human communication in both directions and
simultaneously, for example from Website proprietors and administrators to their users and,
reciprocally, from users to them.
AS OF TODAY: the figure depicts a Dual Web Model, K versus K’, as per Darwin Ontology. The black
region above corresponds to the Established Order of the actual world as_is in the Web, or K Region, mostly
not structured conceptually and therefore considered semantically “flat”. The green part corresponds to the
K’ Region of users, also non structured and relatively chaotic, as they (users) are connected as individuals,
generally without membership of any ordered group or pattern.
Information and a basic intelligence flow from K to K’ (dense blue arrow) thru an e-membrane,
which may or may not be intelligent (in yellow), that enables a predetermined basic information transfer from K’ to
K but neither knowledge nor intelligence (light yellow arrow).
FUTURE: Below is shown the Web as it could be within a couple of years. The K side is absolutely structured, for example thru
a Human Knowledge Map that will enable Semantic Search of information and knowledge in only one click.
The K’ side could also be structured via its respective K’ Thesaurus or Web Users Thesaurus, a necessary
condition to make meaningful and trustable inferences about People’s Behavior Patterns. With both sides
structured, the open and free interchange of information and knowledge between
both regions thru their respective e-membranes would then be possible (dense yellow arrows in both directions and dense blue arrow
from K’ towards K).
JUST WORDS
Conventional search engines only index by words. Ideally a textual content is seen by their robots as a chain of
elementary semantic objects, located between “blank spaces” and some other punctuation marks or
separators such as (,), (;) and (:), recognized as “words” for any language, whether well or wrongly written.
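Indexing “by words” can be sketched as an inverted index: each word points to the documents that contain it, and nothing else is recorded. The snippet below is a minimal illustration under that assumption; the two documents, the separator set (blanks plus the marks listed above and the full stop) and the function names are chosen for the example and do not describe any particular search engine.

```python
import re
from collections import defaultdict

def words(text):
    """Split a text on blank spaces and the separators , ; : and the full stop."""
    return [w.lower() for w in re.split(r"[\s,;:.]+", text) if w]

def build_index(docs):
    """Inverted index: word -> set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for w in words(text):
            index[w].add(doc_id)
    return index

DOCS = {  # hypothetical documents
    "a": "José Pérez, road maintenance worker",
    "b": "José Pérez: NASA nanotechnology researcher",
}
index = build_index(DOCS)
```

Both documents are listed under “pérez” with no hint that they refer to different people, which is the sense in which such an index is semantically “flat”.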
The interpretation of words or chains of words as “concepts”, “subjects”, “themes” or “topics” is performed
by users at their sole discretion. In fact users perform their searches via “keywords”, words or chains of words
that, either in exact mode (as written) or dispersed here and there within the text corpuses, they guess
point to their “in mind” idea.
We, humans, document by threading concepts that correspond to our “in mind” ideas but are expressed by
words in our jargon, state of mind and humor, and depending on our culture and level of education. For this
reason our keywords could belong to many cognitive worlds and families within them: take for instance
“José Pérez”, pointing to a multitude of JP’s, namely a road maintenance worker of Guatemala, a New York
company clerk or a NASA nanotechnology researcher. As we, humans, tend to organize our knowledge “tree
wise”, spreading knowledge subjects semantically and hierarchically along inverse logical trees from roots to
leaves, keyword creatures like JP may coexist in dozens of arboreal disciplines and, within them, in thousands
of different subjects. Google, for the exact term JP, renders 2,340,000 references but delivers them as a flat
structure: any of them could be the corpus we are looking for. On the contrary, in a Semantically Structured Web it is
possible to discriminate thousands of similar JP’s by their different semantic contexts, let’s say something like
from 00000001JP to 00345557JP.
SOCIAL EXPLOSION
The Web is expanding at a very fast pace, in terms of daily interacting people from K’ as well as in the
volume of their transactions and their reach and depth in terms of logical layers added within a
man - machine model. In this respect many users interact more and better than prestigious Websites.
However this explosion is still at a very primitive stage from a semantic point of view: users are learning to
query efficiently and to express themselves meaningfully, and an informal Q&A system of learning is under
way. Each day more people learn this way, perhaps in excess. On the downside, users’ language is more ambiguous and
limited, but in compensation they learn and communicate via images and use and understand via their
senses.
On the contrary, the K world looks frozen, watching and extremely aware of what’s happening in the K’
Region. In K’s vision the K’ World is running out of control and looks irrational and rather chaotic. For some
thinkers the Web is entering a sort of new medieval age. While the K’ side learns to order itself, the K
side tries to detect in K’ the seeds of a consensual new world order.
THE WEB OCEAN
The Web Ocean: The Web behaves like a huge Ocean where creatures of the most diverse species live. The K
World would be represented by the Ocean itself, formed along eons, with creatures whose life cycles and
forms strongly depend on the Ocean depth. The K’ World would be represented by us, humans, who go
to the Ocean to nurture ourselves, to make provisions of renewable and non renewable resources and for
transportation.
The Ocean creatures need organic carbon to survive and they obtain it from the zooplankton, which in its
turn survives on phytoplankton. The Web Ocean, to “survive”, needs information as a primary fluid, as defined
by Claude Shannon in his Information Theory, and it also needs, like the Ocean, a source of energy, ultimately
the one provided by the Sun.
THE BEGINNING SHOULD BE THE BEGINNING
In Spanish we say “a truth of Perogrullo” of something so trivial that it is silly to state it. However the
situation looks like an aporia, a state of puzzlement and confusion around a million-dollar question: why is the
Web still unstructured? From its very beginning Tim Berners-Lee, its creator, presented and imagined it as
semantically structured, and today, after more than 20 years of life, it remains unstructured. Our explanation
follows: It was born as a K Region tool to be used by quite a few, mostly “authorities”. An initial primitive
bibliotechnology bureaucracy weakened over time for many reasons, namely: a) explosive growth
of Websites and Web Portals; b) a speculative human nature prone to deceive users and competitors in
poor control contexts; c) excessive complexity of Web semantic protocols and tools. See Internet History (…,
2008) and Darwin Methodology.
So what would have been easy to reorient from its beginning (till 2008) became something close to impossible:
Websites are now too many, belonging to all types of domains, languages, countries, and cultural habits. In
order to correct this failure we envisage two main strategies: i) to start from ground zero, only structuring
new Websites, with or without programmed conversion of the existing content; or ii) to build a sort of
“semantic glasses”, such as Darwin Semantic Glasses, to “see” the Web as virtually and perfectly structured.
K ROLE
Web authorities are Websites and Portals that rank high because of their “popularity”, traffic, prestige, the singular nature
of the information and knowledge they provide, the number of links entering and leaving them, or an
algorithmic combination of all these factors. See Google’s “PageRank” algorithm.
These characteristics correspond to an actual Established Order Model. The K Region then represents
the actual World Power: governments and their agencies, supranational entities, universities, professional
colleges, Intermediate Associations and Organizations, NGOs, etc.
The information and knowledge of this region is what it is, but close to becoming a fossil at any moment. In K is
what is and what should be, the World Global Offer about anything. By their intrinsic nature K and its entities are
too inertial; they may change, but slowly and gradually.
K’ ROLE
BIG DATA
Big Data is a term in process of formation that refers to the creation, detection and
administration of big masses of data that are difficult to handle with the actual state of the art of computing and
database administration. There exists structured Big Data, such as that generated by the Elementary
Particles Accelerator of CERN, Switzerland, and non structured Big Data, like that generated within social
networks.
Associated terms are: Cloud Computing, Grid Computing, Smart Computing, Chaos Theory, Big Science,
Social Data Revolution, Inferential Statistics, Inductive Statistics, etc. And some tools: Watson Super
Computing, NoSQL databases, Apache Hadoop, MongoDB, etc.
HUMAN KNOWLEDGE
Within our Digital to Mind Paradigm Human Knowledge has four progressive steps, namely: Information,
Knowledge, Intelligence and Wisdom. Information as a digitalized fluid was defined by Claude Shannon
in his Information Theory in the late forties of the last century, in the aftermath of the Second World War,
and very little has been advanced along this line of basic scientific discoveries since then. We lack, for instance, a
theory explaining what knowledge is: a fluid more subtle and at the same time more elaborated than
information, perhaps? However we intuit what knowledge is and even dare to classify it.
In spite of ignoring what intelligence is, we, as humans, have started to produce some interesting glimpses of
intelligence classification: understanding what intuition is; what emotional intelligence is; differentiating how
the left and right sides of our brains work; and defining intelligence as the art of taking wise decisions when
facing complex crossroads.
About wisdom we know nothing, except that we recognize it as one of the essential virtues and associate it with
rationality, equanimity and emotional maturity. As a fact we are leaving the Information Era and entering
the Knowledge Era.
THE DARWIN HYPERCUBE
Conventional search engines like Google index by words and are “semantically flat”: they are unable to
recognize concepts, or even the thematic of the documents inspected. Darwin, on the contrary, may index
the whole Web detecting and recognizing concepts and the thematic of any document inspected. The Web
Thesaurus built by Darwin could be imagined as a huge hypercube of as many “floors” as “disciplines”
(about 200).
Darwin agents run thru thematically homogeneous “clusters” or “textons” (about 100,000 documents each),
retrieving their “fingerprints” and building de facto their “metadata”, a necessary condition to see them as
semantically structured. These fingerprints are like cognitive syntheses of the inspected documents, a sort of
vector of weighted concepts, and resemble the book filing cards of conventional libraries. Concepts are
hosted in the nodes of the Darwin Logical Trees.
Darwin Agent exploratory task
Below we embedded a Flash demo that explains how a Darwin agent inspects clusters. It is set for 5
speeds, from 1 to 5. It appears initially deactivated. To activate it, click with the right button of
your mouse and a menu will display: push the “play” button or equivalent.
As inspection proceeds, running thru the Website Authorities of cluster 11 (once cluster 10 is finished), an
associated script elaborates summaries and statistics. Once the inspection of cluster 11 finishes, the Darwin agent will
go on to inspect the next cluster, 12. All k’s and their derivates k_.... are suspected concepts that the Darwin agent
detects/unveils within the Web pages’ corpuses.
The Darwin agent detects and “measures” the authoritativeness of Websites within each cluster: each Website
inspected could be (or not) an Authority, and the agent jumps from link to link via a sort of Markovian algorithm.
The agent may return many times to a given Authority as a function of its architecture and of its relative weight and
importance for the theme inspected.
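A “sort of Markovian algorithm” can be pictured as a random walk over the links of a cluster, where the next page depends only on the current one and pages that are revisited often stand out as candidate Authorities. The sketch below is a generic random walk, not the Darwin agent: the link graph, the restart rule for dead ends and the function name are assumptions for illustration.

```python
import random
from collections import Counter

def explore(links, start, steps, rng=None):
    """Random walk: from each page, jump to one of its outgoing links at random."""
    rng = rng or random.Random()
    visits = Counter({start: 1})
    page = start
    for _ in range(steps):
        targets = links.get(page)
        if not targets:
            page = start  # dead end: restart from the starting page (assumption)
        else:
            page = rng.choice(targets)
        visits[page] += 1
    return visits

# Hypothetical cluster: every page links to "hub", which links back to all of them.
LINKS = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
visits = explore(LINKS, "a", steps=100, rng=random.Random(0))
```

In this toy graph the walk passes thru “hub” on every second jump, so its visit count reflects the fact that all other pages point to it, which is the intuition behind measuring authoritativeness by returns.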
Activate it by pushing “play” while moving the mouse over the image. Once the exploration starts it can be repeated
by pushing the “volver a reproducir” (“replay”) key. Thanks!
Present and future of Web Searching
Are conventional search engines like Google entering into a New Age of Search or on the
contrary into a degrading spiral of misleading information and knowledge?
By Juan Chamero, Darwin Methodology Architect, as of January 1st, 2014
A FUD Vision, by Grist.org
Darwin Ontology: It is a Classical Ontology that models, probabilistically, how we humans document
our ideas. It differs from Computational Ontologies, which model how humans document
ideas under strict computational protocols resembling formal logic “forms”, specially suited to be
used by trained people and also by agents. Darwin Ontology deals with any type of document
written by humans, in any language and on any theme, from badly and fuzzily written ones to
essays and theses written by academics and experts under strict protocols.
The Human Mind: Darwin assumes that we humans are by far more complex than any conceivable
agent and that our logic cannot be constricted to a reductionist game of formal logic. Our brains
have the ability to synthesize in an instant thousands of YES and NO tonalities about anything,
instead of only two, and we also have the talent to transfer these abilities to agents and
algorithms. Computational Ontologies are necessary and extremely useful as a complement to
human ontologies, and essential for “coding” semantic data and structures to make them
computable.
The actual Web as_is: It is estimated that at least 99% of the Web content is not well suited to be
inspected by Computational Ontologies, and it is highly probable that this situation will not change
substantially in the near future. So the only way to “see” the Web as_is more and better is to
structure it semantically, or at least to “see” it as virtually structured via Darwin.
“The Web Semantics paraphernalia”: what most Internet experts mean by “semantic” and
specifically by “Web Semantics” will astonish intellectuals, academics and professionals not closely
related to Internet technologies. Probably many of them, looking a little deeper into what is
considered semantics, semantic search and Web semantics, would be surprised to see cutting-edge
technologies within a sort of medieval scenario full of “philosopher’s stones”, pseudo-scientific
assertions, taboos and algorithm mysteries, all mixed up.
“The Darwin journey”: Personally, as the creator of Darwin Ontology, I started 12 years ago studying what
a concept is, from Plato and Aristotle to Spinoza and to our Digital Era; its differences, if any,
from ideas; how humans write and document them; and how over the centuries they have learnt
to structure them hierarchically as structured knowledge. In addition I went deep into the
above mentioned paraphernalia: the meaning of words, common words, expressions, quotations and
sayings and, concerning Internet content, the differences, if any, among themes, subjects,
topics, thesauruses, dictionaries, glossaries, jargons, keywords, metadata, tags, etc. From this long
journey I realized that the Web is a well structured semantic universe, almost ignored up to now
because its real semantic structure remains subtly hidden!
FUD: Are we entering into a scenario of FUD, Fear, Uncertainty and Doubt? Or perhaps into a sort
of nonsensical technocratic discourse where dubious acronyms and neologisms usurp the place of
universal and eternal concepts? It seems that any Internet innovation, no matter if minor or
significant, must be accompanied by new features impossible to describe with conventional
concepts. For instance we were all induced to believe, from its very beginning in 1991, that the Web
was intrinsically semantic.
W3C: The Web as of today is unstructured. Tim Berners Lee, the Web creator, was and still is
convinced that the Web is “potentially” semantic and that in the long run it will tend to be. But the real
truth is that it is still “flat”, unstructured from a semantic point of view. The search engines
accepted this reality from the very beginning as something immovable.
Tim Berners Lee was also the founder and until now Director of the W3C Consortium, the Web’s
leading nonprofit institution to design and develop standards, languages and tools to manage and
improve the Web. To some extent they accepted the Web as_is as a data reservoir rather difficult
to make fully semantic because, for them, its conversion to semantics involves the heavy task of
rewriting the whole content following strict protocols by language and/or building for each
document its corresponding “metadata”.
The Web Community: In parallel to this rather hidden and not yet declared criterion, the leading
actors of the IT&C industry agreed that semantic means “meaning”, and as they had worked from the
very beginning indexing the Web by words they also accepted, by extension from 2001 onwards,
that any methodology, script, tool, program, search engine or algorithm relating words and chains
of words to each other means “semantics”. They de facto ignore how Human Knowledge
has structured itself, independently of the Web’s existence, since the beginnings of our civilization.
Certain advanced Web applications, for instance those relating keyword clusters to
groups of people, were “tagged” as semantic.
Google pragmatism: Along this pragmatic track Google added features at a fast pace, defining
what semantics is and stating that they provide semantic search, and that by adding some advanced
apps like “The Google Knowledge Graph” they become de facto an SSSE, Super Semantic Search
Engine; and perhaps now, by adding interrelations between users and some “semantic mass media
networks” of the Web, a new level of service could be attained, as for example an SSSSE, Sensorial
Super Semantic Search Engine, and so on and so forth.
The New Google: Google under the “Hummingbird” and “Knowledge Graph” undertakings will
operate with enhanced glamour but semantically at the same level as before. As an analogy, it is
like a sort of Human Needs Care World System that attends Human Needs via an OTC, Over The
Counter, Q&A dialogue system. Its Knowledge Database is a huge store of “pointers” (more than
35,000,000,000) to isolated pieces of information and knowledge (within the analogy: to
prescriptions, treatments, diagnoses, cases and “medicines”) stored here and there, but it still ignores
everything about health care itself. Notwithstanding, it has a valuable asset: it knows, and it will
know better and in extreme detail, the demand in terms of users’ needs. For any OTC user demand
Google may provide isolated pieces of information that, at random and only occasionally, may make
sense for the user: too much effort to provide, in the long run, FUD instead of certainty.
What’s missing: In the long run Google will know its semantic limitations a little better than before. In this
analogy it needs to know the “other half” of the Human Needs Care system: The Human Needs
Offer, semantically and in as much detail as needed to fully cover any conceivable demand. This job has to be
performed from “zero ground” by making the whole Web evolve to semantics. The Human Needs
Offer has a name: The Human Knowledge Map.
Are conventional search engines entering into a New Age of Search or on the contrary into a
degrading spiral of misleading information and knowledge? That’s one of the big questions of
this Digital Era, perhaps rooted deep within the following conundrum: Digital to Mind or Mind to
Digital? By Mind to Digital we usually mean going to Digital under our control, as a way to improve
our mind and our lives, whereas by Digital to Mind we usually mean going towards a superior
quasi-robotic mind following technology innovations.
Digital to Mind approaches are characterized by a) contempt, ignorance and disregard of the past
and of the validity, weight and reason of the human evolution; b) high and talented creativity; c)
sometimes dangerous and blind forms of reductionism.
The Web content: The Web content prima facie looks semantically chaotic: 35,000 million
documents dealing with more than 500,000 themes and about 15 million concepts per
language. Its logic is highly fuzzy, almost impossible for robots to unveil unless processed
probabilistically as Darwin does.
The Web as a dual K versus K’ model: Let’s imagine the Web space as dual, Websites on one side
and users on the other side, continuously interacting like an Oriental Yin-Yang. In the
Darwin Ontology these are known as the K side, or “Established Order side”, and the K’ side, or “The
People side”. You as a human may also behave dually: as a user on the K’ side, and as an “owner” on the
K side if you are the administrator or owner of a Website, an Established Order entity; however never at
the same time, and with different behavior in each role.
K versus K’ in numbers: When you use a search engine to query the Web as a user you are looking
for something you need. These needs are expressed by messages, as of today mostly made of
“words”, but bear in mind that they are “K’ side messages”: messages of people demanding
some type of help. Let’s approach the problem in numbers (Web facts):
o In terms of traffic: the K side as of today is a huge ocean of nearly 35,000 million
documents hosted in 350 million domains and expressed by 650 million Websites. The K’
side has 2,400 million users that query the K side 1,500 billion times a year (2012).
o In terms of “power”: 100 million privileged people on the K side versus 2,400 million users on the K’
side, interacting through 1,500 billion queries a year.
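The ratios implied by these figures can be worked out with simple arithmetic, sketched here in Python. The only assumption is that “billion” means 10^9; the variable names are illustrative.

```python
# Figures quoted in the text above (2012); "billion" is assumed to be 10**9.
documents = 35_000 * 10**6
websites = 650 * 10**6
users = 2_400 * 10**6
queries_per_year = 1_500 * 10**9
privileged = 100 * 10**6

queries_per_user = queries_per_year / users    # queries per user per year
users_per_privileged = users / privileged      # K' users per privileged K person
documents_per_website = documents / websites   # average documents per Website

print(queries_per_user, users_per_privileged, round(documents_per_website, 1))
```

Under that assumption each user issues on average 625 queries a year, there are 24 users on the K’ side for each privileged person on the K side, and a Website holds about 54 documents on average.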
HKM, Human Knowledge Map Feasibility: Darwin satisfies the five IF’s, namely:
o IF: On the K side there exists everything we humans need in terms of information and
knowledge;
o IF: This asset is structured and classified by all possible “themes” of the Human Knowledge
(not the Human Needs!);
o IF: On the K’ side there exists, “virtually”, a corresponding asset (for example a map somehow stored and made
available in the Web space) that is likewise structured and classified by all possible “subjects”
of the Human Needs;
o IF: Human Knowledge is supposed to match Human Needs, somehow specifically and/or
univocally;
o IF: we provide users with an aid to express their needs properly, and accordingly we also
provide them with all the pieces of information and knowledge needed to satisfy those needs;
Then we may say that we have solved the generalized Q&A, Questions and Answers, problem:
Satisfying Human Needs in Only One Click.
=> Back
DM - Mega algorithm
A Darwin Classified
By Juan Chamero, as of Feb 2015
Purpose: To map a single Major Discipline of the HK, Human Knowledge
Objects
CeptsDB: cepts Database; it contains all suspected Darwin cepts (plus the suspected minority of
main subjects, all mixed up). It contains semantic noise, redundancies and many types of
ambiguity; its volume strongly depends on the discipline to unveil, for example about 900,000
suspected cepts for “The Art”. At the end of the process half of this volume will be eliminated,
tagged as “probably wrong”. Records have the form [c, u], where c is the suspected “cept
name” and u is the URL of the Web page under inspection where the suspected cept is present.
The ds, “discipline sample”, is the collection of documents (Web pages) that deal with the
discipline to unveil, as per the Web as_is as of today!
Example: “The Art”: exact search in G: 125,000,000 documents (as of 23rd Jan 2014), then
ds: 125,000,000 Web pages
Note 01: Humans using DM under Darwin Ontology “know” how to extract suspected cepts from ds
documents. In the example, about 900,000 suspected cepts will be extracted from 125,000,000 Web pages.
Note 02: we may imagine the semantic subspace (c, u) as a huge Boolean Matrix of 900,000 rows by
125,000,000 columns, only for this example!
Note 03: We may apply some Big Data procedures and tools to “reduce” this subspace, for instance just
counting existent content by row and by column. Counting by row gives us a measure of the
suspected incidence of cepts, and counting by column a measure of the suspected incidence of a weighted
combination of “authoritativeness”, “specificity” and “representativeness” of each Web page of the sample.
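The row and column counts of Note 03 can be sketched in Python on a toy sample. Because a matrix of 900,000 by 125,000,000 cells is almost entirely empty, the sketch stores only the non-empty cells as a set of (c, u) pairs; the sample records are invented for illustration.

```python
from collections import Counter

# Hypothetical [c, u] records: cept c was found in the page at URL u.
records = [
    ("lyric soprano", "u1"), ("aria", "u1"),
    ("lyric soprano", "u2"), ("fresco", "u3"),
    ("lyric soprano", "u1"),  # redundant record, counted only once
]

cells = set(records)  # the non-empty cells of the Boolean matrix

# Counting by row: in how many pages does each cept appear?
row_counts = Counter(c for c, _ in cells)
# Counting by column: how many cepts does each page hold?
col_counts = Counter(u for _, u in cells)
```

Only the counts are kept in memory, so the same two passes scale to very large samples when the records are streamed rather than loaded at once.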
This ds represents the “zero ground” semantic mapping of the discipline and, extended to the
whole Web, a first raw “zero ground” Human Knowledge Map, where in a given language all the
suspected human “ideas” are represented: a map that holds all ideas in a single cluster without
discriminating them by any type of hierarchy (flat). The next step would then be to unveil the
hidden “semantic skeleton” of each discipline, a sort of arboreal structure of its “main subjects”,
from hundreds to thousands, ideally tending to inverted logical “trees” of branches and nodes that
go downwards from root to leaves and have meaningful “modal names”.
NodesNamesDB, Nodes Names Database: our next Big Data task will be the unveiling of “node names”.
Our CeptsDB supposedly has all suspected names of all “in mind” ideas humans have for a single
discipline and for a given language. From previous research we estimate that on average 1 out
of 40 individuals of the database is a “main subject name” besides being a concept. As an example, for a
CeptsDB of 400,000 names, 10,000 may correspond to suspected main subjects. Our DM, Darwin
Methodology, handles this problem via a specialized algorithm created and set up for each
discipline under a derived ontology (Darwin Ontology) that tells us how we humans use and
discriminate “main subjects”.
Main subjects are concepts but not all concepts become main subjects. WWD, Well Written
Documents, are (or tend to be) monothematic, and accordingly humans tend to tag them
semantically via their titles and subtitles, either explicitly or implicitly, and also via metadata
where it exists. In addition, statistically, main subjects tend to have as much popularity
as all their derived subjects, and as each derived subject is associated with a particular and exclusive
set of concepts we may suppose that main subjects have on average the highest popularities
within the ds, discipline sample! This database is structured as pairs [n, p], where n is the suspected
node name and p stands for “popularity”. Pairs with p lower than a pre-established threshold <p> are excluded. DM
algorithms suggest to humans the suspected node names subspace, let’s say a list of 10,000 names
sorted alphabetically and by p. Another discrimination tune-up performs the same task over
weighted subspaces, for example considering only “authoritative” documents according to certain
criteria: traffic, hub power, thematic spectrum bandwidth, etc.
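The threshold and sorting step on the [n, p] pairs can be sketched as follows. It is a minimal Python illustration: the function name, the sample names and their popularity values are hypothetical, and keeping pairs with p equal to the threshold is an assumption.

```python
def suspected_node_names(pairs, threshold):
    """Drop [n, p] pairs below the threshold <p>; return both sortings."""
    kept = [(n, p) for n, p in pairs if p >= threshold]
    by_name = sorted(kept)  # alphabetical list
    # Most popular first; ties broken alphabetically.
    by_popularity = sorted(kept, key=lambda pair: (-pair[1], pair[0]))
    return by_name, by_popularity

# Hypothetical popularity values for a tiny CeptsDB sample.
pairs = [("visual arts", 900), ("lyric soprano", 23),
         ("performing arts", 700), ("aria", 5)]
by_name, by_popularity = suspected_node_names(pairs, threshold=20)
```

The two lists are what would be shown to the human experts: the same suspected names, once for lookup and once ranked by p.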
dsA, Authoritativeness of the ds, discipline sample: this is a vector of pairs [u, a] where a is a
numerical value associated with the authoritativeness of the document located at address u. In fact it
is a subspace of all u’s related to the discipline under inspection. Then:
From [c, u] we infer the ds (u) space U;
From any [u, a] built for an authoritativeness threshold <a> we infer the ds (u) subspace U<a>;
This task could be performed with no restrictions at all, with U covering all URLs of the sample, down
to a hypothetical U<a> with only one “valid” URL, such as a hypothetical Webopedia.
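How U shrinks to U<a> as the threshold rises can be sketched in Python. The records, the authoritativeness values and the choice of “a at or above the threshold” are assumptions made only for this illustration.

```python
def subspace(authority, threshold):
    """URLs whose authoritativeness a reaches the threshold <a>."""
    return {u for u, a in authority if a >= threshold}

# Hypothetical [c, u] and [u, a] records.
cu = [("lyric soprano", "u1"), ("aria", "u1"), ("aria", "u2"), ("fresco", "u3")]
ua = [("u1", 0.9), ("u2", 0.4), ("u3", 0.1)]

U = {u for _, u in cu}        # the whole ds (u) space
U_04 = subspace(ua, 0.4)      # a stricter subspace
U_top = subspace(ua, 0.9)     # the extreme case: a single "valid" URL

# The [c, u] matrix restricted to the authoritative columns only.
reduced = [(c, u) for c, u in cu if u in U_04]
```

Raising the threshold removes columns of the Boolean matrix, which is how the number of Web pages considered can be cut down without touching the rows.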
Semantic Seed, Semantic Skeleton buildup: for each discipline humans create a “semantic seed”,
something like its upper thematic level, opening from its root. Talking of “The Art”, it would
be something like the seven big branches derived from it, namely: visual arts, performing arts,
literature, art history, art infrastructure, culinary art and combat arts. By the way, this discipline
has 7,570 nodes and about 400,000 concepts distributed along a tree of 13 levels. In the near future
this initial task could be transferred to an agent. The DM strategy is to make this seed grow.
Up to now, as of February 2014, DM proceeds along a four-level growing deployment process
resembling an explosion of the type 1:10:100:1,000:10,000: from the root we pass to the seed (1:10),
from 10 we pass to 100 (upper level), from 100 to 1,000 (medium level) and finally from 1,000 to
10,000 (the tree basement, all leaves).
1:10: the human part of the DM “anthropic algorithms”; a human expert or a “Committee of
Experts” creates what they state to be the summit of the discipline. This assumption does not mean that
DM accepts it as a truth, not even as the best truth, only as a strong supposition to be checked as
much as possible.
Remember that we have as our “best truth repository” our CeptsDB, a really huge data structure,
virtually a Boolean Matrix in the order of 1,000,000 rows by 100,000,000 columns for each
discipline of the HKM, Human Knowledge Map. Much of it could be considered redundant,
wasteful and noisy, so that at the end of the DM processes we arrive at something like 400,000 by 60,000,000;
but we are able to significantly reduce the number of columns by playing with “authoritativeness”,
arriving perhaps at denser, more homogeneous and coherent cognitive matrices in the order of
100,000 rows (concepts) by 10,000,000 columns (relatively significant Web pages).
Human experts do not have ex-ante access to the DM agents’ work. Agents check the seed against
as many layers of space U as they have at hand and against the whole Web, as if CeptsDB and its
derived databases did not exist. From this check our DM suggests to the human experts a set of
possible summit compositions ranked with a proprietary algorithm.
10:100: this is a crucial crossroads, the first decision Darwin agents must face by “themselves”,
following the creation criteria and settings of the DM proprietors. For each node of the summit they
have to hang down a set of possible derived main subjects, without any human guide! What
if we repeat at a lower scale the “zero ground” CeptsDB creation, now restricted not to
the discipline as a whole but to the specific main subject of the father/mother node? Take into
account that we could have guessed something similar to the human experts’ summit as a
reasonable “semantic seed”. Of course, the deeper we go in the tree unveiling, the more we lose the
influence of human intervention, but we gain in AI, Artificial Intelligence, coherence, somehow
enabling non-human objectivity. Instead of checking what human experts guessed, as at 1:10, now at
10:100 Darwin agents suggest to humans the cognitive upper level of the discipline under study.
From our experience (in three prototypes and ten semantic seeds), a possible failure of the semantic
seed may show up at this step, for instance because human experts are mostly formed over
decades within an authoritative knowledge atmosphere and consequently carry strong and often
hidden prejudices. In this respect the Web_as_is is terrifically dynamic:
disciplines and sub-disciplines become obsolete, new ones appear, and they change their meaning and
even their semantic pertinence, for example nanotechnology hosted under and within biology, physics
and engineering, or fashion and culinary art within art, etc.
100:1,000 and 1,000:10,000: these are two rapidly convergent steps applying the same criteria as in the
previous steps. In order to test the coherence of the whole, our DM checks the skeleton against the
Darwin Ontology Conjectures. To explain the checking procedure we need to introduce here the
Skeleton Database.
SkeletonDB, Skeleton Database: following the procedure described above, we structure it as a 5-tuple
(“quintuple”) [n, {c}, {a}, {u}, h] where:
n: name of the suspected main subject;
{c}: set of cepts associated with main subject name n;
{a}: authority pairs (URL, rank) for main subject name n;
{u}: authorities selected to edit the Semantic Fingerprint of main subject name n, associated with a
brief description of the main subject name;
h: hierarchical code of the node as a tree unit;
The filling of this skeleton begins by building {c} for each n as the logical sum of all column
vectors [c, u] associated with name n. Let’s explain this step: take the main subject “Lyric soprano” for
art. In our CeptsDB the row “lyric soprano” is mentioned in 23 Websites u1, u2, u3, ..., u23: the
Darwin “reduce” algorithm then proceeds to “add logically” all the cepts existent in those 23
Websites.
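The quintuple and the “logical sum” step can be sketched in Python as follows. The class name SkeletonRecord, the function fill_cepts, the sample cells and the dotted format of the code h are hypothetical choices made for this illustration, not the actual SkeletonDB layout.

```python
from dataclasses import dataclass, field

@dataclass
class SkeletonRecord:
    n: str                                  # suspected main subject name
    c: set = field(default_factory=set)     # cepts associated with n
    a: list = field(default_factory=list)   # (URL, rank) authority pairs
    u: list = field(default_factory=list)   # authorities chosen for the fingerprint
    h: str = ""                             # hierarchical code of the node

def fill_cepts(name, cells):
    """Logical sum (OR) of the columns of every page that mentions `name`."""
    pages = {u for c, u in cells if c == name}
    return {c for c, u in cells if u in pages}

# Hypothetical non-empty cells of the [c, u] Boolean matrix.
cells = {
    ("lyric soprano", "u1"), ("aria", "u1"),
    ("lyric soprano", "u2"), ("opera house", "u2"),
    ("fresco", "u3"),
}
record = SkeletonRecord(
    n="lyric soprano",
    c=fill_cepts("lyric soprano", cells),
    h="1.2.3",  # illustrative code format
)
```

With 23 Websites instead of two the operation is the same: a union of the cept sets of all pages where the name appears.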
Note 04: this step is not trivial: the Boolean matrices we are talking about are not the same. The initial, big
one [c, u] is the mother of many others used by our DM. For The Art the big one had about 900,000 by
125,000,000. This matrix shrinks to about 400,000 by 50,000,000 as explained above. The derived one
needed to perform the SkeletonDB filling is rather different: for each subject name (7,251 for The Art
mapping) we have to select a sample of URLs, let’s say 10,000 on average, and for each URL we have to
retrieve its corresponding {c} set by using the big Boolean matrix.
=> Back
Aiware Methodology
Juan Chamero, jach_spain@yahoo.es, as of April 24th, 2013
Source: Tree of Knowledge, as per John F. Kennedy for Performing Arts
Introduction
Aiware Methodology, which we are going to define briefly here, is based on and directly derived from
Darwin Methodology, which imagines the knowledge hosted in the Web as structured under the form
of trees, arborescences and shrub-like structures in formation. Trees and arborescences are
imagined inverted, going from their roots to their leaves, deepening within our minds like
symmetric avatars of the real world. The HK, Human Knowledge, as a whole could then be seen as
a forest of trees, arborescences and shrub-like structures in formation. Aiware Methodology
explores the Web either to detect and retrieve existent “pieces of K” or to create new ones through a
four-step methodology named ikAK:
ikAK, aiqueieika, {i, k, A, K} where:
i: stands for ideas in mind;
k: stands for keywords;
A: stands for Authorities (semantic);
K: stands for Knowledge;
As a methodology it also means:
[intelligence to know Amap and Asap about Knowledge]
The result is the end of the acronym: eventually the whole K or a piece of it, for instance an HKM,
Human Knowledge Map, or an IR, Intelligence Report; in the end, a piece of K.
i, the first step: In a human to human relation, Aiware representatives versus prospective
customers, Aiware representatives have to “infer” from their prospective customers their “in
mind” ideas about what to get as the final outcome of Aiware’s services. These ideas must be
precisely described.
k, the second step: Aiware experts must create a primal set of “keywords” to detect and retrieve
A’s (Authorities), and K accordingly, amap and asap. This set is the semantic arsenal to unveil
A’s and K amap and asap: the more the k’s become concepts, the greater their unveiling potential.
A, the third step: within Darwin Ontology the entities that certify semantic validity are A’s. As
Aiware works under this ontology, Darwin agents and algorithms under human control explore the
whole Web searching for the A’s that best “fit” the “in mind” ideas of Aiware’s customers.
K, the fourth step: where K stands for human knowledge. Darwin agents and algorithms working
under human supervision detect, collect and classify more raw information and
intelligence than necessary, in order to build meaningful pieces of the required knowledge.
How the ikAK Procedure works
A prospective customer, either personally or as a representative of a group, has a need in the
form of an “in mind” idea. The idea is finally unveiled and discussed, and it will head the Aiware
proposal. As the work is going to be performed via the Web, all the information and
intelligence necessary to satisfy the prospective customer’s need must be detected and retrieved from it, a
huge data Ocean actually holding more than 30,000 million documents (Web pages).
Source: “Underground Art”, London Metro
Web connectedness: The figure above depicts that everything leads to everything: the Web is a
space where everything is connected and where one thing leads to another, so no matter
where a journey searching for something starts, in the long run we (or our agents) may arrive at any given
target by going from hyperlink to hyperlink.
A Little more about ikAK steps:
Figure sources: i step: IHMC; k step: Nanomecánica; A step: Cartoonstock.com; K step: Fathaur Tree
(i, k) interaction: Initially the k step may iterate recursively against the i step as many times as necessary,
until the semantic quality of the k’s attains a reasonable level of meaningfulness, resembling more and
more concepts instead of simple keywords. This task could be performed entirely by agents or by
humans aided by agents.
(i, k, A) interaction: Next Aiware proceeds to step A, identifying Authorities. This is a core Darwin
step performed by special algorithms and agents, based on scouting the Web through a sort of
“random walks” along preselected authoritative Websites pointed to by the arsenal of k’s
built in the previous step. These random walks are controlled by a “Markovian memory factor” that emulates
real-time human memory activation during explorations.
This scouting enlarges the A sample meaningfully and an exponential process of self-learning
starts: agents find more and more potential A’s, which nevertheless should be checked. Along a
three-level iteration process (i, k, A) Darwin arrives, via special semantic algorithms, at a
two-dimensional semantic matrix of A’s versus k’s, where many of the A’s are within specific common root
domains and many of the k’s may potentially belong to, be subordinated to or be derived from common
subjects; many of them may look as if embedded within others.
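A matrix of A’s versus k’s, with A’s grouped by common root domain, can be sketched in Python. The sketch is only illustrative: the URLs and keywords are invented, and taking the last two labels of the host name as the root domain is a naive assumption that fails for suffixes such as “co.uk”.

```python
from collections import defaultdict
from urllib.parse import urlparse

def root_domain(url):
    """Naive root domain: the last two labels of the host name."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def authority_keyword_matrix(hits):
    """Map each root domain (A) to the set of keywords (k) found there."""
    matrix = defaultdict(set)
    for url, keyword in hits:
        matrix[root_domain(url)].add(keyword)
    return dict(matrix)

# Hypothetical (URL, keyword) hits collected by the agents.
hits = [
    ("http://opera.example.org/sopranos", "lyric soprano"),
    ("http://www.example.org/arias", "aria"),
    ("http://museum.example.com/", "fresco"),
]
matrix = authority_keyword_matrix(hits)
```

Each key of matrix is one row (an A grouped by root domain) and each set lists the columns (k’s) that are filled for it.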
(k, A, C) interaction and draft: As a third dimension there appears a piece of content C for each pair (A,
k): Darwin agents emulate human content capture for each (A, k) option. Finally a raw document
of Information and Intelligence discriminated by the triad (k, A, C) is presented as a Final Report
Draft for human consideration. If you are a journalist, this triad would provide you, as an editor, with all
you need to build a meaningful Final Report, namely an Intelligence Report, a Survey Report, a
Main Behavior Trends Report, etc. As for figures: A’s, k’s and C’s could be in the range
of thousands, but A’s and k’s are ranked by their relative importance (A’s could be grouped by domain
root).
=> Back
BIG DATA Semantics Primer
Basic Big Data Thesaurus
Juan Chamero, as of January 1st, 2014
Source: DARPA Topological Data Analysis, from Big Data, Wikipedia
Presentation
This section could be titled “Big Data in a hurry”, or “Big Data a las apuradas” in Spanish. It should be
considered a sort of accelerated e-learning experience. Non-English-speaking people may use the
translation facilities of the site, still basic but good enough to understand what the matter is.
Let me introduce myself: I’m Juan Chamero, technically an AI, Artificial Intelligence, and High
Complexity Systems expert, and a Zen master. Concerning this duality, this section will have more of
Zen than of science, both pearls of human culture, Zen from the “Far East” and science from our
“Western Culture”.
We are going to present, technically, a raw and brief Big Data Thesaurus. Pieces of information and
knowledge will be prompted without much ado or explanation, like “semantic pills”,
nevertheless apt to be understood and used in the long run, provided we devote ten times more
space and time. As voluntary Zen “practitioners”, try to open your minds, not to oppose anything,
and to let everything flow, comprehensible or not. It is fundamental to be fully aware, trusting in the
syncretism and cognitive threading power of our brain. To be fully aware, the let-it-flow spirit, is
Zen; imagery is both science and Zen; and the continuous generation of hypotheses is science.
When I was much younger I had the opportunity to participate in a singular experience. In
those times IBM was recruiting and training its first classes of Systems Engineers from
people coming from all over the world, with scarce to null knowledge of English and from the
widest spectrum of backgrounds. We were all immersed in an intensive one-year course, in English,
on logic, mathematics, physics, operations research, economics and business administration,
complemented with seminars on epistemology, philosophy, sociology and politics. Nobody
died in the attempt!
Warning: for practical reasons all semantic pills will be written in English.
Presentation (translated from the Spanish)
This section could be called “Big Data in a hurry” in English or “Big Data a las apuradas” in Spanish.
It is a pilot experience, in “Spanglish”, of accelerated learning. The automatic translation facility
offered by the site, still very rudimentary but sufficient to understand what it is about, is always
available to users.
Allow me to introduce myself: I am Juan Chamero, on the scientific and technical side an expert in
Artificial Intelligence and High Complexity Systems, and on the humanistic side a Zen master. In this respect this section
owes more to Zen than to science, both pearls of human culture: Zen from the culture
known as the “Far East” and science from what we know today as the “Western
World”.
Technically we are going to present a very brief Thesaurus on Big Data. Fragments of
information and knowledge will be presented without much explanation, apt to be
understood and used if they were presented in ten times more space and time. As
voluntary Zen practitioners, try to open your minds, to reject absolutely nothing, and to
intuit or even guess explanations, applications and uses of what is being shown. What is
fundamental is to be very attentive and to trust that, as we go through these fragments,
we thread together in our brain knowledge about Big Data. Discipline, attentiveness
and non-rejection are Zen; imagery is both Zen and science; and the continuous generation of hypotheses
is science.
In my youth I had the opportunity to take part in a similar experience at IBM, which at that
time, many decades ago, was training its experts in the nascent discipline of
Systems Engineering. For one year, with no knowledge or very scarce knowledge of
English, participants from different countries and with the most diverse professional, technical and
scientific backgrounds were immersed in intensive courses, in English, on logic, mathematics, physics,
operations research, economics, sociology and business administration, complemented with
seminars on philosophy, epistemology, sociology and politics. Nobody died in the attempt!
Notice: for practical reasons the fragments or “semantic pills” of the cognitive experience will be in
English.
Semantic Pill 1
Tips: we are going to start this series with a sort of “e-potpourri”. Please be patient! The list of
terms below each pill is a piece of our Basic Big Data Thesaurus, alphabetically classified from
numerals and from A to Z. Each pill will deal with a small bunch of them, so selecting one term at a
time to generate the pill would probably be like picking themes at random. Take into
account that I’m acquiring experience: my first impression when facing this first bunch was literally
written as follows:
SEO activity; what’s an avatar?; learning to ACT?; what’s DP?; what’s the meaning of Web Services?;
3V as a metric, limited to 3?; what has Amazon to do with Big Data?; what’s the relation between
Web Services and Big Data?
My own experience prompted those questions: of course I know what DP is, but I never imagined in
ICT something like “learning to ACT”, nor going deeper into the meanings of avatar, nor what massive retail
has to do with Big Data.
o 3Vs model, http://www.ascilite.org.au/conferences/singapore07/procs/atkinson.pdf,
virtuality, veracity, values;
o In the figure B would be a real person and A depicted as a probable polar extreme avatar;
o A/B testing, a SEO tool;
o ACT in Real Time, a DP approach to Decisions Theory,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9423, Learning to ACT via
DP;
o Amazon Web Services, see AWS, http://aws.amazon.com/big-data/;
o Amazon, see AWS, Amazon Web Services;
o Appropriate technology;
o Asynchronous DP;
o Asynchronous Dynamic Programming;
o AWS;
Semantic Pill 2
Tips: could you identify all the Big Data Landscape actors?; do you know the built-in acronyms, as for
example DaaS, Data as a Service?; do you know the difference between “Apps” and
“Applications”?; could you discriminate between structured and non-structured data?; what
about the differences between ordinary and emotional intelligence?; could you imagine some
examples?; take a look at Business Intelligence, as of Wikipedia; think a little about the type of data
that BI applications handle: does unstructured prevail? or perhaps structured? or perhaps it depends on
the case.
o BD, see Big Data;
o Better decision making, within the context of Emotional Intelligence;
o BI, see Business Intelligence;
Semantic Pill 3
Tips: read the Obama Government Initiative carefully; is Big Data a tool for governance?; take a
historic journey to the McNamara initiatives enhancing strategy over tactics; take a look at The Prince
by Machiavelli; see the article below about the six principles of building a solid Big Data: you will
realize that the article only talks about one of the tools actually used to manage Big Data (Hadoop);
however, do not despise these principles because they hold a lot of wisdom (see next tip); read the
Information Week report below:
Establishing big data governance policies is critically important because the amount of data involved not only
threatens to overwhelm IT organizations, but how that data gets used will actually determine the success of
such projects. Specifically, organizations need to understand what type of data is valuable enough to keep, as
opposed to data that is expendable.
Unfortunately, a recent survey of 3,500 IT leaders conducted by TEKsystems, a provider of IT services, found
that 66% of IT leaders and 53% of IT professionals said their data is stored in disparate systems and that they
need new platforms to accommodate these increased data management needs.
o Big data governance, http://www.informationweek.com/software/information-management/big-data-governance-moves-up-the-it-agenda/d/d-id/1111875?,
81% of IT leaders say their companies do not know how to cope with this item;
o big data integration;
o Big Data Research and Development Initiative, Obama Government Initiative;
o Big Data Six Principles, http://www.metascale.com/resources/blogs/151-6-principles-for-building-your-big-data-talent#.UstBTtJDsZk;
o Big Data, see BD;
Semantic Pill 4
Tips: physical and mental socialization of work, creation, and innovation; we invite you to see the
article below about the blackboard metaphor, a technique used to face complex problems
cooperatively: some of its uses are breaking complex cryptographic codes, computer vision, speech
recognition, command and control systems, surveillance systems, workflow processing, case-based
reasoning, symbolic learning, and data fusion; we recommend that you invest as much time as you
can in studying these creatures, the cellular automata: you are going to learn a bit more and better
what AI and Big Data are, and enter the world of one of the visionaries of the last century
and, for many, the father of computing as we know it today: Von Neumann; this pill is going to take
longer than the previous ones; we recommend that you explore the DamFoundation.org Website to
see a real and perhaps the most outstanding Big Data application in the world; if you want to
know a little about the computational logic and math related to Big Data, take a look at the complexity
of a data collection algorithm that discriminates data at large into big clusters (“clustering”); finally,
this pill ends with “cloud computing”, a concept in formation, not yet well defined, and closely
related to a concept we have seen: “Web Services” (see “DaaS” in Pill 2);
The Evolution of Big Data at CERN, by DamFoundation.org
o Blackboard metaphor, http://c2.com/cgi/wiki?BlackboardMetaphor, a method of working
when dealing with problems of high complexity among many people;
o Brand monitoring, mainly via Social Media;
o Business intelligence, see BI;
o Cellular automaton, related to how Web creatures behave semantically,
http://en.wikipedia.org/wiki/Cellular_automaton;
o Cloud computing, http://en.wikipedia.org/wiki/Cloud_computing;
o CERN data stream, http://cds.cern.ch/record/1430825, data collection algorithm, it’s a
sort of Darwin cave men, trying to discriminate homogeneous clusters,
o The Evolution of Big Data at CERN, http://damfoundation.org/category/big-data-2/;
Semantic Pill 5
Tips: community is one of the leading global terms of today, with 1,400,000,000 references
in Google, the same order of magnitude as “city” (1,600,000,000); community will lead to Big
Data applications very soon (take a look at Types of Communities and you may imagine our world
as a giant creature of coexisting and superposed types of communities).
You must see the Human Connectome Project Website, where the brains of groups of unrelated and related
people are scanned in real time in streams of several terabytes; “Connectomics” could be a
new branch of neural research within the scope of Big Data: everything is big and extremely
complex; we again find the concept of “computing clusters”, huge data sets of supposedly
homogeneous data, both structured and unstructured; a very recent possible area of Big Data is
Corporate Portals, which behave as virtual Web Corporations serving at large communities of
natural and legal persons (owners, customers, employees, providers, directors, third parties,
competitors, associates, partners) like a living creature.
Crime combat and prevention, together with other semantic “modal” associations such as “crime
and delinquency”, “organized crime”, and “transnational organized crime”, should be carefully
registered; mapping everything and visualization are closely related to Big Data, and there is a
strong correlation between them and the vital needs of us humans in order to survive and
evolve: good and evil are always moving but perhaps, on average, evil moves more and
faster than good: for instance, forms of crime (criminals and diseases) move fast when stopped,
changing “modus operandi”, regions of activity, strategies, actors, etc.; crowdsourcing is another
Big Problem that also involves Big Data: people who live well should be aware 24x7, via almost
enforced global visualizations, of people who live badly next door and elbow to elbow with us.
Gallery example of the Human Connectome Project
o Community engagement, http://en.wikipedia.org/wiki/Community_engagement;
o Computer cluster,
o Connectomics, related to neural science, and to “Human Connectome Project”, we include
this as the limit of BD possibilities;
o Corporate portals, http://www.corporateportals.eu/what_is_a_corporate_portal.htm;
o Crime combat,
o Crime prevention,
o Crowdsourcing, see
Semantic Pill 6
Tips: curation, more specifically “data curation”, is a concept not much used until now in the IT
arena; seeing data as the always changing tracks of human life (the river is never the same), it is a
crucial asset; without data there is no history, and without history no meaning; this point brings back old
questions: does it make sense to keep and save all data?; is a synthesis of data equivalent to its raw data?;
you should know something about DARPA, the Internet creator and first owner, at least
something about its history since 1958 and Darpa and the Internet Revolution (PDF); by the
way, DARPA always managed Big Data (by reading these pills we should become aware that Big
Data is a relative concept within the “state of the art of the technology”).
DARPA again introduces a necessary concept to understand Big Data: Data Topology (see our
Home); you should read something such as Data Topology for Dummies: we found one for
beginners because this subject requires a great deal of math abstraction and imagination. See below a
paragraph of the article:
Suppose we have conducted 1000 experiments with a set of 100 various measurements in each. Then each experiment is a string of
100 numbers or simply a vector of dimension 100. The result is a collection of disconnected 1000 points (aka point cloud) in the
100-dimensional Euclidean space. It is impossible to visualize this data as any representation that one can see is limited to dimension
3…..
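To make the point cloud of the paragraph above tangible, here is a minimal Python sketch. It builds 1000 synthetic "experiments" of 100 measurements each and squeezes them into 3 dimensions with a random linear projection. The random data, the function name `random_projection`, and the choice of a random projection are our own assumptions for illustration; this is a simple stand-in, not the topological method the quoted article goes on to describe.

```python
import math
import random

def random_projection(points, target_dim=3, seed=0):
    """Project each point onto `target_dim` random Gaussian directions."""
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One random direction (a vector of length source_dim) per output axis.
    directions = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
                  for _ in range(target_dim)]
    return [[sum(d * x for d, x in zip(direction, point))
             for direction in directions]
            for point in points]

rng = random.Random(42)
# Synthetic stand-in for 1000 experiments with 100 measurements each.
point_cloud = [[rng.random() for _ in range(100)] for _ in range(1000)]
projected = random_projection(point_cloud)
print(len(projected), len(projected[0]))  # 1000 3
```

The projected points can be plotted in 3D, but most of the original structure is lost on the way, which is exactly why the article calls for more careful tools.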
Source: Data Curation Life Cycle, as of UTexas
o Curation data, see data curation,
http://www.lib.utexas.edu/services/digital/dpoc/dpoc_data_lifecycle_management.html,
Curation Life Cycle;
o Curation storage;
o Curation tools;
o DARPA, http://www.darpa.mil/, the Defense Advanced Research Projects Agency, from
USA, it was the Internet creator and donor;
o Darpa Topological Data Analysis, topology of existent BD (2009),
http://www.carlisle.army.mil/DIME/documents/StratPlan091.pdf, We have to carefully
read this milestone!;
Semantic Pill 7
Tips: “textons” is a term coined by our Darwin Methodology, referring to huge files of
supposedly homogeneous content, for instance sets of 100,000 Web page HTML sources supposedly
dealing with the same subject, chained as a single txt file; DAS is a technology to directly (D)
attach a storage (S) to computers via buses: it has to be studied together with SAN and NAS,
technologies that connect storage (S) to computers via networks (N); dashboards must be
studied semantically within their “context of use”, for instance BI, Business Intelligence;
let’s go a little deeper into data curation:
"Data curation is the active and on-going management of data through its lifecycle of interest and usefulness
to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain
quality, add value, and provide for re-use over time." University of Illinois' Graduate School of Library and
Information Science.
Finally, this tip closes with Data Defined Storage, a sort of data-centric architecture focusing on
semantics and minimizing aspects of the media, type, time, and place of data generation. The thesis is that
if you have all your data well structured (data and metadata) you may find an appropriate and
optimal solution to your data management problems.
Darwin Semantic Skeleton highlighting a particular semantic neighborhood spectrum
o Darwin textons, extra large text files processed by Darwin Methodology;
o DAS, see Direct Attached Storage, http://en.wikipedia.org/wiki/Direct-attached_storage;
o Dashboard enterprise, a sort of soft package associated to a server;
o Dashboard, simple BI, Business Intelligence monitoring unit, http://www.maia-intelligence.com/pdf/Confluent-SCIT-Sep-2010.pdf;
o Data curation, techniques and tools for useful BD validity and long lasting preservation,
http://en.wikipedia.org/wiki/Data_curation;
o Data defined storage, http://en.wikipedia.org/wiki/Data_Defined_Storage, an approach to
something like “Semantic Database”;
o DDS, see Data Defined Storage;
Semantic Pill 8
Tips: Development Informatics is a new field (2009) of Informatics applied to social problems and
equivalent to ICT4D, Information Technologies for (4) Development (D). The subtle difference lies
in the approaches related to the interpretation of “Informatics” versus “ICT”, which stands for
Information and Communications Technologies: the first meaning reflects the European
worldview while the second reflects the American one; Distributed Parallel Architecture is
something you need to know to manage large data sets: you may find here a research paper
(2012) about it where you may compare three Big Data models and tools, namely MapReduce
(Google), Hadoop (Apache) and HBase (Apache); DP, Dynamic Programming, deserves special
consideration as an old and always present intelligent approach to “Operations Research” and
particularly to “Decision Problems”: its common sense approach is as follows:
In order to solve a complex problem of overlapping subproblems we need to know and solve parts of it and
then try to solve the whole puzzle by combining solutions. From this reasoning at least two strategies follow:
the “brute force” or naïve one, solving any subproblem each time it is needed, and a more intelligent one, solving each
subproblem only once. This second approach is fundamental when the number of overlapping subproblems
grows exponentially as a function of the data set volume, for example in genomics. See some interesting
examples for beginners here.
Graph that illustrates “Finding the shortest path in a graph” using optimal substructure; a straight
line indicates a single edge; a wavy line indicates a shortest path between the two vertices it
connects (other nodes on these paths are not shown); the bold line is the overall shortest path from
start to goal. From Dynamic Programming, Wikipedia
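The two strategies of this tip can be compared with a minimal Python sketch. We use the Fibonacci numbers as a toy problem of overlapping subproblems (our own choice for illustration, not an example taken from the sources cited here) and count how many times each strategy actually computes a subproblem.

```python
from functools import lru_cache

naive_calls = 0

def fib_naive(n):
    """Brute force: recompute every overlapping subproblem each time it is needed."""
    global naive_calls
    naive_calls += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

memo_calls = 0

@lru_cache(maxsize=None)
def fib_memo(n):
    """Solve each subproblem once; lru_cache returns the stored answer afterwards."""
    global memo_calls
    memo_calls += 1  # counts only cache misses, i.e. real computations
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(20), naive_calls)  # 6765 21891
print(fib_memo(20), memo_calls)    # 6765 21
```

Both strategies give the same answer, but the naïve one does about a thousand times more work for n = 20, and the gap grows exponentially with n.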
o Development informatics,
o Direct Attached Storage,
o Disease prevention,
o Distributed parallel architecture, http://www.revistaie.ase.ro/content/62/12%20-%20Boja.pdf,
specific for BD apps;
o DP, see Dynamic Programming;
o Dynamic Programming, see DP, http://en.wikipedia.org/wiki/Dynamic_programming, one
of the most used Operations Research tools again!;
Semantic Pill 9
Tips: Emotional Intelligence regains its place within our “Tech Era”, emphasizing the
communication power of gestures and attitudes, see Edwin Friedman; we as spontaneous Web
users are continuously building a new “semiotics” of which semantics is one branch; EaaS is a
new term, still rather ambiguous: some companies like Hewlett Packard use it in relation to Cloud
Services that in turn look like a “bazaar”: please, as a sample, take a look at its Piksel
audiovideo; eBay Principal Architect Tom Fastner, speaking at the Teradata Partners Conference
held in Dallas, 23rd October 2012:
In monitoring their 100 million customers' interactions - from every button they click to every product they
buy - eBay creates 12TB of data per day which is continually added to a 4 petabyte table containing 4tn rows
of data. As the data is queried both by automatic monitoring systems and employees looking to find more
meaning from it, data throughput reaches 100 petabytes (102,400TB) per day.
Finally, EU4ALL is an EU initiative for Accessing Lifelong Learning for Higher Education, mainly
sponsored by UNED from Spain and the Open University from the UK. Its relationship with BD is indirect, through its
connection with ICT4D, Information and Communication Technologies for Development, already seen;
o EaaS, Everything as a Service;
o eBay, http://www.v3.co.uk/v3-uk/news/2302017/ebay-using-big-data-analytics-to-drive-up-price-listings,
how eBay handles BD;
o EIP, Enterprise Information Portal, see EP, Enterprise Portals;
o Emotional Intelligence;
o Enterprise portals, see also EIP, they try to offer what’s evolving by itself, namely
“business Portals”, http://en.wikipedia.org/wiki/Enterprise_portal;
o EU4ALL;
o European Union for Assisted Life Long Learning, see EU4ALL;
Semantic Pill 10
Tips: we are soon going to see an industry of icons and gadgets related to Big Data, for instance
an icon to go to Hadoop; Gartner Group is a well-known IT research and consulting firm specializing
in intelligence reports about business trends: you should know its two branded tools, the “hype cycle” to
evaluate a technology’s life and the “magic quadrant” (or MQ) to evaluate markets: it has an interesting
and authoritative IT Glossary that says about Big Data:
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.
O’Reilly announces the STRATA Conference next February in Santa Clara, California, saying that the future
belongs to those who understand Big Data, and presents an article about “genomics” saying:
The amount of data being produced by sequencing, mapping, and analyzing genomes propels genomics into
the realm of Big Data. Genomics produces huge volumes of data; each human genome has 20,000-25,000
genes comprised of 3 million base pairs. This amounts to 100 gigabytes of data, equivalent to 102,400
photos. Sequencing multiple human genomes would quickly add up to hundreds of petabytes of data, and
the data created by analysis of gene interactions multiplies those further.
Windows gadgets
Are we entering into a Big Data Gadgets mania?, by Avancos
o Gadgets, http://en.wikipedia.org/wiki/Gadget;
o Gartner Group;
o genomics, http://strata.oreilly.com/2013/08/genomics-and-the-role-of-big-data-in-personalizing-the-healthcare-experience.html,
genomics and Big Data;
Semantic Pill 11
Tips: Google gadgets, as seen above, have been deprecated by Google: however, gadgets will always
have a significant place within the ICT Community and specifically within Big Data, see figure
below; GIS stands for Geographic Information System and was always a typical application of “Big
Data of any time”, always capturing, structuring, retrieving, and managing data at the limit
of the available technologies, for instance GIS Tools for Hadoop;
Something similar occurs with GPS, Global Positioning System (see ephemeris and its world); by the
way, along our explorations guided by this semantic seed we come across, from time to time, interesting
authorities related to our main subject: see Content about GPS by Big Data Insight Group; HaaS,
Hardware as a Service, does not make much sense alone, perhaps less than SaaS, Software as a Service,
because for ICT it would be more of the same. However, HaaS and SaaS related to Big Data
make sense because Big Data is a challenge for everybody: vendors, buyers, and users (see HaaS
now by the University of California at Berkeley);
Finally we arrive at Hadoop from Apache. If you are a beginner we advise you to start by reading its
introduction What is Apache Hadoop:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data
sets across clusters of computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. Rather than rely on
hardware to deliver high-availability, the library itself is designed to detect and handle failures at the
application layer, so delivering a highly-available service on top of a cluster of computers, each of which may
be prone to failures. The project includes these modules: Hadoop Common: The common utilities that
support the other Hadoop modules; Hadoop Distributed File System (HDFS™); A distributed file system that
provides high-throughput access to application data; Hadoop YARN: A framework for job scheduling and
cluster resource management; Hadoop MapReduce: A YARN-based system for parallel processing of large
data sets.
Ericson said that Data Traffic doubles this year and trend continues, by gadgets.ndtc.com
o GIS;
o Google gadgets, http://en.wikipedia.org/wiki/Google_Gadgets;
o Google Trends;
o GPS databases, http://en.wikipedia.org/wiki/Global_Positioning_System;
o HaaS, Hardware as a Service;
o Hadoop Distributed File System, see HDFS;
Semantic Pill 12
Tips: the HDFS architecture has been mentioned above as part of Hadoop: YOYOClouds defines
it as:
HDFS is a block-structured file system: individual files are broken into blocks of a fixed size. These
blocks are stored across a cluster of one or more machines with data storage capacity. Individual
machines in the cluster are referred to as DataNodes. A file can be made of several blocks, and they
are not necessarily stored on the same machine; the target machines which hold each block are
chosen randomly on a block-by-block basis. Thus access to a file may require the cooperation of
multiple machines, but supports file sizes far larger than a single-machine DFS; individual files can
require more space than a single hard drive could hold. If several machines must be involved in the
serving of a file, then a file could be rendered unavailable by the loss of any one of those machines.
HDFS combats this problem by replicating each block across a number of machines (3, by default).
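The block placement described in the quote above can be imitated with a toy Python sketch. It follows only the quoted description (fixed-size blocks, machines chosen at random per block, 3 replicas by default); the function names, the node names, and the sizes are hypothetical, and a real HDFS cluster applies placement policies that this sketch does not try to reproduce.

```python
import random

def place_blocks(file_size, block_size, datanodes, replication=3, seed=0):
    """Return {block_index: [datanode, ...]} for a file split into fixed-size blocks."""
    rng = random.Random(seed)
    block_count = -(-file_size // block_size)  # ceiling division
    # Each block gets `replication` distinct, randomly chosen DataNodes.
    return {i: rng.sample(datanodes, replication) for i in range(block_count)}

def file_available(placement, failed_nodes):
    """A file is readable if every block still has at least one live replica."""
    return all(any(node not in failed_nodes for node in nodes)
               for nodes in placement.values())

nodes = [f"datanode-{i}" for i in range(6)]
placement = place_blocks(file_size=1000, block_size=128, datanodes=nodes)
print(len(placement))  # 8
print(file_available(placement, failed_nodes={"datanode-0", "datanode-1"}))  # True
```

With three replicas on distinct machines, the loss of any two machines can never remove all copies of a block, which is the protection the quote refers to.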
HCP stands for Human Connectome Project, a NIH, National Institutes of Health, project
on Neuroscience Research to build a “brain map of healthy humans” to “see more and
better” its connectivity architecture and functionality and to shed light on brain disorders such
as dyslexia, autism, Alzheimer’s disease and schizophrenia; House the World is a global
project to “house the world” in the sense of housing for all, focusing on durability and
affordability, see New Architectural Design, finding solutions for the extremely poor; ICT4D
has been considered in several previous pills; however, this discipline may open our minds to
see (Mobile and Development) how advanced technologies could be more usable and
efficient for the poor and people with disabilities than for the rich and healthy; the discontinuation
of iGoogle is a demonstration of the ephemeral nature of Internet projects: big
Internet masses behave as if they had no “inertia”, with the subtleness of “nothing”,
and when something stops growing it starts to die!;
HDFS architecture, by YOYO Clouds
o HCP, see connectomics;
o HDFS, an open Apache BD management;
o House the World, http://housetheworld.org/open-develpment-model/crowdsourcing/?gclid=CO302LCPi7sCFe3m7AodLkMADw,
a BD philosophic approach;
o Human Connectome Project,
o IaaS, Infrastructure as a Service;
o ICT4D,
http://en.wikipedia.org/wiki/Information_and_communication_technologies_for_development;
o iGoogle, discontinued, personal Web pages, open and free gadgets library;
o Information and communication technology for development, see ICT4D;
Semantic Pill 13
Tips: LSST, which stands for Large Synoptic Survey Telescope, is the “widest, fastest, deepest eye of the
new digital age”:
The 8.4-meter LSST will survey the entire visible sky deeply in multiple colors every week with its three-billion
pixel digital camera, probing the mysteries of Dark Matter and Dark Energy, and opening a movie-like
window on objects that change or move rapidly: exploding supernovae, potentially hazardous near-Earth
asteroids, and distant Kuiper Belt Objects.
LFE, Learn From Everyone, is a knowledge sharing initiative launched by young Chinese, as reported by
Ministry of Tofu, China: it proposes something as old as our civilization: knowledge and wisdom are in
any place and at any moment, in any culture and even in any creature; Legal citations and claims are
closely related to new areas such as The Practice of Law in the age of Big Data; Linked Data is a term
coined by Tim Berners-Lee, the Web creator, referring to uses of structured data and closely
related to LOD, Linked Open Data, a Community Project in turn related to Data Commons and
Open Knowledge: all these initiatives are oriented to build a Semantic Web, as structured as
possible;
Example of a piece of LinkedData, as per Wikipedia
o Large Synoptic Survey Telescope, see LSST, http://www.lsst.org/lsst/;
o Learn from everyone, http://www.ministryoftofu.com/2013/08/learn-from-everyone-a-knowledge-sharing-initiative-launched-by-young-chinese/;
o Legal citations;
o Legal claims;
o Linked Data, http://en.wikipedia.org/wiki/Linked_Open_Data;
o Linked Open Data, see also Linked Data;
o LOD;
o Listen to Everything, it has many meanings, see this http://www.infowars.com/mit-future-smartphones-will-listen-to-everything-all-the-time/;
o LSST, see Large Synoptic Survey Telescope;
Semantic Pill 14
Tips: MapReduce is a methodology to process large data sets via parallel and distributed tools and
algorithms. Conceptually the idea is not new: most sorting techniques applied in conventional
computing used similar procedures, especially when challenged with large data sets. It has two
main procedures, namely: 1) map, or mapping, in which data sets are filtered, masked, and sorted and
opened into streams, and 2) reduce, which synthesizes and summarizes those streams. MapReduce also
refers to a similar methodology used by Google. Hadoop is one of the implementations of this
idea. See also the mother idea or paradigm, “divide and conquer algorithms”;
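The two procedures of this tip can be sketched in a few lines of plain Python with the classic word count. This runs on a single machine and only imitates the idea; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are our own, and it is not the API of Hadoop or of Google's MapReduce.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Summarize all the values collected for one key."""
    return key, sum(values)

documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

In a real cluster each document, and each group of keys, could be handled by a different machine, which is where the "divide and conquer" paradigm pays off.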
Below, an MDP (Markovian Decision Process) schema of an “entity”, either real or artificial, is depicted, with
three possible “states” S0, S1 and S2, and only two possible “actions”, a0 and a1, to change state no matter
the state. For this sort of automaton to represent “alive” entities there should exist an associated
probability $P_a(s, s')$ that the state changes from s to s’ at time (t+1) by executing action (a). In order to evolve,
or at least to have a reason for existence, we should associate to this automaton a “Reward” function $R_a(s, s')$
received when the state changes from s to s’ due to action (a). These rewards are associated with learning. This defines
the 4-tuple (S, A, P, R): State, Action, Probability, and Reward.
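The 4-tuple above can be written down directly in Python. The sketch below uses the three states and two actions of the description, but all the probabilities and rewards are invented for illustration, since the figure's actual numbers are not reproduced in this text. As one way of "learning" from rewards we apply value iteration, a standard dynamic programming method in the spirit of Bellman, with a discount factor `gamma` that is also our assumption.

```python
STATES = ["S0", "S1", "S2"]
ACTIONS = ["a0", "a1"]

# P[(s, a)] maps each next state s' to its probability; all numbers are illustrative.
P = {
    ("S0", "a0"): {"S0": 0.5, "S2": 0.5},
    ("S0", "a1"): {"S2": 1.0},
    ("S1", "a0"): {"S0": 0.7, "S1": 0.1, "S2": 0.2},
    ("S1", "a1"): {"S1": 0.95, "S2": 0.05},
    ("S2", "a0"): {"S0": 0.4, "S2": 0.6},
    ("S2", "a1"): {"S0": 0.3, "S1": 0.3, "S2": 0.4},
}
# R[(s, a, s')] is the reward for that transition; unlisted transitions pay 0.
R = {("S1", "a0", "S0"): 5.0, ("S2", "a1", "S0"): -1.0}

def value_iteration(gamma=0.9, tolerance=1e-6):
    """Return the long-run value of each state under the best choice of actions."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(
                sum(p * (R.get((s, a, s2), 0.0) + gamma * values[s2])
                    for s2, p in P[(s, a)].items())
                for a in ACTIONS)
            for s in STATES
        }
        if max(abs(new_values[s] - values[s]) for s in STATES) < tolerance:
            return new_values
        values = new_values

values = value_iteration()
for state in STATES:
    print(state, round(values[state], 3))
```

With these invented numbers S1 ends up as the most valuable state, because it is the only one from which the positive reward can be collected directly.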
Mapreduce Google Rank Examples from admin-magazine.com
Source: MDP from Wikipedia
o mapreduce, a BD model of parallel processing, http://en.wikipedia.org/wiki/MapReduce, see also hadoop;
o Markovian Decision Process, see MDP and the old Richard Bellman algorithms;
Semantic Pill 15
Tips: Massively Parallel Processing is perhaps a term more specific to Big Data because it
refers to a family of parallel procedures related to the art of computing, for example Grid
Computing, Cloud Computing, Computer Cluster, Infiniband, at large a jumble of names and
acronyms. However, we recommend that you read, study, and review these “common sense” laws to take
into account when dealing with parallel processing at big scale: Amdahl’s Law, Gustafson’s Law, Flat
Metric (could be considered a law), and Moore’s Law; you may also be acquainted with McKinsey
Reports, so take a look at this: Big data: The next frontier for innovation, competition, and
productivity (May 2011, extrapolate it as of January 2014!):
MGI (McKinsey Global Institute) studied big data in five domains—healthcare in the United States, the
public sector in Europe, retail in the United States, and manufacturing and personal-location data globally.
Big data can generate value in each. For example, a retailer using big data to the full could increase its
operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential,
too. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector
could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing
US healthcare expenditure by about 8 percent. In the developed economies of Europe, government
administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone
by using big data, not including using big data to reduce fraud and errors and boost the collection of tax
revenues. And users of services enabled by personal-location data could capture $600 billion in consumer
surplus. The research offers seven key insights.
McKinsey Forecast (2025): Developed versus developing economies impacts: 3D Printing among 12 new technologies
o Massive Parallel processing, http://en.wikipedia.org/wiki/Massively_parallel_(computing);
o McKinsey reports, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation;
o MDP, synthesized as $R_a(s, s')$: reward for action (a) taking state s to state s’;
o META Group, see Gartner, it was acquired by Gartner;
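Two of the laws recommended in the tips of this pill, Amdahl's and Gustafson's, fit in a few lines of Python. The formulas are the standard ones; the 95% parallel fraction and the processor counts are assumptions chosen only to show the contrast.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup for a fixed-size problem (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

def gustafson_speedup(parallel_fraction, processors):
    """Scaled speedup when the problem grows with the machine (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors

for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 2), round(gustafson_speedup(0.95, n), 2))
```

Under Amdahl's law a 5% serial part caps the speedup below 20 no matter how many processors are added, while under Gustafson's law the speedup keeps growing because the data set grows with the machine, which is the usual situation in Big Data.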
Semantic Pill 16
Tips: WeatherSignal: Big Data Meets Forecasting, from a Scientific American blog, talks
about the impact of Big Data on forecasts in different areas, from weather to health issues,
raising the following controversy:
The philosophy of Big Data is that insights can be drawn from a large volume of ‘dirty’ (or ‘noisy’) data, rather than simply relying on a
small number of precise observations – a subject covered in detail by Viktor Mayer-Schönberger and Kenneth Cukier in their recent book
‘Big Data’. One good example of the success of the ‘Big Data’ approach can be seen in Google’s Flu Trends which uses Google searches
to track the spread of flu outbreaks worldwide. It is also important to remember that Big Data when used on its own can only provide
probabilistic insights based on correlation; The true benefit of Big Data is that it drives correlative insights, which are achieved through
the comparison of independent datasets. It is this that buttresses the Big Data philosophy of ‘more data is better data’; you do not
necessarily know what use the data you are collecting will have until you can investigate and compare it with other datasets.
Mike2.0 is an Open Source collaborative private undertaking trying to build and lead a sort of
Information Management community; MINE, Maximal Information-based Nonparametric
Exploration, deals with visualization of datasets basically of “pairs” represented as a Cartesian X, Y
map: in order to “see more and better” these maps you first need to know MIC, the Maximal
Information Coefficient, which measures the strength of linear or nonlinear associations between X and Y.
MIC belongs to a statistical class experimentally used for Detecting Novel Associations in Large
Data Sets (Jun 2012):
Imagine a dataset with hundreds of variables, which may contain important, undiscovered relationships. There are tens of thousands of
variable pairs—far too many to examine manually. If you do not already know what kinds of relationships to search for, how do you
efficiently identify the important ones? Datasets of this size are increasingly common in fields as varied as genomics, physics, political
science, and economics, making this question an important and growing challenge. One way to begin exploring a large dataset is to
search for pairs of variables that are closely associated. To do this, we could calculate some measure of dependence for each pair, rank
the pairs by their scores, and examine the top-scoring pairs. For this strategy to work, the statistic we use to measure dependence should
have two heuristic properties: generality and equitability.
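The strategy quoted above (score every pair of variables, rank the pairs, examine the top ones) can be sketched in Python. As the measure of dependence we use the absolute Pearson correlation, which is a swapped-in stand-in and not MIC: it only detects linear association and so lacks the generality the authors ask for. The variable names and the synthetic data, with one planted link between `u` and `x`, are ours.

```python
import math
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_pairs(columns):
    """Score every pair of variables and return them sorted, strongest first."""
    scores = [(abs(pearson(columns[a], columns[b])), a, b)
              for a, b in combinations(sorted(columns), 2)]
    return sorted(scores, reverse=True)

rng = random.Random(1)
data = {name: [rng.gauss(0, 1) for _ in range(200)] for name in ("u", "v", "w")}
data["x"] = [2 * value + rng.gauss(0, 0.1) for value in data["u"]]  # planted link u-x
ranking = rank_pairs(data)
print(ranking[0][1:])  # ('u', 'x')
```

With hundreds of variables the number of pairs grows quadratically, which is why the quoted authors care so much about the properties of the statistic used for scoring.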
Source: Scientific American, Smartphone Weather Signal Dashboard
o Meteorology forecasts, Scientific American, Big Data Meets Forecasting,
http://blogs.scientificamerican.com/guest-blog/2013/10/11/weathersignal-big-data-meets-forecasting/;
o MIKE2.0, http://mike2.openmethodology.org/;
o MINE, Maximal Information-based Nonparametric Exploration, http://www.exploredata.net/;
Semantic Pill 17
Tips: Go to Mobile massively: the UIC, University of Illinois at Chicago, through its Office of Technology
Management has launched a program to encourage and help students to create their own Apps for
Development and Deployment using iOS, iPhone OS, and Android OS; MSL, Multilinear Subspace
Learning, is a family of ideas and applications to “see more and better” large multidimensional
objects: most objects are multidimensional (relative to our visual capacity, limited to 3D): most big
data sets are multidimensional, with objects sparsely distributed, highly redundant, and noisy, and for
these reasons techniques of dimensionality reduction are used to map high-dimensional data to a
low-dimensional space while retaining as much information as possible. We recommend reviewing
Matlab, you are going to need it, and specifically the Matlab Tensor Box;
http://www.csc.com/cscworld/publications/81769/81773-supercomputing_the_climate_nasa_s_big_data_mission
is a CSC article about the NCCS Discover
supercomputing cluster, which ranks among the top 100 supercomputers in the world, plays a
central role in NASA’s earth science mission, and is the main system used for processing jobs that
require significant computing resources. To put some goals in numbers: Discover can compute in
one day three simulated days in the life of the Earth at one of the highest resolutions ever attained,
about 3.5-kilometer global resolution, or about 3.6 billion grid cells. The center’s current
“stretch” goal is to generate in one day a computation that covers 365 days at 1-km global
resolution;
NASA Brings “Big Data” to the Cloud, by the USRA, Universities Space Research Associations
o mobile app development and deployment, example of a “Mobile rush” Initiative of the
University of Illinois, Chicago, http://otm.uic.edu/node/4371;
o MPP, see Massive Parallel Processing;
o MSL, see Multilinear Subspace Learning;
o Multilinear subspace learning, see MSL;
o NASA Center for Climate Simulation, see NCCS;
o NASA, NASA Big Data Mission, http://www.csc.com/cscworld/publications/81769/81773-
supercomputing_the_climate_nasa_s_big_data_mission;
Semantic Pill 18
Tips: Infectious Diseases following natural disasters, another big scenario, from NLM, the National
Library of Medicine, NIH, National Institutes of Health: this article should bring to mind something
“metadata related”, because we found in it something valuable for our Web Semantic eLearning
process: the “tag” MeSH, which stands for Medical Subject Headings, the NLM Controlled
Vocabulary that serves as the PubMed Thesaurus, something like a Web Thesaurus! The abstract of
this article says:
Natural disasters may lead to infectious disease outbreaks when they result in substantial population
displacement and exacerbate synergic risk factors (change in the environment, in human conditions and in
the vulnerability to existing pathogens) for disease transmission. We reviewed risk factors and potential
infectious diseases resulting from prolonged secondary effects of major natural disasters that occurred from
2000 to 2011.
Among Natural Disasters there are some closely related to visible Earth changes such as the
Weather: see the image below, from Natural Disaster and Extreme Weather, a collection of
environmental articles published by The Guardian (UK), for instance Australia links 'angry summer'
to climate change – at last!; The navigation paradox deals with collision risk in navigation for any
object, either real or virtual: aircraft, ships, cars. A paradox may or may not be a valid argument,
but it helps as a tool of analysis and sharpens our critical sense. For example, when talking about
the incidence of navigation risks as a function of technology and of people's awareness of it (how
well they use it) we may easily arrive at, or be driven to, contradictions such as this one: the best
technology, used with an excellent level of awareness, may in the long run create riskier scenarios.
Extreme Weather and Global Warming are linked, The Guardian
o Natural disasters, http://www.ncbi.nlm.nih.gov/pubmed/22149618 ;
o Navigation paradox, http://en.wikipedia.org/wiki/Navigation_paradox;
o NCCS, see also CSC NCCS, supercomputing program to “supercomputing the climate” from
NASA;
Semantic Pill 19
Tips: Are we entering a real “Big Science”, or are we, as ever, striving hard to push the frontier
of the unknown? Data Driven Discovery talks about a digital copy of the universe, encrypted (see
LSST, Large Synoptic Survey Telescope, in previous pills):
“The data volumes we [will get] out of LSST are so large that the limitation on our ability to do science isn’t
the ability to collect the data, it’s the ability to understand the systematic uncertainties in the data,”
said Andrew Connolly, an astronomer at the University of Washington.
Non Linear System Identification deals with identifying the different types of real life systems under
study, namely industrial processes, control systems, economy, life sciences, medicine, social systems
and networks, etc. Most of them are nonlinear, linearity being a form of idealization that lets us
study them under the laws and tools of math and logic. This article goes a little further,
idealizing data itself and affirming: “Finding the unexpected in a higher-dimensional space is
impossible using the human brain.” We recommend going a little deeper into the four main types of
NLS: 1) Volterra series models, 2) block structured models, 3) neural network models, and 4)
NARMAX models;
“The figure that illustrates this pill shows 4 types of equilibrium you should distinguish in order to understand
Big Data better because one usual approach is to describe the solutions globally (via nullclines). What
happens around an equilibrium point remains a mystery so far. Here we propose then to discuss this
problem. The main idea is to approximate a nonlinear system by a linear one (around the equilibrium point).
Of course, we do hope that the behavior of the solutions of the linear system will be the same as the
nonlinear one. This is the case most of the time (not all the time!).”
Open Data (see the list of over 200 local, regional and national open data catalogues) is a global
movement and, semantically, a universal idea in formation. From the very beginning, at the dawn of
our civilization, data was closely related to every type of asset and had the nature of a “capital” by
itself (see Merton Thesis); its “opening” will go hand in hand with the development of our
societies. For this reason what is really crucial is the Web contribution to this opening (see
Datacatalogs);
Finally PaaS, Platform as a Service, is one of the three pillars of Cloud Computing Services, the other
two being SaaS, Software as a Service, and IaaS, Infrastructure as a Service. Users of this service may
create, control and set up their own software on the contracted platform. Remember that PaaS
makes little sense alone; it belongs to an almost enforced trilogy [PaaS, SaaS, IaaS];
KRIT algorithm, Monfort University (UK)
Strange Attractor Visualization, from Chaoscope
o New Big Science, https://www.simonsfoundation.org/quanta/20131002-a-digital-copy-of-the-
universe-encrypted/;
o Nonlinear system identification, system identification applied to nonlinear system (generally of high
complexity), http://en.wikipedia.org/wiki/Nonlinear_system_identification;
o Open Data AR, http://datospublicos.gob.ar/;
o Open Data UK, http://data.gov.uk/;
o Open Data US, http://www.data.gov/;
o Open Data, see also Open Data Initiative, http://en.wikipedia.org/wiki/Open_data, Open data is the
idea that certain data should be freely available to everyone to use and republish as they wish,
without restrictions from copyright, patents or other mechanisms of control;
o PaaS, Platform as a Service;
Semantic Pill 20
Tips: Pig and Pig Latin are a Platform and a language to create MapReduce programs within
Hadoop;
Q-Learning algorithm: it is a “Reinforcement learning” process, where an agent gathers as much
positive (rewarding) experience as possible. The two figures below show how to approach an AI,
Artificial Intelligence, algorithm of Q-learning. The house has 6 “rooms”: five, numbered 0 to 4,
inside the house and a sixth one outside as an open room. The challenge is to acquire positive
experience to train an agent to get out of the house (Finish) beginning in room 2. We may “go” at
random starting from “state” 2 and building a “table” or matrix of “rewards”: for example 0 if going
“wrong” and 100 if going “right”. We may extend this tiny and basic model to any degree of
complexity, trying to find the right way from any place within any labyrinth;
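The six-room exercise can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the door layout in the reward matrix R is assumed from the classic six-room tutorial (it is not given in this pill), the outside room is labeled 5, the discount GAMMA = 0.8 is an arbitrary choice, and the names train and greedy_path are ours. The agent “goes at random”, filling a Q table, and afterwards follows the highest Q value from room 2 to the outside.

```python
import random

# Reward table R[state][next_room]; moving to a room is the "action".
# -1 marks "no door", 0 a neutral move, 100 a move that reaches room 5 (outside).
# ASSUMPTION: this door layout is illustrative, not taken from the text.
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]
GOAL = 5      # the open "room" outside the house
GAMMA = 0.8   # discount factor (assumed value)


def valid_moves(state):
    """Rooms reachable through a door from the given room."""
    return [j for j in range(len(R)) if R[state][j] >= 0]


def train(updates=5000, seed=0):
    """Fill the Q table by trying random (room, move) pairs."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(updates):
        s = rng.randrange(n)
        a = rng.choice(valid_moves(s))
        # Q-learning update with learning rate 1 (deterministic world).
        Q[s][a] = R[s][a] + GAMMA * max(Q[a])
    return Q


def greedy_path(Q, start, max_steps=20):
    """Follow the best known move from each room until the goal is reached."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        s = path[-1]
        path.append(max(valid_moves(s), key=lambda j: Q[s][j]))
    return path


Q = train()
path = greedy_path(Q, start=2)
print(path)
```

With this layout room 2 only opens onto room 3, so every learned route starts 2, 3 and then leaves through one of the two rooms that have a door to the outside.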
Quality of research data, and Quality of research, sometimes forgotten items. This paper deals with
a doubly controversial theme: quality, and how research, in general terms, is really performed. The
analysis covered the three disciplinary domains used by the European Science Foundation:
Physical Sciences and Engineering, Social Sciences and Humanities, and Life Sciences. In the end
Social Sciences and Humanities and Life Sciences were summarized as one.
The attitude in Physical Sciences and Engineering would seem to be that quality control of data can best be
effectuated through citation of datasets and quality-related comments on those datasets which are made
available through Open Access data publications. No need is expressed for codes of conduct, training in data
management, or peer review of data that is published together with articles. In Life Sciences there is first and
foremost a need for a code of conduct for dealing with data. Training in data management fits in with this. A
direct judgment on quality can be given through peer review of the data that is published together with
articles and through quality-related comments, a derived judgment through data publications and citations.
Open access to data does not score highly. Interestingly enough, Life Sciences are ahead of the other
disciplines as regards open access to articles.
Warning: we have to take into account that quality of data is essential and a necessary condition
not only for Big Data but for everything: we behave based on data.
Finally RTDP has many meanings, for example Real Time Data Platform and Real Time Dynamic
Programming. The latter relates to the semantics under study in this pill through the paper
“Learning to Act using Real Time Dynamic Programming”, of which we reproduce here part of its
abstract:
Researchers have argued that DP provides an appropriate basis for real time control as well as for learning
when the system under control is incompletely known. RTDP is a DP based algorithm by which an embedded
system can improve its performance with experience. It is a generalization of Korf’s Learning Real Time - A*
algorithm to problems involving uncertainty;
Source: Q-learning from mnemstudio.org
o Pig language, a language to “mapreduce” and “hadoop” management,
http://en.wikipedia.org/wiki/Pig_(programming_tool);
o Q-learning algorithm, http://en.wikipedia.org/wiki/Q-learning, it’s a “Reinforcement learning” process,
http://artint.info/html/ArtInt_265.html, see here how an agent gathers positive (rewarding)
experience (see learning and reinforced learning);
o Quality of research data, http://www.dlib.org/dlib/january11/waaijers/01waaijers.html;
o Quality of research, a forgotten theme: scarce and deficient data and lack of interest;
o Real Time DP, see RTDP;
o Reinforced learning, see Q-learning;
Semantic Pill 21
Tips: Remote Sensing: for example the white paper “Geospatial Analytics for Big Spatiotemporal Data:
Algorithms, Applications, and Challenges” says:
We are living in the era of ‘Big Data.’ Spatiotemporal data, whether captured through remote sensors (e.g.,
remote sensing imagery, Atmospheric Radiation Measurement (ARM) data) or large scale simulations (e.g.,
climate data) has always been ‘Big.’ However, recent advances in instrumentation and computation making
the spatiotemporal data even bigger, putting several constraints on data analytics capabilities. In addition,
large-scale (spatiotemporal) data generated by social media outlets is proving to be highly useful in disaster
mapping and national security applications. Spatial computation needs to be transformed to meet the
challenges posed by the big spatiotemporal data.
The Big ones: the ESG, Earth System Grid, could be considered one of the World Big Data scientific
portals;
Sampling, and particularly representative sampling, is pure statistics and a universal problem of all
times: how to wisely sample a universe to get the information we need. However Big Data
universes and situations may present new challenges, especially when considering unstructured
data that is relatively unknown, noisy, erratic and most times unpredictable, as we may find in Social
Data: “The Pitfalls of using online and social data in Big Data analysis”:
In her draft paper, Big Data: Pitfalls, Methods and Concepts for an Emergent Field, UNC professor and
Princeton CITP fellow Zeynep Tufekci (@zeynep) compares the methodological challenges of developing
socially-based big data insights using Twitter to biological testing on Drosophila flies, better known as fruit
flies. Drosophila flies are usually chosen because they’re relatively easy to use in lab settings, easy to breed,
have rapid and “stereotypical” life cycles, and the adults are pretty small. The problem? They’re not
necessarily representative of non-lab (read: real-life) scenarios. Tufekci posits that the dominance of Twitter
as the “model organism” for social media in big data analyses similarly skews analysis.
Sampling was, is and will be fundamental. Now, within the “Big Data” movement, we have to be more
careful about this problem than “before” (one year ago!). The figure below depicts 4
ways of “zonal sampling”, each one coherent but with 4 probably different outcomes;
Roadway Traffic Control is an old Big Data experience and now there is a proliferation of integral
solutions, fundamentally to avoid congestion and, where possible, to keep the circulating
community connected: see T-Systems, Big Data in Traffic and Big Data in the Automotive
Industry:
As you know, cars can’t speak. If they could, they would be able to provide a wealth of information that
would be invaluable to drivers, repair shops and automakers alike. To gain access to this data – and help the
car talk – more and more vehicles are being fitted with sensors and connectivity solutions. According to a
study by management consultants Oliver Wyman, 80 percent of all autos sold in 2016 will be connected. That
would equate to approximately 210 million talking cars cruising round our streets. Compared to 45 million
autos in 2011, that is a projected annual growth rate of over 36 percent. Connected cars could provide a
steady stream of data on vehicle movements, condition, wear and tear of parts, and ambient conditions.
Extracting meaning from this mass of mixed data is no easy task. The challenge is transmitting the
information, analyzing it and redistributing it to the relevant recipients – all at high speed. It is a challenge
that T-Systems can master.
SaaS stands for Software as a Service; it has meaning as a stand alone Software Delivery Model or
as part of the Cloud Computing trilogy [SaaS, IaaS, PaaS];
Source: Habitat Maps for the EU (MESH)
o Remote Sensing, the era of big geospatial data,
http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/raju-bigspatial.pdf;
o Representative samples, classical: http://en.wikipedia.org/wiki/Sampling_(statistics), care to be
taken when considering Social Data: http://sloanreview.mit.edu/article/the-pitfalls-of-using-online-
and-social-data-in-big-data-analysis/;
o Roadway traffic control, see https://www.thalesgroup.com/en/worldwide/transportation/road-
traffic-management;
o RTDP, Real Time Dynamic Programming;
o SaaS, Software as a Service;
Semantic Pill 22
Tips: SBA, which stands for Search Based Applications, refers to “semantic Applications”
where the core of the architecture rests on a “semantic search engine”. Data should be, ideally,
fully structured from a semantic point of view: however it is possible to build via Artificial
Intelligence a sort of “Semantic Glasses” to see the whole Web or a part/region of it as semantically
structured. See Darwin Semantic Glasses in the e-book “Semantic Web”.
SDSS, Sloan Digital Sky Survey, is a project for mapping the universe that has obtained deep, multi-color images
covering more than a quarter of the sky and created 3-dimensional maps containing more than 930,000
galaxies and more than 120,000 quasars. You may click on the figure below (in the Website) and you will go
to an enlarged view of it, with the Earth in the center and each point representing a galaxy typically containing
100 billion stars!
SML, Social Media Listening, like Social Media Monitoring and Social Media Measurement, deals with
paying attention to what “people” say about anything, fundamentally having in mind a Web
Ontology that “sees” the Web as a dual, continuously interacting model: the “Established Order” (the
law, governments and governors, all types of institutions, authorities, teachers, masters and those who
teach, manufacturers and sellers of products and services,…) on one side and the people on the other
(the governed, users, those who learn, students, buyers, applicants,…). See SMM, Social Media Monitoring,
from Huffington Post.com, which textually affirms that “it pays to listen”. I prefer to think of it as a
technology solution to what my mother told me when monitoring my growing up: God gave you
two ears and one mouth for a reason;
Galaxy Mapping, from Sdss.org
o SBA, Search Based Application;
o SDSS, the world project to map the Universe, http://www.sdss.org/;
o Search Based Applications, see SBA;
o Sloan Digital Sky Survey, see SDSS;
o SML, Social Media Listening, a neologism by SMM, Social Media Monitoring;
o SMM, http://www.huffingtonpost.com/robert-ball/social-media-monitoring-i_b_833702.html, it
pays to listen.
Semantic Pill 23
Tips: Social Genome is a new term coined in relation to a Wal-Mart/Facebook project.
Huffingtonpost.com says in this respect:
WalmartLabs defines the "Social Genome" as "a giant knowledge base that captures entities and
relationships of the social world." Wal-Mart has spent the last few years building this in-house Social
Genome, part public data, part private data, "and a lot of social media." Tweets, Facebook messages, blog
posts, You Tubes, its all streaming into Wal-Mart. Streaming in so fast, that WalmartLabs created something
they call Muppet, a solution for processing Fast Data, using large clusters of machines. The Labs describes
the Social Genome as their "crown jewel."
Social Media Listening has been extensively dealt with in the previous pill. We add that it will soon
become an art and a science. In the figure below two things are missing: the necessary “feedback”
loop at every step, threading a sort of embedded feedback, and something like a wise guiding
invisible hand well acquainted with how we humans document our opinions and messages,
perhaps a Human Documentation Ontology;
SSD, only as related to Big Data: for example Kingston Introduces New Enterprise SSD to Support Big
Data and Virtualization Initiatives (up to 480 GB);
Source: SML from anchormedia.com
o Social Genome, see Wal-Mart social genome, a Wal-Mart/Facebook project related
to BD;
o Social listening, see SML;
o Social media listening;
o Social Media Monitoring, see SMM;
o solid state drive, see SSD;
o SSD, Solid State Drive;
Semantic Pill 24
Tips: Strategic Planning is an old and always alive concept: let us see what the classic Harvard
Business Review says about it in relation to today’s Big Data, in Big Data: The Management
Revolution (Oct 2012):
Business executives sometimes ask us, “Isn’t ‘big data’ just another way of saying ‘analytics’?” It’s true that
they’re related: The big data movement, like analytics before it, seeks to glean intelligence from data and
translate that into business advantage. However, there are three key differences, fundamentally “Volume”
and “Velocity” (2 out of the 3V);
System Identification related concepts have been presented in previous pills; however we recommend
seeing Perspectives on System Identification, by Lennart Ljung, a CiteSeer paper. Its abstract says:
System identification is the art and science of building mathematical models of dynamic systems from
observed input-output data. It can be seen as the interface between the real world of applications and the
mathematical world of control theory and model abstractions.
We recommend going deeper into its methodological core, as follows:
1) the globally estimated model (m), 2) a True Description of the system (S), 3) the model class (M) the
model belongs to, 4) the Complexity (C) of the model class, 5) all the Information available about the object to
be modeled: observed data and everything that helps to describe it, 6) Validation, which involves generalization:
validation data sets (Z), 7) Model Fit (F), which explains how well our model (m) adapts to a (Z) dataset: F(m, Z);
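A tiny numerical sketch may help to fix the roles of m, Z and F(m, Z). Everything in it is an assumption made for illustration: the “true” system is a first-order difference equation y[t] = a·y[t-1] + b·u[t-1] plus small noise, the model class M is that same structure, the estimate m is obtained by ordinary least squares, and the fit F is a normalized one-step-ahead prediction score (1 means perfect). None of these choices come from the cited paper.

```python
import math
import random


def simulate(a, b, u, rng, noise=0.01):
    """Generate output data from y[t] = a*y[t-1] + b*u[t-1] + e[t]."""
    y = [0.0]
    for t in range(1, len(u)):
        y.append(a * y[t - 1] + b * u[t - 1] + rng.gauss(0.0, noise))
    return y


def estimate(u, y):
    """Least-squares estimate m = (a, b), solving the 2x2 normal equations."""
    steps = range(1, len(y))
    syy = sum(y[t - 1] * y[t - 1] for t in steps)
    syu = sum(y[t - 1] * u[t - 1] for t in steps)
    suu = sum(u[t - 1] * u[t - 1] for t in steps)
    ry = sum(y[t - 1] * y[t] for t in steps)
    ru = sum(u[t - 1] * y[t] for t in steps)
    det = syy * suu - syu * syu
    return (ry * suu - ru * syu) / det, (syy * ru - syu * ry) / det


def fit(model, u, y):
    """F(m, Z): 1 - ||y - prediction|| / ||y - mean(y)|| on a data set Z = (u, y)."""
    a, b = model
    predicted = [a * y[t - 1] + b * u[t - 1] for t in range(1, len(y))]
    actual = y[1:]
    mean = sum(actual) / len(actual)
    error = math.sqrt(sum((p - v) ** 2 for p, v in zip(predicted, actual)))
    spread = math.sqrt(sum((v - mean) ** 2 for v in actual))
    return 1.0 - error / spread


rng = random.Random(1)
u_train = [rng.uniform(-1, 1) for _ in range(300)]
y_train = simulate(0.5, 2.0, u_train, rng)   # estimation data
u_valid = [rng.uniform(-1, 1) for _ in range(300)]
y_valid = simulate(0.5, 2.0, u_valid, rng)   # validation data Z

m = estimate(u_train, y_train)
F = fit(m, u_valid, y_valid)
print(m, F)
```

The point of keeping separate estimation and validation data is step 6 above: the fit is measured on data the model has not seen.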
By Technology Forecasting we refer to something really new, because up to now we humans were
used to generating “futuribili”, visions of the future, without even imagining the technology that would
make them possible. Up to now there was a general belief: once we humans are convinced that
something is possible, no matter the efforts and resources needed, we start “ex post” to think about
the “how to”. Let’s see then some techniques to guide our imagination towards those “how to”: the Delphi
Method, one of the most used, belongs to the model of learning by Q&A rounds among experts; Forecast by
Analogy is a form of reinforcing suppositions based on credible analogies; and there are all types of
projections obtained by different extrapolation criteria (for example based on Growth Curves);
Tensors (and Matrices) are something that you should know to some extent, and this (52 pages PDF
document) could be a good introduction to the Knowledge Discovery and Data Mining tools and
techniques that you are going to need in Big Data. Tensors and Tensor Calculus are essential in
some disciplines, like all those related to quantum: Quantum Mechanics and Quantum Computing. The
figure below depicts a tensor visualization of the Cauchy Stress Tensor:
Tensors are geometric objects that describe linear relations between vectors, scalars, and other tensors.
Elementary examples of such relations include the dot product, the cross product, and linear maps. Vectors
and scalars themselves are also tensors. A tensor can be represented as a multi-dimensional array of
numerical values. The order (also degree) of a tensor is the dimensionality of the array needed to represent
it, or equivalently, the number of indices needed to label a component of that array. For example, a linear
map can be represented by a matrix, a 2-dimensional array, and therefore is a 2nd-order tensor. A vector can
be represented as a 1-dimensional array and is a 1st-order tensor. Scalars are single numbers and are thus
0th-order tensors.
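The idea of “order” as the number of indices can be tried out with plain nested lists. The sketch below is only illustrative: tensor_order and apply_linear_map are hypothetical helper names, regular (non-ragged, non-empty) nesting is assumed, and the numeric values of the stress matrix are invented.

```python
def tensor_order(x):
    """Count how many indices are needed to reach a single number."""
    order = 0
    while isinstance(x, list):
        order += 1
        x = x[0]  # assumes regular, non-empty nesting
    return order


def apply_linear_map(matrix, vector):
    """A 2nd-order tensor (matrix) acting on a 1st-order tensor (vector)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


scalar = 3.0                      # 0th order
vector = [1.0, 2.0, 3.0]          # 1st order
stress = [[2.0, 0.5, 0.0],        # 2nd order, e.g. a 3x3 stress tensor
          [0.5, 1.0, 0.0],        # (illustrative numbers only)
          [0.0, 0.0, 4.0]]
cube = [[[0, 1], [2, 3]],         # 3rd order: three indices per component
        [[4, 5], [6, 7]]]

orders = [tensor_order(t) for t in (scalar, vector, stress, cube)]
traction = apply_linear_map(stress, [1.0, 0.0, 0.0])
print(orders, traction)
```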
Tensor Toolbox: see Matlab Tensor Box in previous pills;
Tensor as of Wikipedia
o Strategic Planning CP, (Control Panel);
o System identification, old system concept, building of a model of it via its behavior study;
o Technology forecasting;
o Tensor, its association to Big Data and Data Mining looks like a revival of geometry and
math, http://users.cs.fiu.edu/~taoli/kdd09-workshop/DMMT09-proceedings.pdf;
o Tensor toolbox;
Semantic Pill 25
Tips: TDA, Topological Data Analysis, is something that should be carefully studied, at least
conceptually if you are not strong in math, as it is fundamental in Data Mining, Visualization and
Semantics, and is now embedded in Big Data:
The main problems are: 1. how one infers high-dimensional structure from low-dimensional
representations; and 2. how one assembles discrete points into global structure. The human brain
can easily extract global structure from representations in a strictly lower dimension, i.e. we infer
a 3D environment from a 2D image from each eye. The inference of global structure also occurs
when converting discrete data into continuous images, e.g. dot-matrix printers and televisions
communicate images via arrays of discrete points. The main method used by topological data
analysis consists of three steps: a) Replace a set of data points with a family of simplicial complexes,
indexed by a proximity parameter; b) Analyze these topological complexes via algebraic topology,
specifically via the theory of persistent homology; c) Encode the persistent homology of a data
set in the form of a parameterized version of a Betti number which is called a barcode.
New concepts to add to this Mini Thesaurus:
Simplicial Complexes;
Persistent homology;
Betti number;
Barcode;
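To make these four terms less abstract, here is a minimal sketch restricted to the simplest case, dimension 0, where the Betti number just counts connected pieces. It assumes that two points are joined as soon as their distance is at most the proximity parameter (a Vietoris-Rips style rule); each bar is born at 0 and dies at the parameter value where its piece merges into another one. The function names h0_barcode and betti0 and the sample points are ours; real TDA libraries also compute higher dimensions (loops, voids), which this sketch does not.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Bars (birth, death) of connected components as the proximity grows."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # two pieces merge: one of them "dies"
            parent[root_i] = root_j
            bars.append((0.0, distance))
    bars.append((0.0, math.inf))      # the last piece never dies
    return bars


def betti0(bars, eps):
    """Number of components alive at proximity parameter eps."""
    return sum(1 for birth, death in bars if birth <= eps < death)


points = [(0, 0), (1, 0), (5, 0), (5, 1)]   # two well separated pairs
bars = h0_barcode(points)
print(bars)
print([betti0(bars, eps) for eps in (0.5, 2.0, 4.5)])
```

The long bar that survives from 1 to 4 is the “signal”: it says the data has two clusters over a wide range of scales.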
“To see more and better”: as an exact phrase this term has 1,340,000 references in Google, appearing like a
“meme” or goal of research and innovation. It is also the “motto” of our Darwin Methodology: to
build tools to “see more and better” the Web, like the Darwin Semantic Glasses.
Tuples: A tuple is an ordered list of elements and a Tuple Space is a space of tuples to be used
sometime, somewhere and somehow:
A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing.
It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that
there are a group of processors that produce pieces of data and a group of processors that use the data.
Producers post their data as tuples in the space, and the consumers then retrieve data from the space that
match a certain pattern.
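The producer/consumer description above can be sketched as a toy, in-memory tuple space. This is an illustration only, not the API of any real tuple-space product: the class TupleSpace, its methods out and take, and the use of None as a wildcard in patterns are all assumptions of this sketch.

```python
import threading


class TupleSpace:
    """A toy shared repository of tuples with pattern-based retrieval."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Producer side: post a tuple into the space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern matches any value at that position.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup)
        )

    def take(self, pattern, timeout=None):
        """Consumer side: remove and return a matching tuple, waiting if needed.

        Returns None if a wait of `timeout` seconds ends without a match.
        """
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._matches(pattern, tup):
                        self._tuples.remove(tup)
                        return tup
                if not self._cond.wait(timeout):
                    return None


space = TupleSpace()
producer = threading.Thread(
    target=lambda: [space.out(("reading", i, i * i)) for i in range(3)]
)
producer.start()
got = space.take(("reading", 2, None))       # blocks until the tuple exists
producer.join()
missing = space.take(("alarm", None), timeout=0.01)
print(got, missing)
```

Note that producer and consumer never address each other: they only share the space and a pattern, which is the associative-memory idea of the quote.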
Vector Processing refers to processing by vectors instead of processing single data items or “scalars” one
at a time. This technique could be used not only as a possible architecture to build supercomputers
but also as a programming style. Our Darwin Methodology processes “by textons”, which resemble vectors
of Web documents.
Watkins Q-Learning Algorithm points to the Watkins thesis (1989): Learning from Delayed
Rewards, a seminal work of 220 pages. The thesis faces a crucial question behavioral scientists ask
themselves: how might animals learn optimal policies from their experience? And, going a little
deeper: is it possible to give a systematic analysis of possible computational methods of learning
efficient behavior?
Weather Forecast has been reviewed in previous pills; however we suggest reading Big Data
Reshapes Weather Channel Predictions, an article about The Weather Company from
InformationWeek.com:
"Weather is the original big data application," says Bryson Koehler, executive VP and CIO at the Weather
Company. "When mainframes first came about, one of the first applications was a weather forecasting
model."
Flash forward to today and the Weather Company ingests some 20 terabytes of data per day to spin out what
Koehler bills as the world's most accurate forecasts. To stay ahead of its competition, the Weather Company
is in the process of rolling out a new platform built on Basho's Riak NoSQL database and running globally in
the Amazon Web Services (AWS) cloud.
Source: DARPA Topological Data Analysis, from Big Data, Wikipedia
o TDA, see DARPA;
o Topological Data Analysis, see TDA;
o To see more and better, G: 1,340,000 as exact term; it looks like a universal R&D goal;
o Tuple space*, a form of associative memory;
o Tuple space, http://en.wikipedia.org/wiki/Tuple_space;
o Vector Processing, see “Darwin textons”;
o Wal-Mart, http://www.bigdata-startups.com/BigData-startup/walmart-making-big-data-
part-dna/, it’s a pioneer in BD, see social genome;
o Watkins Q-learning algorithm, a specific Q-learning algorithm;
o Weather forecasts;
Semantic Pill 26
From Gartner’s Newsroom
2012 (DEC 2011)
Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit
big data for competitive advantage.
Current trends in smart devices and growing Internet connectivity are creating significant increases
in the volume of data available, but the complexity, variety and velocity with which it is delivered
combine to amplify the problem substantially beyond the simple issues of volume implied by the
popular term "big data." Collecting and analyzing the data is not enough — it must be presented in
a timely fashion so that decisions are made as a direct consequence that have a material impact on
the productivity, profitability or efficiency of the organization. Most organizations are ill prepared
to address both the technical and management challenges posed by big data; as a direct result,
few will be able to effectively exploit this trend for competitive advantage.
2013 See 10 Strategic Technology Trends for 2013
2014 (NOV/DEC 2013)
Predicts 2014: Apps, Personal Cloud and Data Analytics Will Drive New Consumer
Interactions
22 November 2013
Mobile apps have become the official channel to drive content and services to consumers.
Using big data collated via apps can drastically improve value to consumers. Businesses
that develop data tracking and analytics will improve delivery to customers, increasing
customer loyalty and acquisition.
Predicts 2014: Big Data
20 November 2013
Gartner's 2014 predictions explore how the developing maturity and awareness of big data
impacts analytics, resources, data center infrastructure and consumer privacy. Enterprises
must adapt to this quickly changing landscape to establish an analytical competitive
advantage.
Predicts 2014: Cloud Computing Affects All Aspects of IT
4 December 2013
Gartner's 2014 cloud computing predictions shed light on the evolution of the concept as it
continues its path toward becoming more and more integral to IT. IT organizations will
need to monitor developments in order to adapt their cloud strategies to the realities of
tomorrow.
Infographics from IBM Big Data Hub
Concerning Data Size Matters
Darwin Methodology - HKM Demo
Darwin: It stands for Distributed Agents to Retrieve the Web Intelligence
Synthesis for Partners
As of October 9th, 2014
“In Mind Idea”, personal, unique and “invisible”
Human Knowledge Map, HKM, in numbers
Knowledge Wood:
It depicts a “wood” of about 200 Branches (a wood of “HK Trees”), for all pairs (culture, language).
It opens as:
o A Human Encyclopedia of about 600,000 MS, Major Subjects <=> Major Concepts <=>
~3,000 MS per Branch of a HK Tree
o An “In Mind Ideas” Universe of about: 20,000,000 CS, Common Subjects (topics?) <=>
minor, common concepts <=> ~35 CS per MS
o As of today justified by a World Data Reservoir of about 40,000,000,000 Web pages
Knowledge Sample:
(1.2% of the whole Web)
Branch: The Art
Pair culture language: (British, American English)
Major Subjects: 7,571
Concepts: ~400,000 CS (common concepts about The Art)
Web Universe of sample: ~100,000,000 pages
First HKM Version:
Effort: ~300 man-months (scientific, academic and professional top levels)
Time: 6 months (“alpha test”); 3 more months for its “beta test”;
Languages: English and Spanish
Forms of Knowing
o Q&A: By querying sources of information, knowledge and
wisdom
History: Humans and God Oracles; Shamans and gurus; Ancient &
Wise people; Libraries; Temples; Spirit workings; Spiritual
Orientations;
o Experiencing: for a living, for fun, to survive, work and all type of
Working Activities; Gaming; Entertainments; Arts & Crafts;
Adventures and risky activities; Innovations;
o Studying: fundamentally the “Established Knowledge”
Being an active part of “The Education” by exercising the pair
“Teaching - Learning” one at a time or both concurrently
(depending on cultures); master disciple relationships;
Darwin performs all three because it works on a Web Thesaurus that “sees” the Web “more and
better”, as if it were semantically indexed with ideal metadata approaching the “best
probabilistic truth” at any moment.
The “anthropic” component of Darwin: the HK seeds:
12 ways of drafting semantic seeds example
Retrieving from “zero ground”: Darwin Methodology may retrieve any MS, Major Subject, from
the Web starting from “zero ground” in terms of knowledge, provided the Web Universe that deals
with it is statistically significant. However it could be costly in terms of trials.
Seeds: Darwin agents (robots) explore the Web recursively starting from a seed, a sort of primal
and basic tree where only the “suspected” SMS, Super Major Subjects, of the discipline to retrieve
are depicted; when the whole tree is retrieved they will probably become part of its upper levels.
These seeds are provided by human experts. They are grown via a Semantic
Ontology (Darwin Ontology) whose axioms are continuously checked. For each seed
Darwin computes its semantic consistency and suggests to humans changes in content and form, for
instance changes within the neighborhood of certain critical “knowledge nodes”.
The Web may hide unexpected “best truths”: During the years 2009 - 2010 Darwin was used to
retrieve a relatively heavy and complex discipline: The Art. As at that time we did not have the
possibility of creating a trustworthy and good enough seed, we started from “zero ground”, which equals
“total ignorance”. After trying four seeds we arrived at a rather atypical and unexpected result:
Three SMS, Super Major Subjects, appeared, namely: Culinary Art, Physical Arts and almost half of
the Art Infrastructure (see Vision 2). We discussed this “discovery” of the Darwin agents with Art Experts
and they agreed: what Darwin found was perhaps less “selected” than the classic idea of “The Art” that
experts perhaps have within their minds, but evidently it was more ample, popular and closer to
the modal and statistical truth.
Concepts versus keywords:
Partial Visualization of The Art Tree skeleton
Darwin works with concepts and may query by concepts: The figure above depicts a partial
and to some extent reductionist visualization of four minor subjects of The Art: Rigoletto, Light
Lyric, Fantasy novel and Paella. Taken alone, all of these are “keywords”. Concepts, on the contrary,
are unique representations of “in mind ideas” that could (in fact should…) be instantiated, and they
are semantically and logically defined as a chain of links. For instance, the concept Rigoletto here
points precisely to Verdi’s opera Rigoletto, coded as [0.1.2.2.2.2.14.1.6.10.5, Rigoletto], that
is, embedded as the end link of a “tree track”. In Darwin, Rigoletto is seen as a CON-CEPT in which
the track [0.1.2.2.2.2.14.1.6.10.5] defines its CONtext and the keyword Rigoletto is by itself a
CEPT, a given name within this CONtext.
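The CON-CEPT pairing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name Concept, its methods and the second, invented track are hypothetical and are not part of Darwin itself; only the coded string for Rigoletto comes from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """A CON-CEPT: a tree track (CONtext) plus a given name (CEPT)."""
    track: Tuple[int, ...]  # path of links from the root of the tree to the node
    name: str               # keyword that names the node within that context

    @classmethod
    def parse(cls, coded: str) -> "Concept":
        # Accepts the bracketed form used in the text: "[track, name]".
        body = coded.strip().strip("[]")
        track_text, name = body.split(",", 1)
        track = tuple(int(part) for part in track_text.strip().split("."))
        return cls(track=track, name=name.strip())

    @property
    def depth(self) -> int:
        # Number of links below the root (the leading 0 is the root itself).
        return len(self.track) - 1

    def parent_track(self) -> Tuple[int, ...]:
        # Context one level up in the tree.
        return self.track[:-1]


rigoletto = Concept.parse("[0.1.2.2.2.2.14.1.6.10.5, Rigoletto]")
# Hypothetical second track, invented only to show a keyword collision.
other = Concept(track=(0, 3, 1), name="Rigoletto")
```

Both objects share the keyword Rigoletto, yet they compare as different concepts because their tracks differ, which is the distinction between keyword and concept that the paragraph draws.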
Why does Darwin Methodology succeed in seeing the Web more and better?
i) Because it works within semantically strong HCI, Human Computer Interaction, scenarios
ii) Because it follows the best Information Technology utopias of the Golden Age after World War II
(the 1940-1965 period in which they flourished, coincident with the “baby boomer” generation).
Claude Elwood Shannon - Labyrinth
Darwin Ontology, in line with Claude Shannon’s findings, states that Human Documents are
generally composed by intelligently combining two and only two types of semantic particles:
1) Common Words and Expressions and 2) Concepts. Darwin Ontology paves the way to understand
the sequence:
data => information => knowledge => wisdom
I N T E L L I G E N C E
Math and psychological thinking have much to do with each other: within the Web universe, and
concerning the “man-machine” interrelation, mathematicians, engineers, psychologists and
epistemologists have much to say. As an example:
o Miller’s Law, psychologist, 1920 - 2012;
o Fitts’ Law, psychologist, 1912 - 1965;
o Hick-Hyman Law, psychologists, 1912 - 1974;
o Power Law of Practice, Newell and Rosenbloom Law, psychologists, 1927 - 1992;
o Pareto, engineer and economist, and Zipf’s Law, linguist, 1902 - 1950;
Two rare parallel lives:
o Zipf George Kingsley, USA, linguist, statistics, Harvard, 1902 - 1950; P(n) ~ 1/n^a;
o Fitts Paul, USA, psychologist, ergonomist, US Air Force, 1912 - 1965; T ~ a + b·log(1 + D/W)
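The two laws above can be evaluated numerically with a short Python sketch. The function names are hypothetical, and the base-2 logarithm in Fitts’ law is an assumption (the text does not state a base; base 2 is the usual choice when difficulty is measured in bits).

```python
import math


def zipf_weights(n_items: int, a: float = 1.0) -> list:
    """Normalized Zipf probabilities, P(n) proportional to 1 / n**a, for ranks 1..n_items."""
    raw = [1.0 / (rank ** a) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


def fitts_time(a: float, b: float, distance: float, width: float) -> float:
    """Fitts' law: T = a + b * log2(1 + D / W); the base 2 is an assumption."""
    return a + b * math.log2(1.0 + distance / width)


probs = zipf_weights(1000)           # rank-frequency profile of 1000 items
t = fitts_time(0.0, 1.0, 3.0, 1.0)   # D/W = 3, so the index of difficulty is log2(4) = 2
```

With a = 1 the most frequent item is twice as probable as the second one, and doubling the target width W at a fixed distance D lowers the predicted time T.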
And the best of the best utopias:
o Shannon Claude Elwood, USA, 1916 - 2001, mathematician, engineer, MIT, Bell Labs:
Information Theory, H(X) = -Σ p(x)·log p(x);
o Von Neumann John, “father” of computers as we know them now, Hungarian/American, 1903 -
1957, mathematician, physicist;
o Turing Alan, 1912 - 1954, British, mathematician, philosopher, Turing Machines, ENIGMA;
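As a small worked illustration of Shannon’s entropy, H(X) = -Σ p(x)·log p(x), the Python sketch below computes it in bits. The function names are hypothetical, and treating each whitespace-separated word as one symbol is a simplifying assumption made only for this example.

```python
import math
from collections import Counter


def entropy_bits(probabilities) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def text_entropy(text: str) -> float:
    """Entropy of the word distribution of a text (one word = one symbol)."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return entropy_bits(count / total for count in counts.values())


fair_coin = entropy_bits([0.5, 0.5])   # one bit of uncertainty
certain = entropy_bits([1.0])          # no uncertainty at all
four_words = text_entropy("data information knowledge wisdom")
```

A fair coin carries exactly one bit, a certain outcome carries none, and four equally frequent words carry two bits per word.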
=> Back
Wikipedia Semantic Skeleton as per Wikipedia
One of the best avatars as per Darwin Ontology
Read some reflections in green
 1 History
 2 Openness
o 2.1 Restrictions
o 2.2 Review of changes
o 2.3 Vandalism
 3 Policies and laws
o 3.1 Content policies and guidelines
 4 Governance
o 4.1 Administrators
o 4.2 Dispute resolution
 5 Community
o 5.1 Diversity
 6 Language editions
 7 Critical reception
o 7.1 Accuracy of content
o 7.2 Quality of writing
o 7.3 Coverage of topics and systemic bias
o 7.4 Explicit content
o 7.5 Privacy
o 7.6 Sexism
 8 Operation
o 8.1 Wikimedia Foundation and the Wikimedia chapters
o 8.2 Software operations and support
o 8.3 Automated editing
o 8.4 Wikiprojects, and assessments of articles' importance and quality
o 8.5 Hardware operations and support
o 8.6 Internal research and operational development
o 8.7 Internal news publications
 9 Access to content
o 9.1 Content licensing
o 9.2 Methods of access
 10 Impact
o 10.1 Readership
o 10.2 Cultural significance
o 10.3 Sister projects – Wikimedia
o 10.4 Publishing
o 10.5 Scientific use
 11 Related projects
 12 See also
 13 References
o 13.1 Notes
 14 Further reading
o 14.1 Academic studies
o 14.2 Books
o 14.3 Book reviews and other articles
 15 External links
Wikipedia in brief
Jimmy Wales and Larry Sanger launched Wikipedia on January 15, 2001. Sanger[9] coined
its name,[10] a portmanteau of wiki[notes 3] and encyclopedia. Initially only in English,
Wikipedia quickly became multilingual as it developed similar versions in other languages,
which differ in content and in editing practices. The English Wikipedia is now one of 291
Wikipedia editions and is the largest with 5,071,371 articles (having reached 5,000,000
articles in November 2015). There is a grand total, including all Wikipedias, of over 38
million articles in over 250 different languages.[12] As of February 2014, it had 18 billion
page views and nearly 500 million unique visitors each month.[13]
A peer review of 42 science articles found in both Encyclopædia Britannica and Wikipedia
was published in Nature in 2005, and found that Wikipedia's level of accuracy approached
Encyclopedia Britannica's.[14] Criticisms of Wikipedia include claims that it exhibits
systemic bias, presents a mixture of "truths, half truths, and some falsehoods",[15] and that
in controversial topics it is subject to manipulation and spin.[16]
Subjects covered (topics)
https://en.wikipedia.org/wiki/Portal:Contents/Lists
Wikipedia's contents: Lists
General reference
Culture and the arts
Geography and places
Health and fitness
History and events
Mathematics and logic
Natural and physical sciences
People and self
Philosophy and thinking
Religion and belief systems
Society and social sciences
Technology and applied sciences
Some reflections follow
o By and large Wikipedia drives you to only one of ITS articles. For example, for the
subject List of Gold Glove Award winners at pitcher, it drives you to this link:
https://en.wikipedia.org/wiki/List_of_Gold_Glove_Award_winners_at_pitcher
If you now ask Google about this subject as an open search it returns 610,000
references, and as a closed search (between quotation marks) it returns 103!
o WARNING: Many articles, like those related to directories, glossaries and indexes,
are only cosmetic copies of existing Web articles.
o See, as an example of something reasonably well written but misleading, the article
about Zen (259.000.000 references as per Google): https://es.wikipedia.org/wiki/Zen
Unfortunately it drives people to a “marketing” biased vision of it!
However, as per Darwin Vision, Wikipedia is one of the best known avatars of Human
Knowledge, but by and large a “classic subjective vision”: a rational synthesis issued by a group
of people, many of them “authorities”, at a level of quality comparable to that of the
aforementioned Encyclopaedia Britannica and Nature. Darwin Vision, on the contrary, intends to
take into account absolutely ALL, EVERYTHING and EVERYBODY: past and present things,
actors, contexts and variables related to the avatar under consideration. By “intends” we
mean a demonstrated openness of mind to see the ALL.
Note: This ANWOT spirit (A New Way of Thinking) intends to go beyond the methodic
doubt of Cartesian philosophy. As an example of the ANWOT open mind, see below the basic
spirit of curiosity that should guide searching efforts: what, then, is behind ANWOT?
EN, "a new way of thinking", 4.280.000
ES, "una nueva forma de pensar", 337.000
IT, "un nuovo modo di pensare", 109.000
FR, "une nouvelle façon de penser", 117.000
DE, "eine neue Art des Denkens", 11.300
PT, "um novo modo de pensar", 82.400
CN, "一種新的思維方式", 40.000
IL, "דרך חדשה של חשיבה", 6.230
JP, 新しい考え方, 1.350.000
NL, "een nieuwe manier van denken", 31.200
IN, "सोच का एकनया तरीका है ", 255
RU, "новый способ мышления", 20.000
TR, "yeni bir düşünce yolu", 854
Cat, "una nova forma de pensar", 13.700
Gall, "un novo modo de pensar", 553
Eusk, "pentsatzeko modu berri bat", 172
Esperanto, "nova pensmaniero", 230
Árabe, "طريقة جديدة في التفكير", 9.430
=> Back
Word Searching Weakness
By Juan Chamero, from Buenos Aires, as of November 11th, 2014
Physics versus Semantic analogies: We may see the matter of the universe as constituted by
either a) building blocks made of molecules, b) molecules, c) atoms, d) hadrons (protons and
neutrons) or e) quarks. Of course, if we choose to see the whole world as formed of molecular
building blocks, it would be practically impossible to see the world correctly at the level of atoms,
worse at the level of hadrons, so imagine what would happen at the level of quarks! Something
similar occurs within the Web space: something documented via concepts cannot be seen correctly
via words. For instance, the noun “dog” can be adequately placed via WordNet within a given hierarchy:
dog, domestic dog, Canis familiaris
=> canine, canid
=> carnivore
=> placental, placental mammal, eutherian, eutherian mammal
=> mammal
=> vertebrate, craniate
=> chordate
=> animal, animate being, beast, brute, creature, fauna
=> ...
But this always stays at the level of “words”: in this example the word dog is placed within the
animal realm, but this has nothing to do with hierarchies of knowledge (concerning knowledge,
WordNet behaves as “flat”, unstructured). You may imagine the word dog as an atom, a building
component of many different molecules, and these molecules as semantically structured pieces of
knowledge belonging to different branches of knowledge: dog within canine within zoology; dog
within “customs dog”; “trained dogs” for many applications; “dog psychology” for veterinary
applications and for pet care; the dog and red dog meanings in engineering; “lazy dog” and “lazy
dog breeding” within human pet preferences; “dog entertainment ideas”…; “cattle dog”, “cattle
dog breeding”, “Australian cattle dog”, “herding dog” and thousands more! See the Darwin
Semantic Hypercube.
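The contrast between the word-level chain and the concept-level view can be sketched in Python as follows. Everything here is a toy: the hypernym links are copied from the chain quoted above (first term of each level only), while the branch labels and the function names are assumptions introduced only for illustration.

```python
# Word level: each word has a single "is a" parent, as in the quoted chain.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}

# Concept level: expressions grouped by branch of knowledge.
# The branch labels are illustrative assumptions, not Darwin's own tree.
CONCEPTS = {
    "zoology": ["dog", "canine"],
    "security": ["customs dog"],
    "veterinary": ["dog psychology"],
    "livestock": ["cattle dog", "Australian cattle dog", "herding dog"],
}


def hypernym_chain(word: str) -> list:
    """Follow the single word-level path up to the most general term."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def branches_for(word: str) -> list:
    """Branches of knowledge in which the word takes part in some concept."""
    return sorted(
        branch
        for branch, expressions in CONCEPTS.items()
        if any(word in expression.lower().split() for expression in expressions)
    )


chain = hypernym_chain("dog")
branches = branches_for("dog")
```

The word dog has one path at word level, ending in animal, but it appears in several branches at concept level, which is the “atom within many molecules” picture of the paragraph above.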
=> Back
The Differences between Data, Information and Knowledge
As per Infogineering.net
Darwin comments in red, by Juan Chamero, from Buenos Aires, as of November 2014
Knowledge
Firstly, let’s look at Knowledge. Knowledge is what we know. Think of this as the map of
the World we build inside our brains. Like a physical map, it helps us know where things
are – but it contains more than that. It also contains our beliefs and expectations. “If I do
this, I will probably get that.” Crucially, the brain links all these things together into a giant
network of ideas, memories, predictions, beliefs, etc. Not bad!
It is from this “map” that we base our decisions, not the real world itself. Our brains
constantly update this map from the signals coming through our eyes, ears, nose, mouth and
skin. And coming through our viscera also!
You can’t currently store knowledge in anything other than a brain, because a brain
connects it all together. Everything is inter-connected in the brain. Computers are not
artificial brains. They don’t understand what they are processing, and can’t make
independent decisions based upon what you tell them. I would dare to say not yet!
There are two sources that the brain uses to build this knowledge - information and data.
Data
Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have
brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written
down somewhere or not. OK, it is a fact (see fact definition below) but the explanation is
incorrect!
fact
noun
noun: fact; plural noun: facts
1. a thing that is known or proved to be true.
"the most commonly known fact about hedgehogs is that they have fleas"
synonyms:
reality, actuality, certainty, factuality, certitude; More
truth, naked truth, verity, gospel
"it is a fact that the water supply is seriously polluted"
antonyms: lie, fiction
o information used as evidence or as part of a report or news article.
"even the most inventive journalism peters out without facts, and in this case there were no facts"
synonyms:
detail, piece of information, particular, item, specific, element, point,
factor, feature, characteristic, respect, ingredient, attribute,
circumstance, consideration, aspect, facet; More
information, itemized information, whole story;
informal: info, gen, low-down, score, dope
"every fact in the report was double-checked"
o used to refer to a particular situation under discussion.
noun: the fact that
"despite the fact that I'm so tired, sleep is elusive"
In many ways, data can be thought of as a description of the World. We can perceive this
data with our senses, and then the brain can process this. Data would have the same nature
as knowledge, but not enough of it to be considered knowledge. We prefer to define it as
“a piece of knowledge” instead. What really happens is that knowledge, accepted as “things
we know”, would be hierarchically structured: low level knowledge would be data
for a superior cognitive level.
Human beings have used data as long as we’ve existed to form knowledge of the world.
That’s correct!
Until we started using information, all we could use was data directly. If you wanted to
know how tall I was, you would have to come and look at me. Our knowledge was limited
by our direct experiences. What has been seen up to here is enough to go through life and make
our knowledge increase, but not enough to build an ontology that helps us to “see more and
better” the superior levels of it. It reminds us of the sagas about the differences
between Newton’s and Einstein’s cosmovisions.
Information
Information allows us to expand our knowledge beyond the range of our senses. Not bad!
We can capture data in information, then move it about so that other people can access it
at different times.
Here is a simple analogy for you.
If I take a picture of you, the photograph is information. But what you look like is data.
That’s incorrect! Information consists of “pieces of knowledge” that improve our level of
appreciation of the real world (diminishing our uncertainty), as per Shannon’s Information
Theory!
I can move the photo of you around, send it to other people via e-mail etc. However, I’m
not actually moving you around – or what you look like. I’m simply allowing other people
who can’t directly see you from where they are to know what you look like. If I lose or
destroy the photo, this doesn’t change how you look.
So, in the case of the lost tax records, the CDs were information. The information was
lost, but the data wasn’t. Mrs Jones still lives at 14 Whitewater road, and she was still
born on 15th August 1971.
The Infogineering Model (below) explains how these interact…
As with Einstein’s, Shannon’s discovery was too advanced for its time. Information is not easy to
understand: it is something that helps intelligent beings to take decisions in order to survive,
to improve, to solve a problem, to avoid obstacles. Ideally it is a “warning signal” (a sort of
biiiiip…) accompanied by a message (minimally a “bit”, a zero or a one).
=> Back
The Web for fun
Darwin Semantic Pill unveiling example
Who’s on first: Polysemy versus Monosemy
From Buenos Aires, as of April 9th, 2015, 18:25
Agent-Human: juan chamero, Time elapsed: 20 minutes
o Who’s on first: example of a famous EN American comic routine (worldwide: vaudeville),
http://en.wikipedia.org/wiki/Who%27s_on_First%3F
o ES Spanish cultural examples, http://www.retoricas.com/2011/07/ejemplos-de-equivoco.html
o semiotics, http://en.wikipedia.org/wiki/Semiotics
o polysemy, http://grammar.about.com/od/pq/g/polysemyterm.htm
o Abbott-Costello routine: http://www.psu.edu/dept/inart10_110/inart10/whos.html,
According to some estimates, more than 40% of English words have more than one
meaning. The fact that so many words (or lexemes) are polysemous "shows that semantic
changes often add meanings to the language without subtracting any" (M. Lynne
Murphy, Lexical Meaning, 2010).
Abbott and Costello misunderstandings routine
(In Spanish “rutina de equívocos”)
Abbott: Strange as it may seem, they give ball players nowadays very peculiar names.
Costello: Funny names?
Abbott: Nicknames, nicknames. Now, on the St. Louis team we have Who's on first, What's on second, I Don't Know is on
third--
Costello: That's what I want to find out. I want you to tell me the names of the fellows on the St. Louis team.
Abbott: I'm telling you. Who's on first, What's on second, I Don't Know is on third--
Costello: You know the fellows' names?
Abbott: Yes.
Costello: Well, then who's playing first?
Abbott: Yes.
Costello: I mean the fellow's name on first base.
Abbott: Who.
Costello: The fellow playin' first base.
Abbott: Who.
Costello: The guy on first base.
Abbott: Who is on first.
Costello: Well, what are you askin' me for?
Abbott: I'm not asking you--I'm telling you. Who is on first.
Costello: I'm asking you--who's on first?
Abbott: That's the man's name.
Costello: That's who's name?
Abbott: Yes.
~ ~ ~ ~ ~
Costello: When you pay off the first baseman every month, who gets the money?
Abbott: Every dollar of it. And why not, the man's entitled to it.
Costello: Who is?
Abbott: Yes.
Costello: So who gets it?
Abbott: Why shouldn't he? Sometimes his wife comes down and collects it.
Costello: Who's wife?
Abbott: Yes. After all, the man earns it.
Costello: Who does?
Abbott: Absolutely.
Costello: Well, all I'm trying to find out is what's the guy's name on first base?
Abbott: Oh, no, no. What is on second base.
Costello: I'm not asking you who's on second.
Abbott: Who's on first!
~ ~ ~ ~ ~
Costello: St. Louis has a good outfield?
Abbott: Oh, absolutely.
Costello: The left fielder's name?
Abbott: Why.
Costello: I don't know, I just thought I'd ask.
Abbott: Well, I just thought I'd tell you.
Costello: Then tell me who's playing left field?
Abbott: Who's playing first.
Costello: Stay out of the infield! The left fielder's name?
Abbott: Why.
Costello: Because.
Abbott: Oh, he's center field.
Costello: Wait a minute. You got a pitcher on this team?
Abbott: Wouldn't this be a fine team w i t h o u t a pitcher?
Costello: Tell me the pitcher's name.
Abbott: Tomorrow.
~ ~ ~ ~ ~
Costello: Now, when the guy at bat bunts the ball--me being a good catcher--I want to throw the guy out at first base, so I
pick up the ball and throw it to who?
Abbott: Now, that's the first thing you've said right.
Costello: I DON'T EVEN KNOW WHAT I'M TALKING ABOUT!
Abbott: Don't get excited. Take it easy.
Costello: I throw the ball to first base, whoever it is grabs the ball, so the guy runs to second. Who picks up the ball and
throws it to what. What throws it to I don't know. I don't know throws it back to tomorrow--a triple play.
Abbott: Yeah, it could be.
Costello: Another guy gets up and it's a long ball to center.
Abbott: Because.
Costello: Why? I don't know. And I don't care.
Abbott: What was that?
Costello: I said, I DON'T CARE!
Abbott: Oh, that's our shortstop!
=> Back
Darwin Q&A
By Juan Chamero, from Buenos Aires, as of November 11th, 2014
Some Web Experts Crucial Questioning
1. Differentiators re: Darwin vs. Google (or other similar search engines)
2. 3-5 use cases for the Darwin tools/ platform (not just for unveiling the web structure
– more specific to a business or function, or something a government agency could
do with it)
3. Any brief discourse you have on the ability of Darwin to work inside and discover
items in the Dark (Hidden) Web.
My answer starts here
Juan Chamero
Note 01: What hard and deep questioning! Notwithstanding, I will try to answer as if talking to myself. First
of all, at the risk of being redundant, the essence of Darwin methodology has much of Zen (I’m a Zen master),
which in our Western culture means an open mind, maximum awareness of everything hidden or unhidden, taking
into consideration the whole past and all possible hypotheses about the future, but fully aware of and living
in the present. Darwin was conceived as a mental tool to “see” things as they are just NOW, without time
(perhaps a time that could neither be kept nor added). This essence is very important because it works like a
Meta Ontology behind all Darwin unveiling tasks.
Darwin puts emphasis on unveiling “global and/or massive trends”, looking for the best
truths/beliefs about them instead of inspecting individual behaviors within them. What does this
mean in terms of IT and computing? Darwin agents will inspect individuals anonymously, almost
without even registering their data, but paying special attention to signs of “suspected” traits.
Note 02: Here I make a first stop: in the beginning of Darwin applications technical people often ask us
about coincidences with other unveiling tools, for instance “data mining”. Data mining relies on
probabilities, as Darwin does, but without “suspecting”. Darwin, on the contrary, is a “causal” methodology
with a model of suspected behavior within a context of agents and scripts.
Now “You, the Web expert” may question my note above by saying: tell me, Juan, why are you
prejudging individual behaviors as suspect? You and I are both right to some extent: in the
beginning of a Darwin exploration there are no real suspects, at most only some “seeds of
suspected behaviors”. As the long exploration proceeds, thousands of Web pages are explored for
each suspected item, even starting at random without those seeds in order to be scientifically
framed. Then, all of a sudden, persistent irregularities appear: atypical minor behaviors that,
once detected by Darwin agents and/or approved by the human who operates Darwin, become the
base of an autonomous and self-learning database of suspected behaviors.
As Darwin is a semantic methodology, those irregularities are somehow authenticated via semantic
coherence. In our glossary the “creatures” to be unveiled are “concepts”, “in mind ideas” shared
by millions of people pertaining to a given culture - language pair.
As explained in our presentations, papers and e-books, for the English - American pair we estimate
circa 20 million different and unique “in mind ideas”. In order to use Darwin ideally for any
application we should have at hand a map of all of them, at present a typical chicken and egg
problem, so Darwin is forced to build provisional, circumstantial thesauruses on the run.
Questioning 3: Dark (Hidden) Web
First Drill: Let’s suppose the challenge be to unveil a criminal global and massive behavior starting
at “zero ground” in terms of information and knowledge. The subject to unveil (via Google search
with and without quotation marks) is related to a “suspected” original “in mind idea”: “criminal
behavior” as its suspected “modal name” for the pair English American.
criminal behavior, 15,100,000 Ref
“criminal behavior”, 2,630,000 Ref
That’s not a bad beginning! We have at least one million documents somehow dealing with this
suspected in mind creature. The first step following our methodology would be a semantic
exploration (guided by experts) “around” this semantic core, trying to build a first approach to a
“Criminal Behavior” thesaurus. Next we should start the real Darwin Web scouting task.
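The comparison between the open and the closed query of the First Drill can be made explicit with a small Python sketch. It only parses the two result lines quoted in the text; the function name parse_hits and the idea of summarizing the pair as a single share are assumptions for illustration, not a Darwin procedure.

```python
def parse_hits(line: str):
    """Split a line such as 'phrase, 15,100,000 Ref' into (phrase, is_closed, hits)."""
    phrase, count_text = line.split(", ", 1)
    # A closed query is one written between quotation marks.
    is_closed = phrase[0] in '"“'
    digits = "".join(ch for ch in count_text if ch.isdigit())
    return phrase.strip('"“”'), is_closed, int(digits)


open_hit = parse_hits("criminal behavior, 15,100,000 Ref")
closed_hit = parse_hits("“criminal behavior”, 2,630,000 Ref")

# Share of the open results that survive when the exact expression is required.
share = closed_hit[2] / open_hit[2]
```

For these two figures roughly one reference in six survives the closed query, which supports reading “criminal behavior” as a widely shared expression rather than a casual co-occurrence of two words.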
Second Drill: Another example, more controversial:
“terrorism”, 95,100,000 Ref
“global terrorism”, 475,000 Ref, not too much taking into account its real magnitude!
“terrorism map”, 5,400 Ref, not too much, only a few!
"terrorism thesaurus", 49! Ref, perfectly in the darkness!
We have arrived at seeing how we humans hide what we consider morally and ethically
malevolent: via euphemisms! So if we search by
euphemisms for terrorism, 95,900 Ref
terrorism slangs, 90,300,000 Ref
At first sight I, as a human, would tend to guide our agents to explore this euphemistic road!
Questioning 2: Use cases
1. As a Global Intelligent Interface to see the Web more and better via ANY CONVENTIONAL
SEARCH ENGINE or a pool of them via a Web Thesaurus;
2. To make trustable, non intrusive and massive surveys and polls about any existent “in mind”
subject;
3. To maintain a Human Total Memory and Knowledge Database, for instance in the LOC, Library
of Congress;
4. To build IR’s, trustable, non intrusive Intelligent Reports about any subject within the Web and
all World Data Reservoirs (within the publicly open as well as within the “dark” ones);
5. To build IP, Intelligent Portals, structured under Web 3.0 and Web 4.0 interactive models via
“concepts” instead of “words” as at present. Darwin portals may learn fast and easily to talk
precisely with any type of user, adding a new dimension to human communications;
6. To create automatically the best possible metadata for large unstructured to semi-structured
databases, by and large to make them semantic.
Questioning 1: Darwin vs. CSE’s, Conventional Search Engines
In order to unveil information and intelligence from the Web, Darwin needs to access it through an
index. Most CSE’s have the Web indexed by “words”, no matter the language. Some of them, like
Google, have all Web pages indexed by, or as if they were indexed by, long chains of words. Some
of them are considered exhaustive and updated to the second. That is the raw material Darwin
needs to optimize the search. Used this way, Darwin could be considered a CSE Optimizer that
guides users semantically to obtain what they need in terms of information and knowledge,
whenever possible in Only One Click.
Notwithstanding, Darwin may alternatively be used to index Web documents by concepts from
the beginning, at the moment they are uploaded. However, this is another project that implies
building something that replaces CSE’s, at the moment a huge, costly and risky task. This reasoning
raises the following question: is indexing by words necessary? Yes, it is: it is a first, basic
and necessary index that should be followed by indexing by concepts.
The Darwin difference: What Darwin adds to any CSE like Google is a “semantic road map” to
make searching efficient. The “conventional user” of a CSE enhanced by this interface will query
the Web with a word or chain of words that he/she intuits will lead to a reasonable
question. As Darwin hypothetically knows the Web semantically, circa 20 million unique and
different ideas in all possible languages, it will show users all possible regions/domains of
knowledge where those intuited words exist, let’s say in 5 branches of knowledge under
30 main subjects.
Once the user makes the choice that he/she considers most adequate, Darwin sends a precise
“semantic road map” to the query box. This way CSE’s, instead of returning hundreds of thousands
of references, will show only a small and very specific set of references closely related to the
“suspected” user “in mind idea”.
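The interaction just described (intuited word, candidate domains, user choice, refined query) can be sketched in Python as below. This is a minimal sketch under strong assumptions: the miniature thesaurus, its two contexts, the function names and the rule of quoting the keyword together with its two nearest context levels are all hypothetical and do not describe how Darwin actually builds its road maps.

```python
from typing import Dict, List

# Hypothetical miniature thesaurus: keyword -> contexts (root first) where it exists.
THESAURUS: Dict[str, List[List[str]]] = {
    "rigoletto": [
        ["The Art", "Music", "Opera", "Verdi"],
        ["The Art", "Culinary Art", "Restaurants"],  # invented second context
    ],
}


def contexts_for(keyword: str) -> List[List[str]]:
    """Step 1: show the user every domain where the intuited word exists."""
    return THESAURUS.get(keyword.lower(), [])


def road_map_query(keyword: str, context: List[str], levels: int = 2) -> str:
    """Step 2: turn the chosen context into a closed, more specific query."""
    nearest = context[-levels:]  # the context levels closest to the keyword
    terms = [keyword] + nearest[::-1]
    return " ".join(f'"{term}"' for term in terms)


options = contexts_for("Rigoletto")
chosen = options[0]  # the user picks the opera context
query = road_map_query("Rigoletto", chosen)
```

The resulting string is what would be sent to the query box of the conventional search engine in place of the bare keyword.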
Note 003: following the evolutionary series [data => information => knowledge => wisdom] propelled via
intelligence, we are entering The Knowledge Era after more than 8,000 years of a textual culture. We
were conscious that some images carry more information, knowledge and even wisdom than thousands
of words, but we never imagined that we could process them meaningfully and sometimes better than text.
Darwin adds this ability to all its computing steps and, even more, it is trying to detect and unveil “gestures”.
=> Back
“El Leñador de Kentucky” (Abraham Lincoln)
Semantic Exploration for fun applying a Darwin Ontology interface via Google
By Juan Chamero, a casual “in mind image” transmitted along a friendly chat somewhere
having a cup of coffee on 1st of May 2014
First trial:
“El Leñador de Kentucky”, 25, all meaningful;
Perhaps pointing to a similar image: “El Honrado Abraham”, 45, all meaningful again;
Let’s now try to find the core word equivalence in English, Lincoln’s native language:
lumber
timber
lumberjack
logger
treefeller
woodcutter
woodchopper
woodsman (No woodman!)
{the}{a} {} XYZ of Kentucky
Kentucky XYZ
Kentucky’s XYZ
XYZ: woodsman
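The three query templates above can be expanded mechanically for every candidate core word. The Python sketch below does so; the function name query_variants is hypothetical, and wrapping each variant in quotation marks (closed search) is an assumption based on how the trials in this section are written.

```python
ARTICLES = ["the", "a", ""]  # {the}{a}{} in the template
CORES = [
    "lumber", "timber", "lumberjack", "logger",
    "treefeller", "woodcutter", "woodchopper", "woodsman",
]


def query_variants(core: str, place: str = "Kentucky") -> list:
    """Closed-search variants of one core word for the three templates."""
    variants = [f"{article} {core} of {place}".strip() for article in ARTICLES]
    variants.append(f"{place} {core}")
    variants.append(f"{place}'s {core}")
    return [f'"{variant}"' for variant in variants]


woodsman_queries = query_variants("woodsman")
all_queries = [query for core in CORES for query in query_variants(core)]
```

Each core word yields five closed queries, so the eight candidates listed above give forty queries to test against the search engine.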
Second trial:
“The Woodsman of Kentucky”, gives 0!
“A Woodsman of Kentucky”, gives 1!
“Woodsman of Kentucky”, 138,000, many of them meaningful this time! However, most of them
point to “Daniel Boone”, another legend! Browsing and wandering a little through the open
query (without quotation marks) woodsman Lincoln Kentucky, we found the expression
backwoodsman, which probably better fits the “in mind image” many of us have of Abraham
Lincoln.
Third trial:
“Backwoodsman of Kentucky”, 39, and a little confused;
“Kentucky backwoodsman”, 5,530, and now, seen from a distance, we get a reasonably
good sample of the “époque”, Lincoln and Boone.
Fourth trial:
As a byproduct of our semantic exploration we may take a glance at the superb semantic talent
of Lincoln. See the news item titled “A Kentucky Backwoodsman who became our President”,
published by the St. Petersburg Times on Feb 12, 1969. From there you may go on to his famous
Gettysburg Address, a 135 second speech considered one of the greatest speeches in the world.
We reproduce below its most trustworthy version (it is a Bliss Copy (Gettysburg speech),
presented by John G. Nicolay, his Personal Secretary):
Four score and seven years ago our fathers brought forth on this continent, a new nation,
conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are
engaged in a great civil war, testing whether that nation or any nation so conceived and so
dedicated, can long endure. We are met on a great battle-field of that war. We have come to
dedicate a portion of that field, as a final resting place for those who here gave their lives that that
nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this
ground. The brave men, living and dead, who struggled here, have consecrated it, far above our
poor power to add or detract. The world will little note, nor long remember what we say here, but it
can never forget what they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to
be here dedicated to the great task remaining before us -- that from these honored dead we take
increased devotion to that cause for which they gave the last full measure of devotion -- that we
here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall
have a new birth of freedom -- and that government of the people, by the people, for the people,
shall not perish from the earth.
Abraham Lincoln
November 19, 1863
List of suspected concepts
c1: [a new nation], [conceived in Liberty], and dedicated to the proposition that [all men are created
equal];
c2: civil war,
c3: testing whether any nation so conceived and so dedicated, can long endure;
c4: We are met on a great battle-field;
c5: [We have come to dedicate a portion of that field, as a final resting place for those who here
gave their lives that that nation might live]. [It is altogether fitting and proper that we should do this];
c6: [we can not dedicate] --[ we can not consecrate] -- [we can not hallow];
c7: The brave men, living and dead, who struggled here, have consecrated it, far above our poor
power to add or detract; Google returns 42.000 references for this exact expression!
c8: The world will little note, nor long remember what we say here, but it can never forget what they
did here;
c9: It is for us the living, rather, to be dedicated here to the unfinished work which they who fought
here have thus far so nobly advanced;
c10: it is a reinforced compromise;
c11: we here highly resolve that these dead shall not have died in vain;
c12: this nation, under God, shall have a new birth of freedom;
c13: government of the people, by the people, for the people, shall not perish from the earth;
=> Back
Human Knowledge Disciplines
First brief exploration as a test of availability
By Juan Chamero, from Buenos Aires, as of November 11th, 2014
 http://en.wikipedia.org/wiki/List_of_academic_disciplines_and_sub-disciplines#cite_note-1
 http://www.basicknowledge101.com/index.html
 http://www.dmoz.org/
 NATO A-Z Thesaurus, http://www.nato.int/cps/en/natolive/topics.htm
 Techniques for mapping thematically, http://thematicmapping.org/techniques/
 OGC-KML, http://www.opengeospatial.org/standards/kml/
 AoK, ToK,
https://ibpublishing.ibo.org/exist/rest/app/tsm.xql?doc=d_0_tok_gui_1304_1_e&part=2&chapter=4
It states eight AoK, namely:
o mathematics
o natural sciences
o human sciences
o history
o the arts
o ethics
o religious knowledge systems
o indigenous knowledge systems.
 List of 100 Universal Themes,
http://www.mychandlerschools.org/cms/lib6/AZ01001175/Centricity/Domain/963/universal%20themes.pdf
 Another list, http://www.docstoc.com/docs/25450957/List-of-Universal-Themes
 HOT LIST, Wikipedia, ~10,000 encyclopedic topics,
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Missing_encyclopedic_articles/Hot/T2
 Wikipedia content index, http://en.wikipedia.org/wiki/Portal:Contents/Overviews
 Wikipedia actual content, A-Z,
http://en.wikipedia.org/wiki/Portal:Contents/A%E2%80%93Z_index
 Some ideas starting from zero knowledge: a) from 35,000,000,000 URLs we may choose
200,000,000 English-language URLs at random, which would somehow form 200 distinguishable,
hypothetically thematically homogeneous clusters averaging 1,000,000 URLs each;
 Career List 1 (Glasgow University):
 Accounting & Finance
 Aeronautical & Manufacturing Engineering
 Agriculture & Forestry
 American Studies
 Anatomy & Physiology
 Anthropology
 Archaeology
 Architecture
 Art & Design
 Biological Sciences
 Building
 Business and Management Studies
 Celtic Studies
 Chemical Engineering
 Chemistry
 Civil Engineering
 Classics & Ancient History
 Communication & Media Studies
 Computer Science
 Dentistry
 Drama, Dance & Cinematics
 East & South Asian Studies
 Economics
 Education
 Electrical & Electronic Engineering
 English
 Film-making
 Food Science
 French
 General Engineering
 Geography & Environmental Science
 Geology
 German
 History
 History of Art, Architecture & Design
 Hospitality, Leisure, Recreation & Tourism
 Iberian Languages
 Italian
 Journalism
 Land & Property Management
 Law
 Librarianship & Information Management
 Linguistics
 Marketing
 Materials Technology
 Mathematics
 Mechanical Engineering
 Medicine
 Middle Eastern & African Studies
 Music
 Nursing
 Other Subjects Allied to Medicine
 Pharmacology & Pharmacy
 Philosophy
 Physics & Astronomy
 Politics
 Psychology
 Russian & East European Languages
 Social Policy
 Social Work
 Sociology
 Sports Science
 Theology & Religious Studies
 Town & Country Planning and Landscape Design
 Veterinary Medicine
List of MINOR JOBS/CAREERS
http://www.alec.co.uk/free-career-assessment/list-of-careers.htm
Administration Jobs:
 Accounting Officers
 Administrative Assistants
 Administrative Support Worker Supervisors and Managers
 Auditing Officers
 Bookkeepers
 Cashiers
 Computer Operators
 Couriers
 Credit Authorisers and Officers
 Customer Service Representatives
 Data Entry Personnel
 Data Processing Officers and Assistants
 Database Administrators
 Debt Collectors
 Dispatchers
 Filing Assistants
 Financial Officers
 Hotel Receptionists
 Human Resources Assistants
 Information Officers
 Interviewers
 Invoicing Officers
 Librarians
 Library Assistants
 Messengers
 Meter Readers
 Office Clerks
 Office Supervisors and Managers
 Order Clerks
 Payroll Clerks
 Postal Room Staff
 Postal Service Workers
 Procurement Officers
 Production and Distribution Officers
 Production and Planning Officers
 Receptionists
 Record Clerks
 Reservation and Transportation Ticket Agents
 Secretaries
 Transporting and Receiving Officers
 Stock Control and Order Fillers
 Travel Agents
Agricultural Jobs list of careers:
 Agricultural Managers
 Agricultural workers
 Animal Husbandry workers
 Conservation workers
 Farm managers
 Farmers
 Fishermen
 Forestry Workers
 Trawler Operators
Finance Jobs:
 Accountant
 Actuaries
 Auditors
 Budget Analysts
 Cashiers
 Debt Counsellors
 Economists
 Insurance Sales Agents
 Insurance Underwriters
 Loan Officers
 Personal Financial Advisors
 Tax Inspectors, Collectors and Revenue Agents
Construction Jobs:
 Block Tile Pavers
 Boilermakers
 Carpenters
 Carpet, Floor, and Tile Fitters
 Ceiling Tile Installers
 Concrete Finishers
 Construction and Building Inspectors
 Construction Equipment Operators
 Construction Managers
 Drywall Installers
 Electricians
 Glaziers
 Hazardous Materials Removal Workers
 Insulation Workers
 Lift Installers and Repairers
 Painters and Decorators
 Pipelayers and Plumbers
 Plasterers Masons
 Roofers
 Sheet Metal Workers
 Site Labourers
 Stonemasons
 Structural Iron and Metal Workers
Creative Jobs:
 Actors
 Announcers
 Artists
 Camera Operators and Editors
 Choreographers
 Craftspeople
 Dancers
 Designers
 Desktop Publishers
 Graphic Designers
 Interior Designers
 Musicians
 Photographers
 Producers
 Singers
 Website Developers and Designers
 Writers and Editors
Education and Teaching Jobs list of careers:
 Computer Trainers
 Education Administrators
 Home Tutors
 Pre-school Teachers
 Special Education Teachers
 Teachers - Community and Adult Education
 Teachers - Primary and Middle
 Teachers - Secondary and Upper Level
 Teaching Assistants
 Training Specialists and Managers
 University and College Lecturers
Healthcare and Health Related Jobs:
 Anaesthetists
 Chiropractors
 Counsellors
 Dental Hygienists
 Dental Laboratory Technicians
 Dentists
 Dieticians
 Health Services Managers
 Home Healthcare Assistants
 Language Pathologists
 Medical Assistants
 Medical records specialist careers
 Medical Scientists
 Medical Services Managers
 Mental Health Workers
 Midwives
 Nurses
 Nursing Assistants
 Nutritionists
 Occupational Health and Safety Specialists and Technicians
 Occupational Therapist Assistants
 Occupational Therapists
 Ophthalmic Laboratory Technicians
 Opticians, Dispensing
 Optometrists
 Paramedics
 Pharmacists
 Pharmacy Assistants
 Pharmacy Technicians
 Physical Therapist Assistants
 Physical Therapists
 Physician Assistants
 Physicians
 Psychiatric Assistants
 Psychologists and Psychiatrists
 Recreational Therapists
 Registered Nurses
 Respiratory Therapists
 Social Service Assistants
 Social Workers
 Surgeons
Medical Sciences Jobs:
 Audiologists
 Biomedical Engineers
 Cardiovascular Technologists and Technicians
 Diagnostic Medical Sonographers
 Emergency Medical Technicians
 Health Information Technicians
 Nuclear Medicine Technologists
 Radiological Technologists and Technicians
 Surgical Technologist
IT and Telecommunications:
 Computer Maintenance
 Computer Programmers and Operators
 Computer Scientists
 Computer Software Engineers
 Information Systems Managers
 Systems Analysts
 Systems Developers
 Telecommunications Equipment Installers and Repairers
 User Support Personnel
Management Jobs list of careers:
 Administrative Services Managers
 Buyers
 Claims Adjusters, Appraisers, Examiners, and Investigators
 Community Association Managers
 Computer Managers
 Cost Estimators
 Engineering Managers
 Financial Analysts
 Financial Managers
 Food Service Managers
 Funeral Directors
 Health Services Managers
 Human Resources Managers and Specialists
 Industrial Production Managers
 Information Systems Managers
 Labour Relations Specialists and Managers
 Management Analysts
 Marketing Managers
 Medical Services Managers
 Natural Science Managers
 Promotions Managers
 Property Managers
 Public Relations Managers
 Purchasing Agents
 Purchasing Managers
 Retail Managers
 Sales Managers
 Senior Executives
Manufacturing Jobs:
 Aircraft and Avionics Equipment Mechanics and Service Technicians
 Assemblers and Line Workers
 Automobile Service Technicians and Mechanics
 Boiler Operators
 Bookbinders
 Clothing Manufacturers
 Diesel Service Technicians and Mechanics
 Engine Mechanics
 Fabricators
 Food Processing Workers
 Furnishing Careers
 Heating, Air Conditioning, and Refrigeration Mechanics and Installers
 Heavy Vehicle Service Technicians and Mechanics
 Industrial Machinery Installation, Repair, and Maintenance Workers
 Inspectors, Testers, Sorters, Samplers
 Jewellers and Precious Stone and Metal Workers
 Line Installers and Repairers
 Machine Operators
 Machine Setters and Operators
 Machinists
 Mobile Equipment Service Technicians and Mechanics
 Painting and Coating Workers
 Photographic Process Workers and Processing Machine Operators
 Power Plant Operators, Distributors, and Dispatchers
 Precision Instrument and Equipment Production
 Pre-press Technicians and Workers
 Printing Machine Operators
 Radio Equipment Manufacture and Installation
 Semiconductor Processors
 Stationary Engineers
 Textile Careers
 Tool and Die Makers
 Water and Liquid Waste Treatment Plant and System Operators
 Welding, Soldering, and Brazing Workers
 Woodworkers
Professional:
 Archivists
 Clergy
 Coaches
 Correctional Treatment Specialists
 Correspondents
 Court Reporters
 Curators
 Directors
 Instructional Co-ordinators
 Interpreters
 Judges, Magistrates, and Other Judicial Workers
 Lawyers
 Legal Assistants
 Library Technicians
 Market Researchers
 News Analysts
 Operations Research Analysts
 Probation Officers
 Reporters
 Social Scientists
 Statisticians
 Translators
 Veterinary Surgeons
 Veterinary Technicians
Repair and Maintenance Jobs list of careers:
 Automobile Body and Related Repairers
 Electrical and Electronics Installers and Repairers
 Electronic Home Entertainment Installers and Repairers
 General Maintenance and Repair Workers
 Home Appliance Repairers
 Office Machine Repair
Sales, Marketing and Related Jobs:
 Advertising Managers
 Estate Agents
 Marketing Managers
 Product Promoters
 Promotions Managers
 Public Relations Specialists
 Retail Salespersons
 Sales Engineers
 Sales Representatives
 Sales Team Managers
 Travel Agents
Service Related Jobs:
 Barbers
 Beauty Therapists
 Building Cleaning Workers
 Catering Workers
 Chefs, Cooks, and Kitchen Workers
 Childcare Workers
 Correctional Officers
 Dental Assistants
 Firemen
 Fitness Workers
 Flight Attendants
 Grounds Maintenance Workers
 Investigators
 Personal Aides
 Personal Appearance Workers
 Pest Control Workers
 Police and Detectives
 Private Detectives
 Recreation Workers
 Security Guards
Technical:
 Aerospace Engineers
 Agricultural Engineers
 Agricultural Scientists
 Architects
 Astronomers
 Atmospheric Scientists
 Biological Scientists
 Broadcast Engineering Technicians
 Cartographers
 Chemical Engineers
 Chemists and Materials Scientists
 Civil Engineers
 Clinical Laboratory Technologists and Technicians
 Computer Hardware Engineers
 Conservation Scientists
 Drafters
 Electrical and Electronics Engineers
 Engineering Technicians
 Engineers
 Environmental Engineers
 Environmental Scientists
 Food Scientists
 Geological Engineers
 Geo-scientists
 Health and Safety Engineers
 Industrial Engineers
 Landscape Architects
 Materials Engineers
 Mathematicians
 Mechanical Engineers
 Mining Engineers
 Mining Safety Engineers
 Museum Technicians
 Nuclear Engineers
 Petroleum Engineers
 Physicists
 Radio Operators
 Science Technicians
 Sound Engineering Technicians
 Surveyors and Surveying Technicians
 Systems Analysts
 Town Planners
Transport list of careers:
 Air Traffic Controllers
 Aircraft Pilots
 Bus Drivers
 Flight Attendants
 Flight Engineers
 Removals Occupations
 Rail Transport Occupations
 Taxi Drivers and Chauffeurs
 Truck Drivers and Delivery Workers
 Water Transport Occupations
Stanford University Syllabus
Word versus Concept
Example of a Darwin Team Semantic Workshop Discussion
Held at Barcelona, Buenos Aires, Dallas, as of February 2015
By Juan Chamero, from Buenos Aires, as of 27th April 2015
Subject of discussion: What’s in a word?
As per Google: 470,000,000 references from Buenos Aires (IP) as of Feb 2015
Word
Source 1: http://en.wikipedia.org/wiki/Word
In linguistics, a word is the smallest element that may be uttered in isolation
with semantic or pragmatic content (with literal or practical meaning). This contrasts with
a morpheme, which is the smallest unit of meaning but will not necessarily stand on its
own. A word may consist of a single morpheme (for example: oh!, rock, red, quick, run,
expect), or several (rocks, redness, quickly, running, unexpected), whereas a morpheme
may not be able to stand on its own as a word (in the words just mentioned, these are -s, -
ness, -ly, -ing, un-, -ed). A complex word will typically include a root and one or
more affixes (rock-s, red-ness, quick-ly, run-ning, un-expect-ed), or more than one root in
a compound (black-board, rat-race). Words can be put together to build larger elements
of language, such as phrases (a red rock), clauses (I threw a rock), and sentences (He threw
a rock too but he missed).
The term word may refer to a spoken word or to a written word, or sometimes to the
abstract concept behind either. Spoken words are made up of units of sound
called phonemes, and written words of symbols called graphemes, such as the letters of the
English alphabet.
Semantic definition
Leonard Bloomfield introduced the concept of "Minimal Free Forms" in 1926. Words are thought
of as the smallest meaningful unit of speech that can stand by themselves.[1] This correlates
phonemes (units of sound) to lexemes (units of meaning). However, some written words are not
minimal free forms as they make no sense by themselves (for example, the and of).[2]
Some semanticists have put forward a theory of so-called semantic primitives or semantic primes,
indefinable words representing fundamental concepts that are intuitively meaningful. According
to this theory, semantic primes serve as the basis for describing the meaning, without circularity,
of other words and their associated conceptual denotations.[3]
Source 2: http://www.thefreedictionary.com/word
word (wûrd)
n.
1. A sound or a combination of sounds, or its representation in writing or printing, that symbolizes
and communicates a meaning and may consist of a single morpheme or of a combination
of morphemes.
2. Something said; an utterance, remark, or comment: May I say a word about that?
3. Computer Science A set of bits constituting the smallest unit of addressable memory.
4. words Discourse or talk; speech: Actions speak louder than words.
5. words Music The text of a vocal composition; lyrics.
6. An assurance or promise; sworn intention: She has kept her word.
7.
a. A command or direction; an order: gave the word to retreat.
b. A verbal signal; a password or watchword.
8.
a. News: Any word on your promotion? See Synonyms at news.
b. Rumor: Word has it they're divorcing.
9. words Hostile or angry remarks made back and forth.
10. Used euphemistically in combination with the initial letter of a term that is considered offensive
or taboo or that one does not want to utter: "Although economists here will not call it a
recession yet, the dreaded 'R' word is beginning to pop up in the media" (Francine S. Kiefer).
11. Word
a. See Logos.
b. The Scriptures; the Bible.
Concept
Source 1: http://en.wikipedia.org/wiki/Concept
A concept is an abstraction or generalization from experience or the result of a transformation of
existing concepts. The concept reifies all of its actual or potential instances whether these are
things in the real world or other ideas. Concepts are treated in many if not most disciplines
whether explicitly such as in psychology, philosophy, etc. or implicitly such as
in mathematics, physics, etc.
When the mind makes a generalization such as the concept of tree, it extracts similarities from
numerous examples; the simplification enables higher-level thinking.
In metaphysics, and especially ontology, a concept is a fundamental category of existence. In
contemporary philosophy, there are at least three prevailing ways to understand what a concept
is:[1]
 Concepts as mental representations, where concepts are entities that exist in the brain.
 Concepts as abilities, where concepts are abilities peculiar to cognitive agents.
 Concepts as abstract objects, where objects are the constituents of propositions that mediate
between thought, language, and referents.
Note 01: Darwin differentiates the three ways.
Mental representations
In a physical theory of mind, a concept is a mental representation, which the brain uses to denote
a class of things in the world. This is to say that it is literally, a symbol or group of symbols together
made from the physical material of the brain.[7][8] Concepts are mental representations that allow
us to draw appropriate inferences about the type of entities we encounter in our everyday
lives.[8] Concepts do not encompass all mental representations, but are merely a subset of
them.[7] The use of concepts is necessary to cognitive processes such as categorization, memory,
decision, learning, and inference.
Source 2: http://en.wikipedia.org/wiki/Problem_of_universals
Note 02: Please read it carefully for our next discussion because this subject is closely related to a recent
Darwin essay about the quantum nature of the present duality (nothing and everything) and the Web as a
universe of “avatars”.
Source 3: Juan Chamero, Darwin Architect
Possible name structures that may point to concepts, as per Darwin's specific “in mind” ideas:
o [w], only a small percentage of those [w]’s are concepts
o [w w], only an even smaller percentage of those [w w]’s are concepts
o [w w w], only an even smaller percentage of those [w w w]’s are concepts
o [w w w w], only an even smaller percentage of those [w w w w]’s are concepts
o …………….
[w w] examples that are not concepts: [it is], [how cold], [never mind], ……
[w w] examples that are concepts: [parallel processing], [visual arts], [performing arts], …
Concepts [ ] are hierarchically organized, tending to structure as “Logical Trees”.
Most [w w w … w]’s are nonsensical chains, for example:
[how how fine are you], [as_it_is now], [that is to say namely], [up up up up for ever], [me too]
and many may point to agreed coded messages or portions of coded messages as well…
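The word-chain structures [w], [w w], [w w w], … listed above can be illustrated with a minimal sketch. It only enumerates chains of consecutive words and filters them against a list of agreed concept names. The function names and the `known_concepts` set are assumptions made for this illustration, not part of Darwin itself.

```python
import re

def word_chains(text: str, n: int) -> list[tuple[str, ...]]:
    """Return every chain of n consecutive words found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def candidate_concepts(text: str, n: int, known_concepts: set[str]) -> list[tuple[str, ...]]:
    """Keep only the chains whose joined form appears in a (hypothetical) concept list."""
    return [chain for chain in word_chains(text, n) if " ".join(chain) in known_concepts]

# Hypothetical usage: only one of the three 2-word chains is a concept.
sample = "It is parallel processing"
print(word_chains(sample, 2))
print(candidate_concepts(sample, 2, {"parallel processing", "visual arts"}))
```

Most chains produced this way are not concepts, which is the point the list above makes.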
Word Universe
The number of words in the English language is: 1,025,109.8. This is the estimate by the Global
Language Monitor on January 1, 2014. The English Language passed the million-word threshold
on June 10, 2009 at 10:22 a.m. (GMT). The Millionth Word was the controversial 'Web 2.0'.
So you may imagine the amount of false and pseudo concepts [w w w …].
On the contrary, an “in mind” idea is a piece of knowledge, something unique and specific that deserves
to be documented and explained meaningfully via common languages.
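To get a feeling for the size of that universe, the following sketch counts how many ordered chains of a given length could be formed from a vocabulary of the size quoted above. It assumes the Global Language Monitor figure rounded down to 1,025,109 and allows repeated words, so it is a crude upper bound and not a measurement.

```python
def possible_chains(vocabulary_size: int, chain_length: int) -> int:
    """Upper bound on ordered word chains of a given length (repetition allowed)."""
    return vocabulary_size ** chain_length

VOCABULARY = 1_025_109  # assumption: the quoted estimate, rounded down to whole words

for n in range(1, 5):
    # Scientific notation keeps the larger counts readable.
    print(n, f"{possible_chains(VOCABULARY, n):.2e}")
```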
Source 4: http://www.hutong-school.com/how-many-chinese-characters-are-there
The Chinese Characters and their Numbers
China has always been “larger than life” when it comes to numbers and quantities. After all, it
has the largest population out of any other country. It is no wonder then, that its language has
become just as extensive! But exactly how big is it? The Chart of Generally Utilized Characters
of Modern Chinese defines the existence of 7000 characters! If you think this number is high,
you'll be shocked to hear that according to the Great Compendium of Chinese Characters or
“Hanyu Da Zidian” (汉语大字典; Hànyǔ dà zìdiǎn), the number of existing characters is
actually 54,678! But if you’re the kind of person that loves a challenge then there's the
Dictionary of Chinese Variant Form (中华字海; Zhōnghuá zì hǎi). This work, also called the
“Yìtǐzì zìdiǎn” (异体字字典), contains definitions for 106,230 Chinese characters!
But luckily, there’s no need to be scared. Another document called the Chart of Common
Characters of Modern Chinese only includes 3500 characters -- that's half the amount
included in the first chart. To make things easier, you probably won’t even need 1,000 of
them, since they are considered less commonly used characters. If you learn Chinese and take
the official Chinese language test called the Hànyǔ Shuǐpíng Kǎoshì 汉语水平考试 (also known
as the HSK), you will only need to show knowledge of 2,600 characters to pass the exam at the
highest level. And if this is not enough, check out these interesting facts: with 2,500 characters
you can read 97.97 % of everyday written language and with 3,500 characters you can read
up to 99.48 %, which means pretty much everything. It’s even more comforting to know that
with only 900 characters, you can actually read 90% of a newspaper!
Note 03: Similar considerations may be applied to any “Language – Culture” pair. In fact a glossary of the
most common 1,000 terms is good enough.
Towards a Mathematics Semantic Seed Buildup
Dr Eduardo Ortiz and Co-Workers
First draft
Juan Chamero review on June 22nd 2009, from Caece University, Buenos Aires, Argentina
Last review on October 20th 2011
Note: Dr. Eduardo Ortiz is Emeritus Professor of Mathematics and History of Mathematics at Imperial College London.
The Eduardo Ortiz and Co-workers’ “semantic seed” on mathematics (see below) is an opening of
52 subjects. All of them have been semantically checked, both by human and by Darwin agents
with the following result:
 All names are “modals” with respect to Google, namely they are the statistically best suited
“keywords” to point to the “concepts” they represent. It means that Dr Ortiz and Co-workers
selected the right terms in the right sequence, gender, spelling, writing and number
(singular or plural). Agents select the best suited among hundreds of potentially similar
keywords. Ortiz and Co-workers’ names matched 100% of the modals retrieved via agents. By the
way, this discipline proved to be extremely sensitive to small changes, for example “operator
theory”: 493.000 versus “operators theory”: 11.700;
 Top references (100 per query) called by those modal “keywords” proved to be
strongly authoritative. Our idea is to provide the set of Authorities’ URLs retrieved from the
whole Top references raw data as the initial “Authoritative seed” to guide Darwin agents’
scouting. From our experience, authorities stack together on top as a relatively
durable and solid block;
 The seed seems to be complete, good enough to generate the whole Mathematics Logical
Tree without “semantic holes”, that is, without disciplines, sub-disciplines or subjects missing
or poorly covered. It is what we call a reasonably good “semantic umbrella”. The seed touches some
other related disciplines in a proportion that is continually computed by agents.
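The “modal” idea in the first point can be sketched as picking, among similar keyword variants, the one with the most references. The function name is an assumption made for illustration. The counts are the two figures quoted above, and how Darwin agents actually obtain and weigh such counts is not described here.

```python
def modal_keyword(reference_counts: dict[str, int]) -> str:
    """Return the keyword variant with the highest reference count."""
    return max(reference_counts, key=reference_counts.get)

# Reference counts quoted in the text for two near-identical variants.
variants = {"operator theory": 493_000, "operators theory": 11_700}
print(modal_keyword(variants))
```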
However, some additional work has to be performed in order to use this seed, which looks semantically
“flat”. Let us build a mental image of it. If “mathematics” is the “root” of the MLT, Mathematical
Logical Tree, we may imagine up to 8 levels of opening going down from root to “leaves” (some
disciplines broader but more ambiguous than mathematics, like ART for example, have up to 13 levels).
On the contrary, some semantically well known disciplines like “Computing” are
represented by LTs, Logical Trees, of no more than six levels. With an average opening of 5 for each
“node” of an imaginary LT of six levels we would have [1 – 5 – 25 – 125 – 625 – 3125] as the
nodes-by-level sequence, totaling a 3906 nodes-subjects LT. Please do not take this number literally;
it is only playing a little with figures to improve our graphic image of semantic trees. From our experience
and “first impression” we set the limits and boundaries of our Darwin agents’ exploration
program for Mathematics, expecting to unveil from 2.000 to 6.000 subjects distributed from root to
leaves through 5 to 8 levels.
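The node arithmetic above can be checked with a few lines. This sketch assumes a perfectly regular tree with the same opening at every node, which real Logical Trees are not. The function name is illustrative only.

```python
def nodes_per_level(branching: int, levels: int) -> list[int]:
    """Nodes at each level of a regular tree, root included as level 0."""
    return [branching ** depth for depth in range(levels)]

levels = nodes_per_level(5, 6)
total = sum(levels)
print(levels, total)
```

Changing the two arguments gives a quick feel for how fast the subject count grows with depth, for instance when comparing a six-level tree with an eight-level one.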
Taking a look at the Ortiz seed we notice the same suspected anomaly that he and his Co-workers
pointed out (originally written in colloquial Spanish, translated here):
“-------Notice that there are two large groups (100-200) and (1000-5000), three very large ones (Statistics,
6.400); (Numerical Analysis 4.400) and two enormous ones: Computer Science (13000) and Education (12.000),
which do not match the 17000 given for the whole of mathematics. The most abstract and
important parts of mathematics do not necessarily have many citations…..”
As a review and consistency check we performed the same search using the same original seed
terms. Complementing the result described above, we confirm the suspected behavior. Just by
reading the Top references as a human acquainted with engineering, physics and mathematics
terminologies, we detect the following list of terms that probably deserve to be in the upper level of
the seed, above the initially unique level (some of them are already present):
Mathematics: 144.000.000
Counting: 61.000.000
Geometry: 43.100.000
Algebra: 31.400.000
Arithmetic: 18.900.000
Calculus: 17.100.000
Topology: 12.800.000
Mathematical models (within applied mathematics): 10.100.000
Applied mathematics: 9.750.000
Combinatorial: 4.930.000
Combinatorics: 2.090.000
Discrete mathematics: 2.080.000
Mathematical Logic: 1.510.000
Physics mathematics: 1.220.000
Theorem proving: 1.190.000
Cryptanalysis: 1.150.000
Pure mathematics: 954.000
Mathematical Biology: 928.000
Mathematics of Computing: 494.000
Mathematical Language: 367.000
Financial Mathematics (within applied mathematics): 311.000
Mathematical Philosophy: 158.000
Mathematical algorithms: 144.000
Physical mathematics: 18.000
Virtual nodes
When mapping complex and ambiguous contents such as those of ART we realized the need for
“virtual nodes” that behave as virtual mothers of a bunch of subjects that share significant meaning,
enough to be considered as “derived” or “sons”. We say virtual because many times they do not
have semantic existence yet. However, most times these virtual nodes exist but are somehow
ignored by the specialists because of their implicit and trivial maternity. One example of this type is
“geometry”, mother of (and not at the same level as) “Convex and discrete geometry” and
“Differential geometry”, and perhaps we may group all apparently “derived” topologies under a virtual
topology mother node. A similar grouping could perhaps be obtained by joining openings 37, 38, 39,
40, 41 and 42 in a “physical mathematics” type mother node. And the still important opening (48)
“Operations Research” could be opened in “Mathematical programming”, “Game theory”,
“Economics” and “Social and behavioral sciences”.
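The grouping described above can be sketched as a small data transformation. A flat seed is split into subjects that hang from a virtual mother node and subjects that stay at the top level. The function name and the particular mother/child assignment are assumptions for illustration, following the geometry and topology examples in the text.

```python
def group_under_virtual_nodes(seed: list[str], virtual_nodes: dict[str, list[str]]):
    """Split a flat seed into children of virtual mother nodes and remaining top-level subjects."""
    child_to_mother = {
        child: mother for mother, children in virtual_nodes.items() for child in children
    }
    grouped = {mother: [] for mother in virtual_nodes}
    top_level = []
    for subject in seed:
        mother = child_to_mother.get(subject)
        if mother is None:
            top_level.append(subject)  # no virtual mother proposed for this subject
        else:
            grouped[mother].append(subject)
    return grouped, top_level

# Hypothetical excerpt of the flat seed and two proposed virtual mothers.
seed = ["K-theory", "Convex and discrete geometry", "Differential geometry",
        "General topology", "Algebraic topology"]
virtual = {
    "Geometry": ["Convex and discrete geometry", "Differential geometry"],
    "Topology": ["General topology", "Algebraic topology"],
}
grouped, top_level = group_under_virtual_nodes(seed, virtual)
print(grouped)
print(top_level)
```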
Math seed as per Eduardo Ortiz and Co-workers
1. “General Mathematics”: 5.310.000
History and biography => “History of Mathematics”: 713.000; “Biographies of Mathematicians”:
17.200;
2. “K-theory”: 260.000 => “K-theory” + mathematics: 177.000;
3. “Group theory and generalizations”: 21.700 => “Group Theory” 2.160.000; are there perhaps
significant derivations hidden within the GT concept?;
4. “Topological groups”, “Lie groups” => “Topological Groups”: 201.000; “Lie Groups” within
Topological Groups: 65.200, high enough!;=> perhaps it justifies a semantic node derivation;
5. “Real functions” => “Real functions”: 263.000; Please check it, it looks like a suffix;
6. “Measure and integration”: 128.000 nice example of a “two basic words” keyword, namely
defined with two “heavy” common words => measure: 183.000.000 AND integration: 148.000.000;
7. “Functions of a complex variable”: 195.000 perhaps a better (4 words) keyword example!;
8. “Potential theory”: 447.000 => shared with Physics; “Potential theory” + mathematics: 267.000;
9. “Several complex variables and analytic spaces”: 18.300 an interesting example that shows the
existence of meaningful long chains of words keywords (6): 18.300 references, most of them
meaningful!
10. “Special functions”: 1.330.000! => shared with Physics and other disciplines; “Special functions”
+ mathematics: 432.000;
11. “Ordinary differential equations”: 1.430.000!
12. “Partial differential equations”: 2.620.000!
13. “Dynamical systems and ergodic theory”: 24.000 => even though shared with Physics, it
fundamentally belongs to the mathematics realm;
14. “Difference and functional equations”: 9.130 => rather small, please check! Is it perhaps a
derivation?;
15. “Sequences, series, summability”: 3.360 => the same observation as above (14), please check
it!;
16. “Approximations and expansions”: 15.600 => it is funny but Google and some other Search
Engines present anomalies: please check, “Approximations and expansions” + mathematics renders
16.900, a little higher (as of June 22nd 2009);
17. “Fourier analysis”: 983.000 => shared with Physics “Fourier analysis” + mathematics: 438.000;
18. “Abstract harmonic analysis”: 46.600
19. “Integral transforms”: 273.000
20. “Operational calculus”: 163.000 => even though essentially a mathematics subject shared with
physics and other disciplines like “applications”: “Operational calculus” + mathematics: 131.000
meanwhile “Operational calculus” + physics: 53.000;
21. “Integral equations”: 1.200.000
22. “Functional analysis”: 4.320.000! => When a parental node with a significant number of
references appears in a “semantic seed” (as for example this opening and openings 10, 11, 12 and 21),
it would be convenient for a human to go a little deeper, suggesting the necessary derivations
in order to guide Darwin agents better;
23. “Operator theory”: 673.000 => by the way, isn’t this a derivation from the above (22)? Please check
it;
24. “Calculus of variations and optimal control optimization”: 6.900 => please check the feasibility
of a potentially better keyword omitting “optimization”, which renders 48.100;
25. “Geometry”: 42.900.000 => isn’t it too big? Please check our upper level virtual nodes thesis to
see if it fits up there;
26. “Convex and discrete geometry”: 19.300
27. “Differential geometry”: 1.290.000
28. “General topology”: 234.000 => with slight contacts with many other disciplines; however the
semantic embedding mathematics => general topology will clear any possible ambiguity;
29. “Algebraic topology”: 817.000
30. “Manifolds and cell complexes”: 105.000
31. “Global analysis”: 1.720.000 => this keyword needs to be embedded within mathematics
rendering 248.000 in order to eliminate ambiguity;
32. “Analysis on manifolds”: 274.000 => please check if this keyword is a common suffix of others;
33. “Probability theory and stochastic processes”: 97.300 => please check if either “probability”, or
“probability theory” could become an upper virtual node as in (25); => “probability”: 66.700.000,
“probability theory”: 1.740.000, “stochastic processes”: 2.140.000;
34. “Statistics”: 637.000.000 => if properly defined under mathematics it should be semantically
embedded and in the upper level (see our comment about “seed upper level” and “virtual nodes”);
35. “Numerical analysis”: 3.830.000
36. “Computer science”: 77.000.000 => it seems too big to derive directly from math. Please
check the following terms: Computer science mathematics: 1.570.000, but most authoritative references
deal with deceptive pseudo chains such as “computer science”, “mathematics”,….physics,….;
“Mathematics of Computer science” renders 18.700, which seems to focus properly on
mathematics;
37. “Mechanics of particles and systems”: 16.800 => somehow shared with Physics?: “Mechanics of
particles and systems” + mathematics: 11.800;
38. “Mechanics of solids”: 226.000 => somehow shared with Engineering/Physics?; “Mechanics of
solids” + mathematics: 110.000!;
39. “Mechanics of deformable solids”: 30.000 => somehow shared with Engineering/Physics?:
“Mechanics of deformable solids” + mathematics: 12.400;
40. “Fluid mechanics”: 4.930.000 => somehow shared with Engineering/Physics?: “Fluid
mechanics” + mathematics: 1.640.000;
41. “Optics, electromagnetic theory”: 23.200 => somehow shared with Physics?: “Optics,
electromagnetic theory” + mathematics: 9.140;
42. “Classical thermodynamics, heat transfer”: 1.160 (?) => probably too poor; please check with
equivalent keywords; shared with Engineering/Physics; for example “thermodynamics, heat
transfer” (comma could be omitted): 121.000; “thermodynamics, heat transfer” + mathematics:
28.700;
43. “Quantum theory”: 4.470.000 => shared with Physics; “Quantum theory” + mathematics:
1.150.000;
44. “Statistical mechanics, structure of matter”: 3.830 => shared with Physics; “Statistical
mechanics, structure of matter” + mathematics: 3.820; even though backed up by too low figures it
looks like a well focused keyword. Please observe that references with/without mathematics render
pretty much the same as essentially belonging to the mathematics realm!;
45. “Relativity and gravitational theory”: 3.340 => shared with Physics; “Relativity and gravitational
theory” + mathematics: 862;
46. “Astronomy and astrophysics”: 2.510.000 => shared with Astronomy and Astrophysics (within
Physics); “Astronomy and astrophysics” + mathematics: 1.960.000; it looks very akin to the
mathematics realm;
47. “Geophysics”: 9.680.000 => A typical subject belonging to Physics realm; However its “math
side” looks strong: “Geophysics” + mathematics renders 3.150.000!; perhaps it is necessary to coin
a new keyword as “geophysics mathematics” that even now renders 15.600;
48. Operations research: 5.480.000 => although far from its “golden age”, this term still looks strong and
seems to encompass (this should be checked) some derived terms (as per the “Ortiz seed”): 48.1: Mathematical
programming: 856.000; 48.2: Game theory: 3.880.000; 48.3: “Economy” + mathematics: 125.000;
48.4: “social and behavioral sciences” + mathematics: 202.000. As a suggestion we add “social and
behavioral sciences” as a new node below;
49. Social and behavioral sciences: 541.000 => shared with Sociology, Cybernetics and Systems
theory; “Social and behavioral sciences” + mathematics: 202.000;
50. Biology and other natural sciences: 18.300 => it looks like a new keyword perhaps created to
differentiate a new branch of science like “molecular biology” from biology, physics and algorithmic
in the recent past; perhaps this is a transition subject; it looks like a term coined by mathematicians
because: “Biology and other natural sciences” + mathematics renders 12.500!;
51. “System theory”, control: 1.140.000 => for “systems theory” + control as per “Ortiz seed”;
“system theory” alone: 1.960.000 and “system theory” + control + mathematics: 344.000;
52. “Information and communication”, circuits: 552.000 => this figure stands for “Information and
Communication” + circuits; shared with Systems/Communications; “Information and
Communication” + circuits + mathematics: 97.300;
53. “Mathematics education”: 1.770.000
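The comparison used throughout the list above, a keyword's count alone versus the same keyword qualified with "mathematics", can be expressed as a simple ratio. The following is a minimal sketch, using only figures quoted in the list (written here without the dot thousands separators); the name `math_share` is ours, introduced for illustration, and is not part of Darwin.

```python
# Reference counts quoted in the list above: (keyword alone, keyword + mathematics).
counts = {
    "Mechanics of solids": (226_000, 110_000),
    "Fluid mechanics": (4_930_000, 1_640_000),
    "Statistical mechanics, structure of matter": (3_830, 3_820),
    "Relativity and gravitational theory": (3_340, 862),
    "Geophysics": (9_680_000, 3_150_000),
}


def math_share(alone: int, with_math: int) -> float:
    """Fraction of a keyword's references that also mention mathematics."""
    return with_math / alone


# Rank the keywords from the most to the least "mathematical".
ranked = sorted(counts.items(), key=lambda item: math_share(*item[1]), reverse=True)
for keyword, (alone, with_math) in ranked:
    print(f"{keyword}: {math_share(alone, with_math):.1%}")
```

A share close to 100%, as for item 44, suggests a keyword that lives essentially inside the mathematics realm, while a low share suggests a subject mostly shared with other disciplines.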
Original Ortiz seed
1. Mathematics, 16.900 K
2. General, 15.400
3. History and biography, 221
4. K-theory, 291
5. Group theory and generalizations, 213
6. Topological groups, Lie groups, 201
7. Real functions, 1.500
8. Measure and integration, 270
9. Functions of a complex variable, 2.110
10. Potential theory 421
11. Several complex variables and analytic spaces, 208
12. Special functions, 1.810
13. Ordinary differential equations, 1070
14. Partial differential equations, 149
15. Dynamical systems and ergodic theory, 0
16. Difference and functional equations, 2.240
17. Sequences, series, summability, 174
18. Approximations and expansions, 221
19. Fourier analysis, 1.940
20. Abstract harmonic analysis, 188
21. Integral transforms, operational calculus, 195
22. Integral equations, 2.200
23. Functional analysis. 1.520
24. Operator theory, 541
25. Calculus of variations and optimal control; optimization, 240
26. Geometry, 1.550
27. Convex and discrete geometry, 657
28. Differential geometry, 2.250
29. General topology, 700
30. Algebraic topology, 2.320
31. Manifolds and cell complexes, 85
32. Global analysis, analysis on manifolds, 94
33. Probability theory and stochastic processes, ¿?
34. Statistics, 6.370
35. Numerical analysis, 4.340
36. Computer science, 13.000
37. Mechanics of particles and systems, 491
38. Mechanics of solids, 2.110
39. Mechanics of deformable solids, 29
40. Fluid mechanics, 1.920
41. Optics, electromagnetic theory, 128
42. Classical thermodynamics, heat transfer, 268
43. Quantum theory, 2.210
44. Statistical mechanics, structure of matter, 1.1570
45. Relativity and gravitational theory, 1.710
46. Astronomy and astrophysics, 216
47. Geophysics, 1.760
48. Operations research, mathematical programming, 1.700
49. Game theory, economics, social and behavioral sciences, 153
50. Biology and other natural sciences, 1.950
51. Systems theory; control, 4.700
52. Information and communication, circuits, 1.1860
53. Mathematics education 11.800
100-200: 17
200-500: 4
500-1000: 5
1000-5000: 13
Around 5000: Statistics and Numerical analysis
More than 10.000: Computers science and Mathematics education
Total 17.000
“… Notice that there are two large groups (100-200) and (1000-5000),
three very large ones (Statistics, 6.400); (Numerical Analysis 4.400) and two enormous ones: Computer
Science (13000) and Education (12.000), which do not match the 17000 reported for all
of mathematics.
The most abstract and important parts of mathematics do not necessarily have many
citations …”
•This is a brief presentation of DARWIN, Distributed Agents to
Retrieve the Web Intelligence, a cutting-edge technology to
see the Web as "semantic", perfectly ordered. More than
30,000 million Web pages live in the Web of today:
"creatures" of knowledge, a crucial asset of humanity
growing exponentially in size and complexity.
The Web of Today
Unstructured
Only indexed by words
•Hence the need to "map" it precisely and efficiently, like a
Card Catalogue in a library. Based on proprietary agents and
Artificial Intelligence algorithms, Darwin creates "semantic
glasses" to see the Web as a Universal Encyclopedia, perfectly
stored and indexed. As such, any piece of knowledge -
information and intelligence - becomes retrievable, without
obfuscation, generally in only one search, displaying
meaningful results: no spam, fakes, unreadable jargon or
irrelevant pages.
Semantic Glasses
To see more and better
In only one click
•Once the Web is mapped website owners and users,
governors and governed, teachers and learners, sellers and
buyers may infer their respective behaviors and build
Intelligent Reports about any subject. Darwin makes effective
the Web paradigm: One thing leads to another thing;
everything is connected.
Human Knowledge Maps
People’s behavior
Intelligence Reports
Prologue
Darwin a cutting edge technology
 Slides 3 to 8: A tool “to see more and better the Web”;
 Slides 9 to 15: A little of History;
 Slides 16 to 19: How we humans document - WWD’s;
 Slides 20 to 25: Entering into some detail - Level 1;
 Slides 26 to 31: Knowledge as Trees from ancient times;
 Slides 32 to 40: Entering in some detail - Level 2;
 Slides 41 to 70: Darwin Vignettes:
 Slide 41: Darwin Vignettes index;
 Slides 42 to 47: Entering in some detail - Level 3;
 Slides 48 to 57: Darwin Antecedents;
 Slides 58 to 69: Darwin Conjectures 0 to 9;
 Slide 70: Epilogue, Darwin a cutting edge technology;
 Slides 71 to 75: Epilogue details I to V;
By Juan Chamero, Darwin Architect, August 12, 2013
Warning: If you are an Internet Expert you may go directly to
the Epilogue to see the scope and reach of this technology.
Darwin
Human Knowledge Map
Preliminary research performed by Intag, Darwin
Methodology creator, determines that it is
perfectly possible to cover Internet users’ needs
in terms of Knowledge in only one click. We are
talking of significant documents, Authorities and
hubs, subordinated to subject specificity, with
minimum redundancy, and fully covering all
Branches of Knowledge.
Web Semantic, Pages 293-294, a Juan Chamero e-book Series:
Mind To Digital
Darwin
Human Knowledge Map
This HKM will have a Basic Virtual Library of nearly 12,000,000
i-URL’s pointing to an equal number of their corresponding
Website documents (Authorities), referenced with a Web
Thesaurus of nearly 15,000,000 concepts per language. This
map is like a virtual Encyclopedia of nearly 40 million pages,
cross referenced by a Conceptual Universe of +15 million
concepts, more than 20 million of hyperlinks and semantically
structured via a Logical Tree of nearly 500,000 subjects -
nodes.
Web Semantic, Pages 293-294, a Juan Chamero e-book Series: Mind
To Digital
What’s in a Web Map
We may imagine the Web as a
huge Semantic Ocean where
two types of knowledge
creatures are continuously
interacting: the black ones (K
Realm) classified in up to 200
knowledge “species” versus the
green ones (K’ Realm), “we”,
the people as users.
The only creatures that “live” in
the Ocean are the Web Pages
meanwhile we, humans, are
“out”, however interacting, as
long as we are Internet
connected and active.
Darwin for Dummies
We, humans, have a rather big capacity to hold
“in mind” ideas in our brain, however hidden.
This diversity is estimated at 10 to 15 million ideas per
language/culture.
Our ideas are recognized by others as long as they
are “meaningfully explained”. However it is highly
probable that our explanations of the same
idea differ substantially. Over time, statistically,
we agree to assign unique “modal names” to our
ideas.
On the Web we are supposed to be talking about the
same idea when we point to the same “name”. In
our culture, at least since Plato, those agreed
names are recognized as “concepts”.
In the figure, the exultant Archimedes says
“Eureka” on discovering his law and naming it.
Something “To See more and better”
Galileo Galilei (1564 – 1642), pioneer
of the scientific revolution and for
many the father of modern science.
In his time Galileo was formally
accused of heresy because of his
“sayings” about heliocentrism.
He built an improved telescope (1609), a tool to
“see more and better” everything that
surrounds us, including the polemic and
inscrutable sky (heavens) of those
times. In fact his early telescope magnified
our sight power 8 to 9 times.
“What then to see more and better”
• Web Creatures
• Web Pages
• Websites
• At last documents
A little of History
You may skip this section. However we think that “a little
of history” is always healthy as long as it contributes to
opening our mind – and our spirit – to accept some changes
in our vision. Darwin deals with an Ontology to help us
“to see more and better” the Web and for that we have
to understand how we humans document our ideas.
We do that at any moment of our life, almost
automatically like “downloading” our “in mind” images
thru the art of writing. Darwin states that the core of our
downloads are “concepts”, in fact cognitive objects akin
to Chinese and Japanese ideograms. Darwin also talks
about akin entities like “symbols”, “words”,
“expressions”, “keywords”, “quotations”, “acronyms”,
“memes”, “cepts”, and “concepts”, all of them, however,
with substantive differences.
This section intends to build a sort of emergency bridge
to fully understand our Darwin Ontology.
A little of History - I
As far back as 15,000 years ago cavemen left
lasting messages in the form of
petroglyphs and petrographs depicting
what was important to them. They
were things to communicate and
perhaps early “writing experiences”.
In Australia, petroglyphs have been
discovered that are dated as old as 40,000
years.
A little of History - II
Tărtăria Tablets, discovered by
archaeologist Nicolae Vlassa, dated to 5.900
BC and for many the earliest
form of “writing” in the world.
They preceded the Proto-Sumerian
pictography.
Up to now nobody has deciphered the
meaning of the carved symbols.
A little of History - III
A long journey from hieroglyphics
to handwriting, starting from a picture or
object representing a word.
Around 650 BC Demotic writing appeared,
which became a popular form,
developed from the Hieratic and
retained the features of the
hieroglyphic system, including word
and phonetic signs.
Primitive writing was ideographic as
opposed to phonetic. Some cultures
evolved rapidly towards phonetic
writing in which conventional signs
or letters represent a sound.
However ideographic writing still
exists in the Chinese and Japanese
ideograms of today.
A little of History - IV
A look at ancient History,
Language and Architecture by Dr.
Haluk Berkmen, a PhD in Physics, as a
contribution to unveiling relations
between science and esotericism,
ancient wisdom and the forgotten
past.
Chinese and Sumerian scripts
developed independently as the first
writing systems. The first synthetic
writing system was the early cave
paintings. Over time analytic writing
systems appeared, where a
word or part of a word (a syllable) is
represented by a well defined sign.
A little of History - V
Early in our childhood we learn to write cursive,
namely handwriting, the “hard way”. Cursive
derives from the Latin cursivus, which means running.
Most current languages have their cursive mode,
even ideographic ones like Chinese and Japanese,
where cursiveness may apply to the
connectedness of strokes within the same
character or ideogram.
A little of History – VI
Grammar and those heavy things
Writing rules, like handwriting, are learned the hard
way. This book by Charles Gulotta tries to alleviate
that burden with humorous and friendly cartoons,
for junior high and high school students.
Our learning should lead us to write properly and
meaningfully, ideally as if applying a “formula” to
express our “in mind” ideas.
Have you ever played the WFF ’N PROOF game to
enhance children’s abilities in logic and problem
solving? Take into account that WFF, which
stands for Well Formed Formula, could be a way to
learn how to write WWD’s, Well Written
Documents. You may also try to learn a little
grammar.
Towards the idea of WWD’s – I
Good Writing – Its logic and subtleness
For the exact term “writing rules”, as of
today Google returns 337,000
references. Most of them are opinions, with
almost nothing about theories or
scientific approaches.
10 Writing Rules for Writing, as per the NYT,
New York Times: we recommend rules
1, 5 and 6: Listen to the Voice Inside
your Head, Study Sentences, and Write
With Non-Zombie Nouns and Verbs. In
Darwin Ontology this triad is equivalent
to: start from “in mind” ideas, write using specific
and meaningful concepts resembling
famous “sayings”, and avoid pomposity
and abstraction.
Towards the idea of WWD’s – II
What’s behind large thematic samples
Semantic Cores and Semantic Fingerprints
Let’s suppose that there exists a large enough
sample of “essays” dealing with the same idea
(diabetes), written by different people in different
ways over a span of time short enough for the
idea to remain consistent. Let’s also suppose that for each
“writer” the idea belongs to a semantic structure,
namely a “tree map” that as such has “upper level”
ideas, “lower level” or “derived” ideas and
“collateral ideas”.
To make the things more real and consequently
more complex we have to take into account that
writers have different writing talents, knowledge
and cultural levels.
As Darwin Ontology conjectures that writers tend
to write “specifically” and “properly” from a
semantic point of view our first step of analysis
should be to unveil these properties.
Towards the idea of WWD’s – III
What’s behind large thematic samples
Semantic Cores and Semantic Fingerprints
Remember that any document can be split into two
kinds of semantic particles: “Common Words and
Expressions” and “Concepts”. In a first analysis we
may extract a “Concepts Core”, let’s say from 30 to
60 concepts that statistically can be considered
representative of the “in mind” idea our large
enough sample intends to define. These concepts
constitute the metadata skeleton of what Darwin
defines as the “semantic fingerprint” of the idea.
On average, “good writers” write probabilistically
following a protocol (something like a WWD, Well
Written Document, formula) guided by a master
principle: specificity. To be specific is to write using
the concepts core, admitting (after ex post
analysis and checking) only a small percentage of
semantic noise and/or contamination, by and large
minimizing UP, DOWN and COLLATERAL concepts.
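The extraction of a concepts core from a thematic sample can be sketched in a few lines. This is only an illustration under loud assumptions: the controlled vocabulary is given in advance, a concept is "detected" by plain substring matching, and a concept enters the core when it appears in at least half of the essays. The functions `concept_core` and `specificity`, the thresholds and the toy essays are hypothetical and are not Darwin's actual agents.

```python
from collections import Counter


def concept_core(essays, vocabulary, min_share=0.5, max_size=60):
    """Return the concepts found in at least `min_share` of the essays."""
    doc_freq = Counter()
    for text in essays:
        lowered = text.lower()
        # Count each concept at most once per essay (document frequency).
        doc_freq.update({c for c in vocabulary if c.lower() in lowered})
    threshold = min_share * len(essays)
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, n in ranked if n >= threshold][:max_size]


def specificity(text, core, vocabulary):
    """Share of the concepts detected in `text` that belong to the core."""
    lowered = text.lower()
    found = [c for c in vocabulary if c.lower() in lowered]
    if not found:
        return 0.0
    return sum(c in core for c in found) / len(found)


essays = [
    "Type 2 diabetes raises blood glucose; insulin resistance is common.",
    "Insulin therapy lowers blood glucose in diabetes.",
    "Diet and exercise help control blood glucose.",
]
vocabulary = ["blood glucose", "insulin", "insulin resistance", "diabetes", "exercise"]

core = concept_core(essays, vocabulary)
print(core)
print(specificity(essays[0], core, vocabulary))
```

In this toy sample the concepts outside the core play the role of the semantic noise mentioned above: the lower the specificity of an essay, the more UP, DOWN or COLLATERAL concepts it carries.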
Towards the idea of WWD’s – IV
Is this message a WWD?
Yes, it is. Some expressions are
not only concepts but “sayings”
and “memes”, like “people are
hungry for change” that
deserves 1.300.000 references in
Google.
Let’s take a little pause
Before going a little deeper
This has been an introduction to a way of seeing the Web
more and better. Probably many of you not only agree
with what we have stated up to here but perhaps also feel a
little confused because of a strong and rooted belief: what
you “see” now is the best available.
We, on the contrary, state that we all may see more and
better, perhaps a new Web dimension. We dare to
suggest that in every Web document there exist a
purpose, knowledge and intelligence, sometimes hidden,
no matter whether they are present explicitly or implicitly,
focused or dispersed along the document layout. The new
dimension will allow us to retrieve from the Web not only
information and Knowledge but Intelligence Reports for
any subject.
Now we are going to try to present this new vision,
which will need your full attention. Thanks!
Our First Thinking
• We humans document, “at large”, statistically,
following a rather logical mathematical
“formula” we may name as WWD, Well
Written Document.
• As we will see soon with only two types of
semantic particles: “Common Words and
Expressions” and “concepts”. The first ones
make the literary filling meanwhile the second
ones the meaning.
Semantic Yin Yang
Common Words versus Concepts
As we have seen, Darwin Ontology
imagines the Web as a dual world:
creatures that live within the Web,
K side, interacting with us,
humans, within K’ side outside the
Web core.
This duality goes deep within the
creatures (Web pages)
architecture; its logical structure
resembling a Yin – Yang Paradigm
where the Yin part is represented
by its “literary filling” and Yang
part as its conceptual counterpart.
The figure depicts the Yin part of a
New Zealand Technical page about
diabetes as unveiled by a Darwin
agent.
Semantic Yin Yang
Common Words versus Concepts
The figure depicts the Yang counterpart
of the Web page as unveiled by a
Darwin agent: all concepts are “seen”
as embedded within an implicit
hierarchical track, for example
[medicine…..[metabolic
pathologies[diabetes]]….].
In Darwin Methodology these Yang
counterparts, which belong to the Web
pages’ metadata, also constitute the
core of their “Semantic Fingerprints”,
in fact the essence of their meaning.
Detecting Concepts
In the Einstein
bibliography, from
Wikipedia, we, as humans,
marked in green what are
concepts as per our
criterion.
For each type of
document, language and
theme humans criteria are
transferred to Darwin
agents.
The logic and precision of
this transfer should be
checked periodically.
How we Document
WWD’s
WWD’s deal with specific
subjects like for instance “El e-
Commerce” in Spanish within
an agreed hierarchy: the root
idea => subject => topic =>
sub topic => …=> concepts,
followed by the text corpus.
The whole as a unit
corresponds to a given “in
mind” idea. Metadata within
the text corpus has, apart from
the necessary concepts,
common expressions like “sale
for moving” and eventually
important slogans known
today as “memes” like
“comprar y vender de todo”
(buy and sell everything).
Trees
You may also skip this section if you are
well acquainted with “graphs” and “logic”.
However do not discard devoting a little of
your time to see this series.
We humans continue to be in part esoteric
and in part rational. We all are attracted
by the beauty and harmony of trees and
tree forms and we all love to catalog things
along logical trees. However we do not
apply the tree wisdom coined over centuries
either to order the Web or even
to order our Knowledge Curricula.
Trees
The Tree of Life
The Tree of Life (Biblical): As per the ancient Kabbalah,
each circle represents one of the ten Sefirot, or
emanations, by which the Divine manifests.
From ancient times we humans tended to represent life,
nature and the entity and primal substance we have
agreed to address as knowledge, as “trees”: those
creatures that from their roots tightly “enrooted” within
the earth point to the sky diverging in branches, nodes,
leaves and from time to time flowers - fruits.
Life seems to be an instance of the Divine superior to
knowledge, which looks logical: at large, eternity is superior
to wisdom. However this axiology looks anthropic,
apt for beings living on the surface of a planet.
Trees
The Tree of Life
Phylogenetic Tree of Life: also known as
Evolutionary Tree. We humans are
eukaryotes. Eukaryotes are
defined by having membrane-bound
cellular structures. We fall within the
Animal kingdom within the Eukaryote
Domain. See The Evolution of
Eukaryotes and our Human Story.
Most trees are depicted as going
upwards from “roots” to “leaves”. On
the contrary Computational Logical
Trees are represented as going
downwards with their roots up.
Trees
A Taxonomic Tree
Taxonomy, the science of
classification. We, humans are
obsessed with categorizing things.
Perhaps we need it in order to make
sense of our world.
The figure depicts a typical Logical
Tree or Semantic Skeleton we use as
Computational Trees, however
inverted. We may appreciate root,
branches, nodes, and the
uniqueness of paths that go from
any node to root and vice versa.
However in imperfect and “in
evolution” trees this uniqueness
fails and closed circuits appear.
Trees
The Beauty of Trees
Trees of Life: The Kabbalah Tree of
Life, The Yggdrasil of the
Scandinavian Mythology, The
Biblical Tree of Life, our Christmas
Tree, all types of Taxonomy Trees,
and the typical Oak Tree are
examples of beauty and
equilibrium and always a form of
symmetry.
Trees
A Minimal Abstract Fractal Tree
The figure depicts a fractal built with a very
simple recursive algorithm: “one opens in
two”. Fractals teach us how nature may build
“apparently” complex systems based on
simple laws.
The reverse is also true: behind apparently
complex systems simple form generation laws
are probably hidden. This is key for our
Darwin Methodology: behind the apparent
chaos of the Web lies the rather simple,
though hidden, skeleton of its semantic
structure.
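The “one opens in two” rule can be written as a short recursive function. The sketch below is an assumption-laden illustration, not the generator of the figure: the branching angle (`spread`) and the length reduction (`shrink`) are arbitrary values chosen here, and the function only computes the line segments, leaving the drawing to any plotting tool.

```python
import math


def fractal_tree(x, y, angle, length, depth, spread=math.pi / 6, shrink=0.7):
    """Return the segments of a binary fractal tree as ((x0, y0), (x1, y1)) pairs."""
    if depth == 0:
        return []
    x1 = x + length * math.cos(angle)
    y1 = y + length * math.sin(angle)
    segments = [((x, y), (x1, y1))]
    # "One opens in two": every branch tip spawns a left and a right branch.
    for turn in (-spread, spread):
        segments += fractal_tree(x1, y1, angle + turn, length * shrink,
                                 depth - 1, spread, shrink)
    return segments


# A trunk pointing upwards, opened five levels deep.
segments = fractal_tree(0.0, 0.0, math.pi / 2, 1.0, depth=5)
print(len(segments))  # 2**5 - 1 = 31 segments
```

A single rule and three numbers are enough to produce an apparently complex shape, which is the point made above about simple generation laws.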
Back to the beginning
Some examples of Knowledge Mapping
Now it is time to go back to
our Darwin Ontology
schema in order to “see” it
better.
We, humans, the people, are
interacting with the Web from
outside (green region), with
the Human Knowledge
hosted, apparently in chaos,
in the Web Ocean (black
region).
However the intelligent
semantic skeleton of the
Human Knowledge is “up
there” waiting to be
unveiled!
Example I
The Diabetes sub Tree
The Diabetes sub Tree skeleton is
depicted as hanging from Metabolic
Pathologies, a medical specialty derived
from Medicine. It’s an inverted tree
coded downwards and from left to right.
The Diabetes sub Tree is coded as 122;
in turn it opens into 1221 and 1222,
and finally 1222 opens into 12221, 12222,
and 12223.
You may also notice the Diabetes
neighborhood, UP, DOWN and
COLLATERAL.
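With this kind of coding, the neighborhood of a node can be read directly from the codes. The sketch below assumes that each digit of a code is the position of a child under its parent (so it only works for nodes with at most nine children), that 12 stands for Metabolic Pathologies and 1 for Medicine, and that 121 and 123 are collateral subjects; these last codes are hypothetical, added only to show a COLLATERAL result.

```python
def neighborhood(code, all_codes):
    """Split the direct neighbors of `code` into UP, DOWN and COLLATERAL nodes."""
    parent = code[:-1]  # dropping the last digit gives the parent's code
    return {
        "UP": [c for c in all_codes if c == parent],
        "DOWN": [c for c in all_codes if c[:-1] == code],
        "COLLATERAL": [c for c in all_codes if c != code and c[:-1] == parent],
    }


# Codes quoted in the text plus assumed ancestors (1, 12) and siblings (121, 123).
codes = ["1", "12", "121", "122", "123", "1221", "1222", "12221", "12222", "12223"]

print(neighborhood("122", codes))
print(neighborhood("1222", codes)["DOWN"])
```

The same prefix test extends naturally to wider neighborhoods, for example every descendant of 122 is a code that starts with "122".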
Example II
A World Art Tree Skeleton
In 2007 a World Art Map was
developed as a Human Knowledge Map
demo to be the core of the Semantic Search
Engine of the European Theseus Project, now
abandoned.
The whole Art information “as it is” in the
Web of that time (nearly 80 million Web
pages) was reviewed and semantically
synthesized by a family of Darwin Agents.
This synthesis took the form of an Art
Thesaurus with a Semantic Skeleton of
7.570 themes and nearly 300,000 concepts
along 13 levels.
The figure depicts the Rigoletto node as a concept
connected to the root via the 11-level track
[0.1.2.2.2.2.14.1.6.10.5].
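A dotted track like the one above, unlike a plain digit code, allows more than nine children per node (note the components 14 and 10). As a minimal sketch, assuming that the first component (0) is the root and that each following component is the position of a child under its parent, the track can be parsed and its ancestors listed as follows; `parse_track` and `ancestors` are illustrative names of ours.

```python
def parse_track(track):
    """Turn a dotted track such as '[0.1.2]' into a tuple of integer indices."""
    return tuple(int(part) for part in track.strip("[]").split("."))


def ancestors(track):
    """List every proper prefix of the track, from the root down to the parent."""
    path = parse_track(track)
    return [path[:i] for i in range(1, len(path))]


rigoletto = "[0.1.2.2.2.2.14.1.6.10.5]"
print(len(parse_track(rigoletto)))  # 11 components, one per level
print(ancestors(rigoletto)[-1])     # track of the parent node
```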
Example III
Detail of The World Art First Level
The figure shows the upper level of the
World Art Map in the form of an Art Tree
Index.
Its five main sub Trees may be explored going
“vertically” along 12 more levels. As this map is
the core of a semantic search engine, through it
users may obtain trustworthy references in only
one click and a selected set of Authorities
specifically dealing with the “node” subject, for
example “street dance”.
All these maps have e-learning capabilities,
either autonomous or controlled by humans.
Their arboreal structure is checked from time to time,
and Darwin agents and algorithms
continuously suggest node deletions, new
node additions, concept obsolescence, new
concepts, and even structural changes.
Example IV
How a Darwin Search works
HKM (K-side)
An ideal overview detail of its Upper Level
See next
HKM (K-side)
An ideal overview detail of its Upper Level
This is a fancy cover of a HKM, Human
Knowledge Map, showing for example its upper
level with 12 “vortexes” where users may dig
down searching WHAT THEY NEED in ONLY
ONE CLICK assisted by an intelligent wizard.
This map depicts the semantic structure of the
K region, namely the “Established Knowledge”
of our civilization at a given time.
You may imagine it as the cover of a New
World Library where as of today +30,000
million Web pages will be properly indexed
thematically no matter where they are hosted
in the huge Web Ocean.
HKM (K’ - side)
An ideal overview detail of its Upper Level
See next
HKM (K’ - side)
An ideal overview detail of its Upper Level
This is a fancy cover of a HKM, Human
Knowledge Map (People), of the K’ side of
Darwin Ontology, the People side, what
people know as a collective and as
individuals.
It will differ substantially from the
Established Knowledge of K side, being more
ambiguous, diffuse, imaginative, ample
and, why not, a little chaotic!
We depict here in a fancy way its upper
level arbitrarily split in 8 vortexes. This
information is crucial to build Web
Intelligence and to Detect and Infer
Trustworthy People Behavior Trends!
42 - Search Engines - How to see the Web as Semantically Structured – I
43- Search Engines - How to see the Web as Semantically Structured – II
44- HKM - Its Physical Structure I
45- HKM - Its Physical Structure II
46- Concepts and “in mind” ideas I
47- Concepts and “in mind” ideas II
48- HKM – Antecedents I
49- HKM – Antecedents II
50- HKM – Antecedents III
51- HKM – Antecedents IV
52- HKM – Antecedents V
53- HKM – Antecedents VI
54- HKM – Antecedents VII
55- HKM – Antecedents VIII
56- HKM – Antecedents IX
57- HKM – Antecedents X
58- Darwin Ontology - Introduction
59- Darwin Ontology - Conjecture 0
60- Darwin Ontology - Conjecture 1
61- Darwin Ontology - Conjecture 2
62- Darwin Ontology - Conjecture 3
63- Darwin Ontology - Conjecture 4
64- Darwin Ontology - Conjecture 5 – Part I
65- Darwin Ontology - Conjecture 5 – Part II
66- Darwin Ontology - Conjecture 6
67- Darwin Ontology - Conjecture 7
68- Darwin Ontology - Conjecture 8
69- Darwin Ontology - Conjecture 9
70- 75 Epilogue
Darwin Vignettes
Search Engines
How to see the Web as Semantically Structured - I
Google, like most conventional search engines, is
semantically unstructured, “flat”. It means that it
does not index Web pages by “theme” but by words. It
also means that, from the point of view of semantics,
the current universe of +30,000 million pages is
indexed sharing the same “zero-ground” level.
Trying to see the Web as semantically ordered, we
may imagine it as a Super Library Building with as
many floors as there are “semantic levels” (from 10 to
15). In the figure we have imagined the Web
structured as a Darwin Hypercube.
In the beginning, ex ante, before applying our Darwin
Methodology, all Web pages are at zero ground. As
Darwin agents and algorithms proceed, each
Web page goes to its level and correct virtual place
(room, row, rack, shelf,…), hypothetically building
up the Hypercube.
Search Engines
How to see the Web as Semantically Structured - II
Building the Darwin Hypercube: its first step is to
unveil the Human Knowledge Map skeleton; its
second step would be the Semantic Profile buildup
for each branch, discipline by discipline. A probable
third step, perhaps done initially, is the metadata buildup for
the whole Web.
For each “subject” Darwin processes “textons”, huge
aggregations of nearly 100,000 Web pages at a time,
from which special algorithms unveil its “Subject
Fingerprint”, a special metadata depicted at right,
basically a string of concepts (k’s) weighted with their
mean density, plus a header.
As a fourth step each subject fingerprint is matched
within the subject neighborhood, testing the Darwin
Ontology Conjectures.
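To make the idea of a subject fingerprint more tangible, here is a minimal sketch under stated assumptions: the “density” of a concept in a page is taken as its occurrences divided by the number of words of the page, the header only records the subject name and the sample size, and two fingerprints are “matched” with cosine similarity. None of these choices is documented by Darwin; they are stand-ins chosen for illustration, and the function names are hypothetical.

```python
import math


def fingerprint(subject, pages, concepts):
    """Build a header plus a {concept: mean density} map over a sample of pages."""
    if not pages:
        raise ValueError("a fingerprint needs at least one page")
    weights = {}
    for concept in concepts:
        densities = []
        for text in pages:
            lowered = text.lower()
            n_words = max(len(lowered.split()), 1)
            # Density: occurrences of the concept per word of the page.
            densities.append(lowered.count(concept.lower()) / n_words)
        weights[concept] = sum(densities) / len(pages)
    return {"header": {"subject": subject, "pages": len(pages)}, "concepts": weights}


def similarity(fp_a, fp_b):
    """Cosine similarity between the concept weights of two fingerprints."""
    a, b = fp_a["concepts"], fp_b["concepts"]
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


diabetes = fingerprint("diabetes",
                       ["insulin lowers glucose", "glucose glucose test"],
                       ["glucose", "insulin"])
opera = fingerprint("opera", ["verdi wrote rigoletto"], ["verdi"])
print(diabetes["concepts"])
print(similarity(diabetes, opera))
```

Under these assumptions, a subject and its close neighbors would be expected to score well above unrelated subjects, which is one simple way to picture the matching step described above.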
HKM
Its Physical Structure I
What does a HKM look like “physically”? As its
content is semantically structured along
logical trees, or at least along somewhat
arboreal graphs, we may imagine a two-way,
reversible mapping between this content and
parts of a one-dimensional string, perhaps
something similar to genes in the
Genomes.
A HKM of 500,000 “Subjects”
belonging to 200 Knowledge Branches or
“Disciplines” of the Human Knowledge
would resemble a small forest of 200
“trees” having from 1,000 to 10,000
subjects each, 2,500 on average.
Each of these trees could be mapped along
one-dimensional memory, for instance
navigating from roots to leaves, top
down and from left to right.
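The two-way mapping between a tree and a one-dimensional string can be illustrated with a depth-first, left-to-right traversal. The sketch below assumes a tree given as nested `(label, children)` pairs and stores each node as a `(depth, label)` entry, which is enough to rebuild the tree; this is one possible encoding among many, and the “Cardiology” node is a hypothetical sibling added for the example.

```python
def flatten(tree):
    """Serialize a (label, children) tree into a flat list of (depth, label) pairs."""
    flat = []

    def visit(node, depth):
        label, children = node
        flat.append((depth, label))
        for child in children:        # left to right
            visit(child, depth + 1)   # top down

    visit(tree, 0)
    return flat


def rebuild(flat):
    """Inverse of flatten: recover the nested tree from the flat list."""
    root = None
    stack = []  # stack[d] holds the most recent node seen at depth d
    for depth, label in flat:
        node = (label, [])
        if depth == 0:
            root = node
        else:
            stack[depth - 1][1].append(node)  # attach to the current parent
        del stack[depth:]
        stack.append(node)
    return root


tree = ("Medicine", [("Metabolic pathologies", [("Diabetes", [])]),
                     ("Cardiology", [])])
flat = flatten(tree)
print(flat)
print(rebuild(flat) == tree)
```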
HKM
Its Physical Structure II
In each of these 500,000 subjects – nodes we
should host from 20 to 50 “concepts”, 30 on
average, totaling a “Controlled
Vocabulary” of 15 million concepts per
language.
In the figure at right we show a schema of a
HKM skeleton where subject S56, opening into
9 sub subjects, is highlighted. We may also see
the neighborhood spectrum that depicts
how concepts focus within specific subjects
and how they “diffuse” over their neighborhood.
Concepts and “in mind” ideas I
“In mind” ideas: we all have billions of “in
mind” ideas, however hidden and protected
forever unless we decide to share them with
“others” via gestures and written and oral
communication.
The Semantic Web deals with ideas and
“concepts”, ideas that, since Plato, can be
shared and whose meanings can be agreed upon via
“explanations” and illustrated by examples.
Someone at right got an idea: how Conceptual
Maps may help us improve learning. Let’s
suppose that at a given moment many people
have the same idea. How could they realize
they are “talking” about the same idea? Of
course, even within a given language and among
native speakers it is not easy to agree
about that, is it?
Concepts and “in mind” ideas II
Along our evolution we learned to identify ideas
through “names”, and for believers God is
the Almighty, the Pantocrator, the Verb, who
creates through naming.
The Web is the appropriate setting to become
conscious of this manifestation of power: at any
moment we may be statistically sure that any
idea is present with a name that competes
with others that are similar but have less presence.
In our Darwin Methodology we talk of
“modal names”. For example, Pope
Francis recently launched the six-word
expression “a poor church for the poor”,
pointing to a similar idea that probably more
than a billion people have in their minds.
This name is modal: you may check that no
other competes with it to represent the idea,
either qualitatively or quantitatively.
HKM – Antecedents I
The Figurative System of Human Knowledge, 1751 –
1772, also known as The Tree of Diderot and
d'Alembert, is perhaps the first antecedent of a
Human Knowledge Map. We may also see
“Epistemological angst: From encyclopedism to
advertising”, a Google book from Robert Darnton,
University of California, Berkeley (2001).
It’s a master work of the Enlightenment where
knowledge opens in three main branches: Memory -
History, Reason – Philosophy and Imagination - Poetry.
These three main branches open in turn
into 145 disciplines.
HKM – Antecedents II
At right we may appreciate a schema that depicts the
project Consensual Map of Science, which synthesized 20
existing maps of science in three basic forms: hierarchical,
centric and non-centric.
The chosen (arbitrary) order of sciences was: mathematics,
physics, physical chemistry, engineering, chemistry, earth
sciences, biology, biochemistry, infectious diseases,
medicine, health services, brain research, psychology,
humanities, social sciences, and computer science.
HKM – Antecedents III
Propaedia, the Britannica Human Knowledge
overview, is a semantic “index tree” of 130
Major Disciplines. It opens in 10 main branches,
41 sections and 167 divisions.
It’s the adieu of the Britannica with its last print
edition of 2010, as it continues online. It’s a
heterodox outline of the Human Knowledge,
perhaps a masterpiece of Authoritativeness in
the classic style, defining the knowledge some
Authorities see, instead of a consensual work
trying to agree about it and define it as is. See
Syntopicon.
Britannica Global Edition 2010 contained 30
volumes,18,251 pages, with 8,500 photographs,
maps, flags, and illustrations in smaller
"compact" volumes. It contained over 40,000
articles written by scholars from across the
world, including Nobel Prize winners.
HKM – Antecedents IV
Great Books of the Western World is a series of books
originally published (1952) by Britannica, a package of
54 volumes. This work is based on and influenced by the
Syntopicon (1952), Center for the Study of the Great
Ideas, an index of the 102 Great Ideas of the Western
World compiled by the American philosopher Mortimer
Adler. The ideas are listed below from A to Z.
Angel, Animal, Aristocracy, Art, Astronomy, Beauty, Being, Cause,
Chance, Change, Citizen, Constitution, Courage, Custom and
Convention, Definition, Democracy, Desire, Dialectic, Duty, Education,
Element, Emotion, Eternity, Evolution, Experience, Family, Fate, Form,
God, Good and Evil, Government, Habit, Happiness, History, Honor,
Hypothesis, Idea, Immortality, Induction, Infinity, Judgment, Justice,
Knowledge, Labor, Language, Law, Liberty, Life and Death, Logic,
Love, Man, Mathematics, Matter, Mechanics, Medicine, Memory and
Imagination, Metaphysics, Mind, Monarchy, Nature, Necessity and
Contingency, Oligarchy, One and Many, Opinion, Opposition,
Philosophy, Physics, Pleasure and Pain, Poetry, Principle, Progress,
Prophecy, Prudence, Punishment, Quality, Quantity, Reasoning,
Relation, Religion, Revolution, Rhetoric, Same and Other, Science,
Sense, Sign and Symbol, Sin, Slavery, Soul, Space, State, Temperance,
Theology, Time, Truth, Tyranny and Despotism, Universal and
Particular, Virtue and Vice, War and Peace, Wealth, Will, Wisdom, and
World.
HKM – Antecedents V
The Library of Congress Online Catalog, briefly LOC, contains
approximately 14 million records representing books, serials,
computer files, manuscripts, cartographic materials, music,
sound recordings, and visual materials. The Catalog also
displays searching aids for users, such as cross-references and
scope notes. The catalog records reside in a single integrated
database; they are not separated according to type of
material, language of material, date of cataloging, or
processing/circulation status.
This reservoir is related to others like the American Memory
of the LOC. Its Web growth in terms of traffic, use,
sponsorships and initiatives has looked somewhat frozen since
2005. It also has an agreement with UNESCO to build a
World Digital Library.
HKM – Antecedents VI
The WWW Virtual Library (VL) is
perhaps the oldest Web Catalogue,
founded by Tim Berners-Lee, the
creator of HTML and of the Web itself
(1991).
It is run by a loose confederation of
volunteers, who compile pages of key
links for particular areas in which they
have expertise. It has looked somewhat
frozen since 2008.
HKM – Antecedents VII
DMOZ, initially Directory from Mozilla,
belongs to the ODP, Open Directory
Project, initiative, a human-edited
directory built and maintained by a vast
global community of nearly 100,000
volunteers.
Its search engine is thematic, registering
more than 5 million Websites in 1 million
categories.
One of its drawbacks is that volunteers
rule the directory’s evolution without
adhering to an agreed
ontology; that is, “subjects”, “subject
names”, and even “authorities” come
from rather arbitrary suggestions from
people.
HKM – Antecedents VIII
Wolfram Alpha is a Knowledge Base built with
curated and structured data extracted from the
Web about some branches of Human
Knowledge. This database can be queried through a
search engine that tries to answer queries directly
instead of following the conventional search engines’
procedure of providing a list of suggested links.
For many it provides a new way of searching: an
answer engine. For example, asked for “Euclidean
algorithm”, Wolfram Alpha tells us all the math, logic
and computational information we “presumably”
need. Google, on the contrary, brings us 3,340,000
references, telling us implicitly something like: we
did our work, now everything is up to you! Wolfram Alpha
focuses on science and systematizations of knowledge.
HKM – Antecedents IX
GeoNames: it contains over
10,000,000 geographical
names corresponding to over
7,500,000 unique features.
Beyond names of places in
various languages, data stored
include latitude, longitude,
elevation, population,
administrative subdivision
and postal codes.
This base is free of charge and
its Web Services offer direct
and reverse geocoding. The
figure at right depicts
GeoNames Ambassadors. You
may combine this service with
others such as Google Maps.
HKM – Antecedents X
Dispersed Big Data: via Social Networks,
“big data” reservoirs are now
accessible in an extremely
atomized and dispersed way, belonging in
our jargon to the K’ side (people’s
side).
Nevertheless, from a statistical point of
view, precise analyses and outcomes
could be obtained from this raw
information.
This outcome is important
but not crucial to build the K’
Thesaurus, and unnecessary to
improve the K Thesaurus.
Darwin Ontology
Introduction
Darwin Ontology is the core of Darwin
Methodology, a set of AI, Artificial
Intelligence, procedures to “see the Web more
and better”. The Web as of today is
“semantically flat”: millions of content
pieces of different color tonalities
(themes) are uniformly dispersed here and
there, as conventional search engines like
Google only index by words, not
thematically.
Up: take a look at the search track above, trying
to guess something valuable about “green”.
Your queries are rather multicolor but you
learn fast, focusing on green as time goes
on.
Down: with the whole Web content
semantically structured by “color” (logic,
math, physics, law, nature, philosophy,
religion, entertainment…), searches go
direct, in only one click – guess.
Darwin Ontology
Conjecture 0
Conjecture 0: the triad Logical Tree,
Thesaurus, and Cognitive Objects
unequivocally identifies any type of
Knowledge.
We humans at large, statistically,
document “tree wise”, generally in reverse
mode, from root to leaves, that is, from
general and global to particular, “talking”,
“teaching”, “convincing”, “registering”,
level by level. This content hierarchy is
statistically unveiled by Darwin agents and
algorithms.
Knowledge trees have a “skeleton” of
nodes (Logical Tree), a “Controlled
Vocabulary” (Thesaurus) as a list of node
names, and the Semantic Fingerprints
within each node (Cognitive Objects).
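The triad can be pictured as one small data structure. The sketch below is only an illustration: the names `Node` and `thesaurus`, the toy art nodes and the weights are our assumptions, not part of Darwin itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the Logical Tree (the "skeleton")."""
    name: str                                         # its Thesaurus entry (agreed name)
    fingerprint: dict = field(default_factory=dict)   # Cognitive Object: concept -> weight
    children: list = field(default_factory=list)


def thesaurus(root):
    """Controlled Vocabulary: the list of node names, root first."""
    names = [root.name]
    for child in root.children:
        names.extend(thesaurus(child))
    return names


# Toy values, invented for illustration only.
drawing = Node("drawing", {"pen and ink": 0.6, "sketch": 0.4})
painting = Node("painting", {"oil on canvas": 1.0})
art = Node("art", children=[drawing, painting])
vocabulary = thesaurus(art)
```

Here the tree shape is the skeleton, `vocabulary` is the list of node names, and each `fingerprint` stands for the content hosted within a node.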
Darwin Ontology
Conjecture 1
Conjecture 1: Website Administrators and Owners “speak”
and “think” rationally in terms of their objectives and in terms
of their matchmaking policies.
We say rationally because they behave as if thinking along a
tree and speak like automata answering users’ queries in a
governors-versus-governed mode. As a matter of fact things in
the Darwin K Realm (black region) are “fossils”: laws, regulations,
codes, prescriptions are the consensual truth at a given
moment, inherited from the past.
The K mission is to cancel, moderate or, at a negotiation extreme,
make changes to better adapt the established order to claims
coming from K’ (green region). In K things are frozen at a given
time, for instance “as of …”. That’s not bad! It’s the eternal
game of evolution. In K’ life is continuously brewing: the new
ideas, suggestions, opinions all come from K’.
Warning: for information and intelligence to flow freely,
both sides should be semantically structured (mapped). With
K structured and K’ unstructured it is impossible to infer
People’s Behavior Patterns.
Darwin Ontology
Conjecture 2
Conjecture 2: Users “speak” and “think” rather
chaotically in terms of their passions, their desires, their
necessities at large.
Users need “solutions”, prescriptions that supposedly
are stored and publicly offered in the K Realm, in most
cases openly and freely. To get them the only tools they
have at hand are symbols, buttons, links, and at large
strings of meaningful words of the K side. So they are
obliged to know the K jargon precisely to succeed.
A smart observer, looking from a virtual
“e-membrane” at the owners’ side, may arrive at the
conclusion that any offer has a well-defined purpose
and that its “hunting” strategy could be precisely
inferred.
Warning: on the contrary, without K – K’ mapping it
would be practically impossible for any observer
located in the same place to infer any type of order at
the users’ side.
Darwin Ontology
Conjecture 3
Conjecture 3: Users’ interactions along sessions are
strings of semantic particles of two types: users’
keywords and navigation instances. The sessions’
strings are the representation of the users’ strategies
to satisfy their needs.
Users’ session tracks of the form [iikikkikiikkiii…..],
where k stands for keyword and i for instance, even
ignoring their related outcomes, provide us with a primal
source to evaluate our offer and to infer users’
behaviors without interfering with them!
These tracks could be considered primitive
expressions of the users’ searching jargon.
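A session track of this form can be summarized without looking at any outcome. The following is a minimal sketch, assuming the track arrives as a plain string of `k` and `i` characters; the helper name `summarize_track` is hypothetical.

```python
from collections import Counter
from itertools import groupby


def summarize_track(track):
    """Summarize a session track made of 'k' (keyword) and 'i' (instance) particles."""
    particles = [p for p in track if p in "ki"]   # ignore brackets, dots, spaces
    counts = Counter(particles)
    # Consecutive runs of the same particle, e.g. ("i", 2) = two navigation steps in a row.
    runs = [(kind, len(list(group))) for kind, group in groupby(particles)]
    return counts, runs


counts, runs = summarize_track("iikikkikiikkiii")
```

The counts give the balance between keywords and navigation instances, and the runs show how the user alternates between querying and navigating along the session.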
Darwin Ontology
Conjecture 4
Conjecture 4: Cognitive Objects, documents, are expressed
as strings of two semantic particles or molecules: Common
Words and Expressions belonging to a given Jargon, and
concepts belonging to a “Controlled Vocabulary”.
Semantic Yin – Yang: see slides 14 through 18 of our PPT series.
As in the Yin – Yang monad, we humans document via two
types of particles: one to communicate the content essence
of our “in mind” ideas (Concepts) and the other as literary filling
to make our documents more comprehensible and friendly
(Common Words and Expressions).
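To make the two kinds of particles concrete, here is a minimal sketch that separates a sentence into concepts and common words. It assumes a controlled vocabulary is already available as a list of phrases and uses a greedy longest match, which is our illustrative choice and not necessarily how Darwin agents proceed; `split_particles` is a hypothetical name.

```python
def split_particles(text, controlled_vocabulary):
    """Split text into concepts (vocabulary entries) and common words.

    Greedy longest match: a multi-word concept wins over its single words.
    """
    words = text.lower().split()
    vocab = {tuple(c.lower().split()) for c in controlled_vocabulary}
    longest = max((len(c) for c in vocab), default=1)
    concepts, common = [], []
    i = 0
    while i < len(words):
        for size in range(min(longest, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + size])
            if candidate in vocab:
                concepts.append(" ".join(candidate))
                i += size
                break
        else:
            # No vocabulary entry starts here: this word is literary filling.
            common.append(words[i])
            i += 1
    return concepts, common


concepts, common = split_particles(
    "the lyric soprano sang with a full lyric voice",
    ["lyric soprano", "full lyric"],
)
```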
One thing that astonished me in the past when dealing with
newspaper marketing people was that they talk of the
“cognitive” content as “blank”, a mere attractor for a physical
medium that carries ads and businesses. On the contrary, for
regular users like you and me, the newspaper is a medium that
reduces our ignorance, our uncertainty, and ads are just filling.
Darwin Ontology
Conjecture 5 – Part I
Conjecture 5: It is possible to enable a Full
Duplex type of communication between
Websites and their users through an
e-membrane, enabling the free flow of content
and its associated intelligence between them.
In Darwin architecture interactions are
performed through an e-membrane that
resembles a living semipermeable
membrane, where all the messages going
back and forth through it are processed, trying
to extract as much information as possible
from them.
Darwin Ontology
Conjecture 5 – Part II
When the message is a potential concept
coming from the users’ side, the membrane
performs all possible statistics at both sides
(green and black): it accounts for K Thesaurus
concept use at the owner side, makes a request to
offer all documents available, and accounts for
the user history.
Besides these statistics, keywords are analyzed
and matched against a K’ Thesaurus. Each
keyword within a session is potentially
considered as part of a “speech” or part of a
sort of interception “truth game”.
The core of its “smart” built-in strategy is to
keep it from spying on users. No cookies, no brute
“bait and catch” hunting devices. Total absence
of owners’ side messages such as: Come here
again!, You are Welcome, and even the mild
May I help you?
Darwin Ontology
Conjecture 6
Conjecture 6: Intrusions in communications
cause serious troubles that go deeper and
farther than a local perturbation. The
slightest intrusion may not only invalidate
the session but also prevent users from
communicating freely. Intrusions distort statistics
and the users’ strategies as well.
More than a conjecture, it could be
considered a fact, something that has gained
consensus. However, the effects and counter
effects of the abuse of surveys and polls should be
investigated and measured.
Conventional surveys and polls are based on
intrusions, sometimes generating visible
harassment. These intrusions condition the
answers. How much? It has to be measured
in order to eradicate it as much as possible,
for a credible methodology to know the K’ side.
Darwin Ontology
Conjecture 7
Conjecture 7: Human Knowledge is bounded.
We talk about computational ontology at a given moment in
a given language. Rather huge but “numerable” within our
current “state of the art” of computing, namely: from 10 to
15 million “in mind” ideas expressed as concepts, and from
350,000 to 500,000 subjects or themes that deserve study
in the form of essays, theses and books, distributed
hierarchically along a “forest” of 170 to 200 Human Knowledge
Disciplines.
To this basic semantic asset Geographical Names, Art
Collections, Acronyms, ephemerides and historical data
should be added. In sum: huge but bounded and
actually efficiently computable.
Darwin Ontology
Conjecture 8
Conjecture 8: Given a Logical Tree (skeleton) we may
automatically generate its related Thesaurus.
To make this conjecture valid we need at least a cluster
of 100,000 “Authorities” per Logical Tree, for instance
Medicine.
To accomplish this heavy task Darwin Methodology
proceeds in a series of steps, namely: from the
skeleton we extract a flat Thesaurus seed; with this
seed we extract our first knowledge base; from this we
extract a whole flat thesaurus, not a seed; with this
flat thesaurus, a better knowledge base; finally, with
this knowledge base guided by the Logical Tree, we
structure the whole thesaurus, computing all possible
tracks from root to the specific nodes.
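The last step, computing all tracks from the root to each node, is easy to sketch once the Logical Tree is given. The example below assumes the tree is a plain mapping from a node name to its children's names; `all_tracks` and the toy tree are hypothetical, and the earlier extraction steps are not modeled.

```python
def all_tracks(tree, root):
    """Return every track (path of node names) from the root to each node.

    `tree` maps a node name to the list of its children's names.
    """
    tracks = []
    stack = [[root]]
    while stack:
        track = stack.pop()
        tracks.append(track)
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(tree.get(track[-1], [])):
            stack.append(track + [child])
    return tracks


toy_tree = {
    "arts": ["visual arts"],
    "visual arts": ["drawing", "painting"],
    "drawing": ["masters of drawing"],
}
tracks = all_tracks(toy_tree, "arts")
```

Each track is the chain of names that locates one node, for example arts, visual arts, drawing, masters of drawing.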
Darwin Ontology
Conjecture 9
Conjecture 9: Given a Historical Reservoir we may
generate its related Thesaurus and a collection of its
main Subjects and Themes.
See our slides of the series How To See The Web as
Semantically Structured. In fact, unveiling a Thesaurus
from the Web is a complex procedure, a series of
steps that resembles industrial refinery processes,
trying to obtain in each step lighter and subtler products
(as we go from root to leaves). The key is good seeds.
Take into account that seeds are imagined by humans
based on agents’ suggestions. Once a seed is
preselected Darwin starts growing it up to medium size.
Wrong seeds rapidly lead to anomalous results,
while good ones evolve coherently, maintaining
the arboreal form. Good seeds are grown up to full size
with their coherence under continuous monitoring.
Finally, full-size solutions are submitted to a Human
Board of Experts for approval.
Epilogue
Darwin, a cutting-edge technology
• The Web as_is of today is semantically
unstructured. Darwin Ontology makes it possible
to map the whole Web, unveiling the thematic
essence of each of its more than 30,000 million pages.
• This map behaves like “Semantic Glasses” that
enable users to see the Web as “semantically
structured”, without interfering with it.
• The map enables users to FIND WHAT THEY ARE
LOOKING FOR IN ONLY ONE CLICK, even from
Mobile units.
• With the Web semantically structured, the
Inference of People Behavior Trends becomes possible.
• Darwin Ontology enables humans to detect and
retrieve all pieces of information and intelligence
dispersed in the Web about ANY THEME and
synthesize them in precise and meaningful
Intelligence Reports.
By Juan Chamero, Darwin Architect, August 12, 2013
Epilogue I
The Web of Today – The Web of Tomorrow
Darwin Ontology is the core of Darwin
Methodology, a set of AI, Artificial
Intelligence, procedures to “see the Web more
and better”.
The Web of today is “semantically flat”,
unstructured: trillions of pieces of
information (different color tonalities by
theme) are uniformly dispersed here and
there because conventional search engines
like Google only index by words, not
thematically.
The Web of Tomorrow is depicted below as
“semantically structured” along “Trees” of
different color tonalities, one for each
Branch of Human Knowledge (~200).
Epilogue II
Semantic Glasses
UP: The Web of Today looks unstructured,
disordered, as_is. Darwin agents, as “Super
Librarians”, inspect all pages from a semantic
point of view, inferring their themes
and building their corresponding
metadata onto an HKM, Human Knowledge
Map. Along this process Darwin agents
do not make any intrusion: everything
except the map remains untouched.
DOWN: the user may “see” thru
“Semantic Glasses” The Web of Today as
if it were The Web of Tomorrow. To do
that the “Semantic Glasses” need the
map and the help of a Super Librarian
working through an e-membrane.
Epilogue III
Find What You Are Looking For In Only One Click
UP: take a look at the search track above, trying to
guess something valuable about “green”.
Your queries are rather multicolor but you
learn fast, focusing on green as time goes
on. However, you obtain a very dispersed
and atomized green.
DOWN: with the whole Web content
semantically structured by “color”, namely
logic, math, physics, law, nature, philosophy,
religion, entertainment…, searches go
direct, in only one click – guess. Everything
you obtain is green and valuable!
Epilogue IV
Inference of People Behavior Trends
Since ancient times many people have lived by making and selling
inferences about everything, mainly about the future. For the
first time we humans have at hand a model of how we are
and think: The Web. As an example, the figure depicts a
work about “10 Trends for 2013” selected at random by a
Darwin Agent as of August 2013, querying Google by
“social behavior trends”.
Darwin Ontology states that in order to make consistent
inferences the Web should be semantic. Darwin solves this
conundrum by building “semantic glasses” that enable us to
see the Web as perfectly structured.
However, Darwin also states that to make consistent and
credible inferences we also need to see the people’s side
semantically structured. It means mapping the whole of Web
users’ demand. Darwin may build both maps.
Epilogue V
Intelligence Reports
Darwin Ontology enables humans to detect
and retrieve all pieces of information and
intelligence dispersed in the Web about ANY
THEME and synthesize them in precise and
meaningful Intelligence Reports.
This “anything” could be an “avatar”, the sum
of existing information and knowledge about
a person, an organization, an entity, a
country, a region, a situation, an area of
activity, etc.
The figure depicts a Darwin agent building a
facet of an avatar, for instance its biography.
The Web is browsed cluster by cluster,
inspecting “Authorities”. The agent “comes” from
cluster 10, scouts cluster 11 exhaustively,
saving words, expressions and potential
concepts (k’s), and continues to cluster 12.
Darwin Presentation to the XXXXX Institute
By Juan Chamero, from Barcelona, as of January 2015
Street Art Utopia, by David Walker -Juan Tuazon
The Art Map
• The Art Map, as a piece of the HKM, Human Knowledge
Map, was built for the EU, European Union, as a demo of
the “semantic” talent of Darwin Methodology to retrieve
out of the Web_as_is all the information and intelligence
dispersed here and there, from the past to the present.
• The knowledge creatures you are going to see,
semantically structured by Darwin, were “up there” in
the Web Ocean uploaded openly and at will by millions
of artists and laymen.
• That map was uploaded to the presentation laptop acting
as virtual “semantic glasses” of the Google browser.
1. A three upper levels vision
Take a look at its seven main clusters. Within each cluster the
main art subjects are depicted and hyperlinked like in a geo map.
The Art Tree is deployed from root to leaves in up to 13 levels.
2. Mouse over the node “Drawing”
The tree could be navigated either by track or by neighborhood
(white links). Mouse over “Drawing” depicts its corresponding
sub tree.
3. A detail of the upper level “Drawing” sub tree
This is a sub tree of 80 derived nodes. For each node we may have
access to a gallery of images (5 in this demo). Optionally and
tentatively we may inspect nodes and their content without clicking.
4. Knowing a little more via Google images thru
Darwin Semantic Glasses
We may inspect specifically related images via intelligently pointed
suggested queries, for example “Leonardo vitruvian man”.
5. Knowing a little more via Google images thru
Darwin Semantic Glasses
Darwin makes another suggestion: Leonardo da Vinci Drawing Machines.
6. We are invited to search “pen and ink” works
within “artist tools”
Intelligently guided, at will or randomly, we are invited to inspect popular nodes.
7. Knowing a little more querying by the concept
“masters of drawing” via Google Web
Darwin may focus semantically deep: see the NOT TRIVIAL specific query
track from root to leaves: “arts” “visual arts” drawing “masters of drawing”.
8. Deepening a step querying by the concept
“Leonardo da Vinci” via Google Web
See how Darwin generates an optimal query to find the most
authoritative documents: some links of the semantic track are
suppressed, some are used as open search and some as closed search.
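How a semantic track becomes a query can be sketched in a few lines. We assume here, as our own reading of the slides, that “closed search” means a quoted exact phrase and “open search” an unquoted term; the function name `build_query` and its parameters are hypothetical, and how Darwin decides which links to suppress is not modeled.

```python
def build_query(track, open_terms=(), suppressed=()):
    """Turn a root-to-node track into a search query string.

    Terms in `suppressed` are dropped, terms in `open_terms` are left
    unquoted (open search), all others are quoted (closed search).
    """
    parts = []
    for term in track:
        if term in suppressed:
            continue
        parts.append(term if term in open_terms else f'"{term}"')
    return " ".join(parts)


track = ["arts", "visual arts", "drawing", "masters of drawing"]
full_query = build_query(track, open_terms={"drawing"})
short_query = build_query(track, open_terms={"drawing"}, suppressed={"arts"})
```

With these inputs `full_query` reproduces the query track shown in step 7, and `short_query` shows the effect of suppressing one link of the track.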
9. Similar as above querying by Michelangelo
We are still “inspecting” the map before deciding to “click”. As Darwin maps
evolve by themselves users may suggest other options off tree.
10. Let’s go now to the “Drawing” neighborhood
As explained, we may navigate either in sub tree mode or by
neighborhood. We invite you to click on the white neighborhood link.
11. Let’s see what a neighborhood is
The figure depicts the Drawing node neighborhood: its ancestry:
Painting: Classic; its seven “sons”; its many “peers”, or brothers, etc.
11 bis. Semantic Neighborhood and Tree Logic
By “semantic neighborhood” we mean membership in a “semantic family”, in this
example “the drawing family”. So drawing is the second son of “classic arts” and brother of
“painting”. It has many “brothers”, such as sculpture and architecture, and several “sons” or subordinated
subjects such as “history”, “artist tools”, “support media”,…. Trees and sub trees may also offer access
to arbitrarily agreed forms of extended families, including collateral subjects at the level of “uncles”
and “aunts” and even closely related and/or friendly and/or akin subjects.
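The family relations just described can be derived from a simple child-to-parent table. This is a minimal sketch with a toy fragment of the art tree; the function name `neighborhood` and the table layout are our assumptions, not the demo's actual storage.

```python
def neighborhood(parent_of, node):
    """Collect a node's ancestry, brothers and sons from a child -> parent map."""
    ancestry = []
    current = node
    while current in parent_of:          # climb until the root, which has no parent
        current = parent_of[current]
        ancestry.append(current)
    parent = parent_of.get(node)
    brothers = sorted(n for n, p in parent_of.items() if p == parent and n != node)
    sons = sorted(n for n, p in parent_of.items() if p == node)
    return {"ancestry": ancestry, "brothers": brothers, "sons": sons}


# Toy fragment, following the "drawing family" example above.
parent_of = {
    "classic arts": "art",
    "painting": "classic arts",
    "drawing": "classic arts",
    "sculpture": "classic arts",
    "architecture": "classic arts",
    "history": "drawing",
    "artist tools": "drawing",
    "support media": "drawing",
}
family = neighborhood(parent_of, "drawing")
```

Extended families ("uncles", "aunts") could be obtained the same way by asking for the brothers of the parent.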
Next we are going to explore how data is structured in a sort of database. One of the problems we
face when dealing with “Big Data” applications (and this is one!) is how to offer friendly and efficient
interfaces to navigate while providing both overall visions and detail down to the minimum, as
in geo maps. As we will see, an HKM, Human Knowledge Map, in a given language must map more
than 15 million “ideas” along more than 600,000 subjects (themes or topics), finally structured as a
“knowledge forest” of about 200 disciplines. Take into account that The Art, only a small piece of it
though “complete”, has 7,571 subjects and about 500,000 “ideas”.
12. Let’s see a little deep inside Darwin
The Art Map content could be saved and deployed, resembling a DNA vector, along a
two-dimensional matrix, in this demo of 23 columns by 329 rows. The whole HKM,
Human Knowledge Map, would be saved and deployed in 23 columns by approximately
30,000 rows, ~690,000 subjects, not too much in terms of “Big Data”! Each “semantic
cell”, which has a specific and unique name, hosts the “semantic fingerprint” of the “subject”
pointed to by the name, a brief description of it and a set of “authoritative sources” from which
Darwin agents retrieved the description (i-URL’s).
13. Passing the mouse over “painting”
The deep level of browsing: imagine yourself browsing The Art tree by track
from root to leaves, going from right to left, and in parallel creating the
7,571 cells from the upper left corner of the matrix, going right and down row by
row, unfolding the tree in a rectangular matrix.
We may now browse the whole map cell by cell and even within each
cell, reaching a semantic universe of ~500,000 concepts!
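Unfolding the tree row by row amounts to simple index arithmetic. The sketch below assumes 0-based positions and uses hypothetical helper names (`cell_position`, `rows_needed`); only the 23-column width and the estimate of about 30,000 rows come from the text.

```python
COLUMNS = 23  # matrix width used in the demo


def cell_position(index, columns=COLUMNS):
    """Map the n-th node visited (0-based) to its (row, column) in the matrix."""
    return divmod(index, columns)


def rows_needed(cells, columns=COLUMNS):
    """Number of rows required to hold `cells` cells; the last row may be partial."""
    return -(-cells // columns)  # ceiling division


first = cell_position(0)              # upper left corner
second_row_start = cell_position(23)  # first cell of the second row
hkm_rows = rows_needed(690_000)       # whole HKM at ~690,000 subjects
```

The same mapping run backwards (`row * COLUMNS + column`) recovers the visiting order of a cell, so the matrix and the tree traversal stay in step.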
14. Let’s inspect the “interior” of a given cell:
“lyric soprano”
We are interested in knowing as much as possible about “lyric soprano”.
Darwin tells us that The Art map has a node named “lyric soprano”. The
demo is adjusted to search by nodes first and then by concepts.
15. Let’s go to “lyric soprano” cell and its
neighborhood
16. Mouse over Lyric Soprano again ….
Mouse over the Lyric Soprano cell activates the same
search features as in slide 2 and subsequent ones: the sub tree of the
node and its neighborhood, complemented by a gallery of images.
17. Node content: i-URL’s and Semantic Fingerprints
We are in the deepest semantic mode, within the node! A whole
“Semantic Fingerprint”, a brief semantic description of the node subject
and more …….
A new Dimension of Searching
• We are entering into a new dimension of searching: This “feature” is not only a powerful
tool to make the search more direct and precise but a tool to find whatever we need in only
one click as well. It is equivalent to being in a huge Web Library managed by expert and
friendly librarians.
• We have said that in each node is stored something like a sample of its subject. Let’s
suppose that for the subject “masters of drawing” there exist 20,000 Web pages dealing
with this subject with a high level of authoritativeness. Darwin Agents under Darwin
Methodology and guided by Darwin Ontology may unveil from these raw clusters of content
a weighted set of dominant concepts (modal concepts) that are considered the semantic
synthesis of the cluster subject: masters of drawing in this case. This set of concepts is the
core of the above mentioned “semantic fingerprint”.
• You may easily guess that adding one of these specific and unique concepts to your
“querying” will focus it precisely on the semantic key you are looking for! We are close to
the “find a needle in a haystack” utopia.
• You will be now invited to see how this feature works. Darwin agents will also generate for
each subject its corresponding description expressed as part of its metadata (i-URL).
Concepts could be of several types: generic, objective, functional, etc.
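As an illustration of what a weighted set of dominant (modal) concepts could look like, the sketch below weights each concept by the share of cluster pages that mention it. This document-frequency weighting is our stand-in, not Darwin's actual algorithm, and the name `semantic_fingerprint` and the toy cluster are invented for the example.

```python
from collections import Counter


def semantic_fingerprint(pages, top=3):
    """Weight each concept by the share of cluster pages that mention it.

    `pages` is a list of pages, each given as the list of concepts found in it.
    """
    page_count = len(pages)
    document_frequency = Counter()
    for concepts in pages:
        document_frequency.update(set(concepts))  # count each page once per concept
    ranked = document_frequency.most_common(top)
    return [(concept, count / page_count) for concept, count in ranked]


# Toy cluster standing in for the authoritative pages on "masters of drawing".
cluster = [
    ["leonardo da vinci", "michelangelo", "chiaroscuro"],
    ["leonardo da vinci", "durer"],
    ["leonardo da vinci", "michelangelo"],
    ["michelangelo", "leonardo da vinci", "leonardo da vinci"],
]
fingerprint = semantic_fingerprint(cluster, top=2)
```

Adding one of the top concepts to a query is then the kind of focusing step the bullets above describe.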
18. List of concepts stored in “Lyric Soprano” node
You are invited to click, perhaps your first click in
this demo: a mouse over will provide you only a semantic
overview. In order to be specific and go right to the point
you must click: on average no more than once!
18 bis. List of concepts stored in “Lyric Soprano” node
scroll down ……..
Each of these sets of concepts, objective (famous lyric sopranos)
and generic (fach, full lyric, light lyric, range, …..), and some other
categories add specificity and excellence to your search.
19. Examples of specific search in only one click
Epilogue
The LOGO is behind the Web!
At last: It is all in a name!
•Human ideas are known by their
unique “agreed” names for each
pair “Language – Culture”.
•The Web enabled this unique
agreed naming.
•Darwin unveils this unique
agreed naming.

The Web - A World of Avatars

  • 1.
    The Web asa World of Avatars ANWOT - A new Way of Thinking By Juan Chamero, from Barcelona, Spain, as of May 30th 2016 Avatar (2009 film), from Wikipedia Introduction We present here the last version of Darwin Methodology initially created to “see the Web more and better” that evolved to see the Web as a World of Avatars instead, cyber creatures that represent our past and present ideas and thoughts and even all type of intellectual speculations about our possible futures. This idea is not new: it goes back along centuries diluted and hidden as archetypes and models like “Romeo and Juliet”, “Don Quixote”, “Ulysses”, “Democracy”, “El Príncipe”, the Avatars of Hinduism, and actually as Cyber creatures by visionaries and scientists like Stephen Hawking. Why was it hidden for so long? Because only from very recently exist suitable cyber reservoirs to host The ALL almost “naturally”, openly and freely: The Web. This presentation could be considered our third e-book of the Mind to Digital Series. It has five sections namely: i. Darwin Methodology Last Update deals with the idea of Web avatars fundamentally the new Darwin Ontology Conjectures to cover this revolutionary vision; ii. Darwin in a nutshell is a synthesis of Darwin methodology in order to see the Web as semantically structured however not enabled yet to see it as a world of avatars; iii. Darwin Demo deals with the details of The Art Thesaurus unveiled from the Web via a Darwin AI Mega Algorithm and presenting Darwin as ANWOT, A New Way of Thinking; iv. Darwin Project stands for Darwin as a Project to map and unveil absolutely EVERYTHING disperse and hidden in the Web; v. Darwin Tests and Reflections deals with some paradox and crucial tests performed by Darwin Methodology along the last decade and some Web examples about concepts versus words architectures semantic search superiority (See Kentucky Woodman!).
  • 2.
    Darwin Methodology LastUpdate  Darwin Ontology Adjustments, 1 page;  Darwin avatars buildup in 12 big “industrial steps” analogy, 4 pages;  Darwin Methodology – To “see” the Web more and better, avatar seeds, 12 pages;  Darwin Ontology – Conjectures, 4 pages; Darwin in a nutshell  Darwin in a nutshell Index, 1 page;  Darwin Brief, 1 page;  Darwin Methodology, 2 pages; o Darwin Carousel, 5 pages; o Darwin Maps Build, 3 pages; o Darwin Big Data, 3 pages; o Darwin Icons Meanings, 5 pages;  The Web as seen by Darwin Methodology, 3 pages;  Darwin Bibliography, 1 page; Darwin Demo  A picture is worth a thousand words, 1 page;  Intro to Darwin Art Map, 6 pages;  Darwin Mapping History, 9 pages;  Darwin Semantic Search, 19 pages;  Q&A Logic of Web Search, 9 pages;  The Art Tree Darwin Demo, 13 pages;  Darwin Presentation (PPT), 25 pages; see pages 284 to 308; Darwin Project  Darwin Teaser, 11 pages;  Darwin HKM (PPT), 75 pages; see pages 209 to 283;  Present and Future of Web Searching, 4 pages;  DM Mega Algorithm, 4 pages;  Aiware Methodology, ikAK, 3 pages;  Semantic Pills, within a Big Data Thesaurus, 35 pages;  HKM Synthesis, HKM in numbers, 4 pages; Darwin Tests and Reflections  Wikipedia avatar, 3 pages;  Word Searching Weakness, 1 page;  Differences between data information and knowledge, 3 pages;  The Web for fun, “who’s on first?, 2 pages;
  • 3.
     Crucial questioning,4 pages;  Kentucky Woodman, 3 pages;  Human Knowledge Disciplines, 13 pages;  Words versus concepts, 5 pages;  Mathematics seed, 7 pages; Epilogue We intended to depict in this e-book a semantic tour around the Web using as a “cicerone” our Darwin Methodology. We have “seen” semantically and at our will hundreds of thousands of Websites related to our needs of data, information, and knowledge and even of intelligence. As a consequence of our guided e-learning we acquire a valuable cyber wisdom we want to transmit. If we were challenged to explain shortly the rational of our alleged “acquired wisdom” we would recommend overview our Darwin Tests and Reflections section thru a mini tour as follows:  Wikipedia avatar: it shows us the best we can do working conventionally, at large subjectively via real or alleged authorities;  Word Searching Weakness: it shows the intrinsic weakness and misleading ambiguity of a trivial search like for example “dog”;  Differences between data, information and knowledge: in fact at present we ignore what are these differences scientifically talking. Notwithstanding we consider the above hierarchical sequence a strong and valuable belief. There are hundreds of alleged authoritative versions about it such as the one commented;  The Web for fun: we use one of the most famous Abbot & Costello “routines” who’s on first to exemplify the semantic confusion generated by bad and/or incorrect use of words;  Crucial questioning: here we present the hardest Artificial Intelligence experts questioning about Darwin namely: Darwin versus Google search; successfully high impact and/or disruptive applications uses examples; Darwin ability to work within the Dark Web;  Kentucky woodman: a semantic analysis of the term associated to Abraham Lincoln as avatar unveiling the Web as_is departing from Zero Knowledge;  HK, Human Knowledge Disciplines: A whole Web Thesaurus would cover about 200 disciplines. 
We have arrived to that estimation that upgraded from 150 at the beginning of the 2000 depending of what we mean by “branch of knowledge”. See a brief exploration about it as of 2014;  Words versus Concepts: it is the summary of a Darwin workshop seminar held in 2015 about word versus concept and their associated universes and in mind images;  Mathematics seed: example of a Semantic Seed buildup performed by Dr Eduardo Ortiz and its team of PhD postulants about Mathematics and tested as trustable by Darwin agents. Dr Ortiz is emeritus professor of Mathematics and History of Mathematics at the Imperial College of London.
  • 4.
    Darwin Ontology Adjustments
    By Juan Chamero as of February 19th 2016
    The ALL is mentally imagined (“in mind image”): Keeping it in mind – Poetry By Heart, Oxford Dictionaries
    Prologue
    This brief document deals with crucial, fundamental findings about Cyberspace. The documents reviewed have been biographies and classic essays related, to my knowledge, to the scope of our ontology, namely: Galileo Galilei, Claude Shannon, Alan Turing, Roger Penrose, Albert Einstein, Stephen Hawking, Plato, René Descartes, John von Neumann, Nikola Tesla, Jaime Balmes, the Tao Te Ching, Zen writings centered on Bodhidharma, the Bible (Genesis and the Apocalypse) and, why not, something about Pope Francis, Saint Augustine, Teilhard de Chardin and Umberto Eco: an atypical intellectual cocktail, isn’t it?! From these readings emerged a new, updated Darwin Ontology to “see the ALL more and better”. Perhaps this ALL is the common mind image that the components of this intellectual cocktail share.
    Rationale: a) our mind continuously processes information and knowledge; b) this helps us to live more and better; c) via intuition and knowledge humans document their acts and experiences, let’s say their know-how and living avatars. Stephen Hawking states that these records (for example in books and in Cyberspace) would be as important as our lives. As a preliminary thought he suggests that the image of the whole world as_is today, and probably the seeds of our future, could be expressed in the Web. Provided this assertion is true, we would be less than what we really could be! (And even without resorting to God.)
    Note: do these assertions sound a little like science fiction to you? It would be equivalent to saying that via ontologies like Darwin we are enabled to know not only the best possible truths but the absolute best ones! And all this without creating anything new at the human or artificial intelligence level, simply unveiling all the pieces of truth that are somehow dispersed and semi-hidden in the Web: almost a paradox of negentropy!
    The adjustments have been synthesized in a 4-page document named “avatar_buildup”. Its first page deals with Darwin as unveiling avatars diluted in the Web. Within this vision a Darwin outcome such as the Human Knowledge Map would be simply a Human Avatar: a big and complex one, but in the end an avatar! The second page and half of the third are devoted to the epilogue, which summarizes the Darwin Ontology adaptations performed in order to unveil as avatars not only what is currently visible but also what is currently hidden or invisible. These adjustments close our ontology with a finishing touch!
  • 5.
    Darwin avatars buildup in 12 big “industrial steps” analogy
    ANWOT, A New Way Of Thinking doc, by Juan Chamero, from Spain as of February 15th 2016
    Face avatars from Deleket
    Introduction
    Avatars: cyber creatures that represent and guide our lives, generally as archetypes. If we as humans have defined our reason for living and our relation with THE PERCEIVED ALL through the cognitive hierarchy of data, information, knowledge and wisdom, we may assume that avatars are creatures that represent this hierarchy and help us to make meaningful use of it to solve and/or “to see” any imagined subject. Avatars are in mind images; however, not all in mind images are avatars. Avatars are thus creatures that to some extent represent us: our existence, our past and also what we expect of our future.
    Let’s imagine a world without humans but with the Web space “alive” as it is now: a sort of Cyber Sea where trillions of in mind images and avatars live! We are used to seeing and understanding all types of in mind images via trillions of explanations and descriptions documented in all existing languages and cultures. As hypothetical intelligent “aliens” we could well imagine how insects behave, how illnesses evolve, what a storm is, and even more abstract and complex things like hate and love.
    In Hinduism an avatar is an incarnation or deliberate descent of a deity or Supreme Being to Earth. In Darwin Ontology it is supposed that all imaginable avatars co-exist in the Web. For instance, the Pope Francis avatar as of today would be a creature that summarizes absolutely EVERYTHING we as humans “see” as related to the papal investiture, to the Catholic Church and to its avatars over time, that is to say to Jesus, to the saints and prophets, to all types of passions from the compassion of Jesus to the tortures of the “Holy Inquisition”, and to all types of adhesion and rejection from atheism and agnosticism to the highest apologies of faith, without leaving out the human Jorge Bergoglio as a person, with his family and entourage. Darwin Ontology enables us “to see” the Web and its creatures more and better, something similar to Galileo Galilei’s exploration of the sky through his telescope. It enables us to unveil not only all the dispersed pieces of information and knowledge of all avatars but also the hidden intelligence that holds them together as entities. Within this paradigm all the big problems of humanity would also be avatars.
    Let’s imagine now that we are commissioned to build a big and complex avatar, for instance Barack Obama, Pope Francis, the Refugee Problem, the Future of the EU, Terrorism, the Evolution of Democracy, Gender Violence, etc. Do not be discouraged: everything is “up there”, dispersed but hidden in the Web Sea!
  • 6.
    Avatar Buildup
    Step 1 – Semantic Exploration: Make a first approach to the avatar: review the Web content with our mind focused on finding names, traces, features, images, audios, text samples, memes, tags, collectives and visible authorities, defining a first set of semantic axes of a hypothetical avatar “semantic seed”.
    Step 2 – Semantic Resonance Exploration: Unveil the “names” of the first approach: each in mind image has, for a given language – culture pair, a sort of “resonance name”, like radio waves: when conventional search engines are queried with these names they point to the best answers, in quantity and semantic quality. For instance, exploring the semantic neighborhood of these best names you may experience significant changes in results from minimal or negligible differences in spelling or pronunciation. Mimicking human criteria, Darwin agents detect the best names for a given in mind image.
    Step 3 – Identifying suspected Authorities realms: Pivoting and exploring website content around resonance names, for instance building hyperlinks versus hyperlinks matrices, will provide us with raw data for the next step: namely chains (including closed loops) of meaningfully related hyperlink names, semantically weighted.
    Step 4 – Unveil possible “conceptual graphs”: basically a human task: an expert or a group of experts analyzes the unveiled raw data looking for conceptual graphs to feed the next step. At this step the global structure of the avatar should be depicted and properly documented.
    Step 5 – Unveil possible “semantic seeds”: basically a human task guided and aided by agents.
    Step 6 – Select the best semantic seed: basically a human task. As the computationally “hard” part of the process (+85%) begins at the next step, the selection must be backed up and justified as much as possible.
    Step 7 – Make the semantic seed grow: there are many ways to make semantic seeds grow “properly”, all under the same ontology, depending on the nature, complexity and size of the avatar. One of the simplest ways is to expand the initial names realm (the one that backed up the semantic seed unveiling), let’s say from 50 to 100 names, to a realm of a few thousand. For each semantic seed name a “texton” is unveiled from the Web: a sort of huge logical vector built of pieces of content from the top 500 to 1,000 websites retrieved. Darwin methodology states that the avatar is meaningfully represented within this huge sample, basically all its derived names and their probable logical locations within its semantic skeleton. Darwin algorithms and agents make all the computations, but humans are responsible for selecting the criteria for name expansion and location adjustment. As an outcome of this step we have our avatar predefined but fuzzy, as if seen through a cloud: for instance as a set of 1,200 names related to the avatar but still poorly structured along a conceptual graph of 500 nodes.
    Step 8 – Specific Concepts unveiling: Darwin Ontology states that we humans, as a collective, document according to a probabilistic “formula” (something like a WWD, Well Written Document, formula) using only two kinds of semantic particles: Common Words and Expressions, and “specific concepts” closely related to the in mind image we have about the main subject of the document. This specificity acts as a semantic filter that helps us to see the semantic skeleton of the avatar more and better. Following our example, we arrive at a name realm of 1,500 terms and a semantic skeleton of 450 nodes.
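    The filtering idea in Steps 7 and 8 can be illustrated with a toy sketch. It is not the Darwin algorithm: the stop list `COMMON_WORDS`, the function names, the `min_count` threshold and the three-sentence “texton” are all hypothetical, invented here only to show how discarding common words leaves candidate specific concepts.

    ```python
    import re
    from collections import Counter

    # Hypothetical stand-in for "Common Words and Expressions";
    # a real list would be far larger and language-specific.
    COMMON_WORDS = {
        "the", "a", "an", "of", "and", "to", "in", "is", "was", "as",
        "by", "for", "his", "he", "with", "on",
    }

    def tokenize(text):
        """Lowercase the text and split it into alphabetic word tokens."""
        return re.findall(r"[a-z]+", text.lower())

    def specific_concepts(texton, min_count=2):
        """Count non-common tokens across all pieces of the texton and
        keep those that occur at least min_count times."""
        counts = Counter()
        for piece in texton:
            counts.update(t for t in tokenize(piece) if t not in COMMON_WORDS)
        return {term: n for term, n in counts.items() if n >= min_count}

    # Invented toy texton: three pieces of content about one subject.
    texton = [
        "Lincoln was the Kentucky woodman of the legend",
        "The woodman Lincoln is remembered as a president",
        "A president and a legend",
    ]
    candidates = specific_concepts(texton)
    print(candidates)
    ```

    In this toy run only the terms repeated across pieces survive the filter; in a real texton of hundreds of websites the threshold and the stop list would have to be chosen by the human experts the steps describe.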
  • 7.
    Step 9 – Check the Ontology Conjectures accomplishment: This is a necessary, hidden and heavy task mostly performed by Darwin algorithms. We have to take into account that only long lasting and complex avatars tend to be structured like logical trees, for instance Maps of Knowledge. Most avatars will have semi-arboreal structures that in part resemble directed graphs: this directionality enables us to activate semantic ancestry and all types of parental relations via conventional search engines, for instance the fact that “popularity” rank tends to be higher with ancestry.
    Step 10 – Evaluate the whole process and results: When building complex avatars we may arrive at this step many times, as a checkpoint of an iterative process that runs from Step 1 to here.
    Step 11 – Intelligent Report for Humans; first raw avatar synthesis: This is a human task. It resembles the editing of an essay or a book, with its corresponding Prologue, Epilogue, Introduction, Abstract, Index and Bibliography and, very importantly, its metadata structured as [avatar definition, Authorities and their profiles, semantic and Web references, images, videos and audio, selected quotations, tags and memes].
    Step 12 – Avatar buildup: this is a continuous task, because avatars evolve and at the same time we evolve in a sort of exponential e-learning process.
    Over circa 8,000 years human beings have built something like a world library of avatars, somehow ruled by a world plutocracy in a rare pairing between the wisdom of one insignificant minority and the disproportionate power held by another insignificant minority. By and large the Established Knowledge, the best truths, were those issued by geniuses, enlightened and powerful people and entities. However, the best truths should take into account the in mind images of information, knowledge, opinions and, why not, the wisdom of ALL of us humans as a collective of unique individualities. Now, after more than 80 centuries, it is perfectly possible!
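    Step 9 above distinguishes avatars structured as logical trees from semi-arboreal ones that behave like directed graphs. As a minimal sketch, assuming the semantic skeleton is available as a list of (parent, child) pairs, the check and the “semantic ancestry” lookup could look as follows. The function names and the small art-history skeleton are hypothetical, not part of Darwin.

    ```python
    def parents_of(edges):
        """Map each node to the set of its direct parents."""
        parents = {}
        for parent, child in edges:
            parents.setdefault(parent, set())
            parents.setdefault(child, set()).add(parent)
        return parents

    def ancestors(node, parents):
        """Collect every node reachable by following parent links upward."""
        seen = set()
        stack = list(parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    def is_tree(edges):
        """True when there is exactly one root, no node has more than one
        parent, and no node is its own ancestor (no closed loops)."""
        parents = parents_of(edges)
        roots = [n for n, p in parents.items() if not p]
        single_parent = all(len(p) <= 1 for p in parents.values())
        acyclic = all(n not in ancestors(n, parents) for n in parents)
        return len(roots) == 1 and single_parent and acyclic

    # Invented skeleton: a strict tree, then the same skeleton with one
    # extra link that gives "fresco" a second parent.
    tree_edges = [("art", "painting"), ("art", "sculpture"), ("painting", "fresco")]
    semi_edges = tree_edges + [("sculpture", "fresco")]

    print(is_tree(tree_edges))   # strict tree
    print(is_tree(semi_edges))   # semi-arboreal: a node with two parents
    print(ancestors("fresco", parents_of(tree_edges)))
    ```

    The second skeleton is still a usable directed graph, so ancestry queries keep working on it; it simply no longer qualifies as a tree.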
    Epilogue
    Darwin Ontology defines the life and interactions of Web Cyber creatures as a dual interacting scenario: the “K Side” or World of the Established Knowledge versus the “K’ Side” or World of the People. Daily life avatars are usually hosted on the K’ Side, while formal long lasting avatars are generally hosted on the K Side.
    Avatar popularities: per Google we may distinguish avatars as a single-word concept (pointing to 419,000,000 references) and as the core of the expression "the avatars of life" (as a closed search within quotation marks, pointing to 54,000 references). Curiously, the Spanish expression “Los avatares de la vida” collects 233,000 references, perhaps because within Spanish literature the term is misleadingly used as a synonym of circumstances.
    Some definitions of avatar:
    “learning a second language” avatar draft written by students of the USC, US.
    o The incarnation of a Hindu deity, especially Vishnu, in human or animal form.
    o An embodiment or manifestation, as of a quality or concept.
    o An icon, graphic, or other image by which a person represents himself or herself.
    o A digital construct (often an image file) that represents the online user in a virtual world.
    We invite you to see a pre-avatar buildup around the subject “How the world sees us”, restricted to Spanish students of a second language, probably English, in an American university. As you may easily appreciate, it is incomplete and rather biased: the authors “take sides” openly and too frequently.
  • 8.
    What’s life?
    What we present as avatars sounds a little disruptive, a strange combination of knowledge and intuition, because these “virtual” creatures (avatars) could be all and nothing and, for some cosmologists like Hawking, more alive and transcendental than humans. We invite you to imagine what is real about them in physical terms, namely matter, space and mass within the whole known universe: almost nothing, close to absolute zero on any cosmic scale.
    Note 01: Let’s try to imagine all forms of life distributed and diluted, on average (for example in a ratio of 1:1000), over the layers of the biosphere, from the upper atmosphere down to a few hundred meters below the surface, as compared to the Earth’s radius of 6,378,000 meters. This gives almost zero mass with respect to our planet, probably the only one in our galaxy known to harbor life!
    In Hawking’s words: …..”This has meant that we have entered a new phase of evolution. At first, evolution proceeded by natural selection, from random mutations. This Darwinian phase, lasted about three and a half billion years, and produced us, beings who developed language, to exchange information. But in the last ten thousand years or so, we have been in what might be called, an external transmission phase. In this, the internal record of information, handed down to succeeding generations in DNA, has not changed significantly. But the external record, in books, and other long lasting forms of storage, has grown enormously. Some people would use the term, evolution, only for the internally transmitted genetic material, and would object to it being applied to information handed down externally. But I think that is too narrow a view. We are more than just our genes. We may be no stronger, or inherently more intelligent, than our cave man ancestors. But what distinguishes us from them, is the knowledge that we have accumulated over the last ten thousand years, and particularly, over the last three hundred. I think it is legitimate to take a broader view, and include externally transmitted information, as well as DNA, in the evolution of the human race”……
    Bibliography
    1. Life in the Universe, by Stephen Hawking, suggests the following evolution scheme: Energy => elementary particles => pre RNA “accidents” => RNA => DNA => seeds of life => language => written language => “External” Evolution;
    2. The Anthropic Principle, from Wikipedia: ….”The anthropic principle (from Greek anthropos, meaning "human") is the philosophical consideration that observations of the universe must be compatible with the conscious and sapient life that observes it. Some proponents of the anthropic principle reason that it explains why the universe has the age and the fundamental physical constants necessary to accommodate conscious life. As a result, they believe it is unremarkable that the universe's fundamental constants happen to fall within the narrow range thought to be compatible with life”……
    3. CHON and CHNOPS: CHON is a mnemonic acronym for the four most common elements in living organisms: carbon, hydrogen, oxygen, and nitrogen. The acronym CHNOPS, which stands for carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, represents the six most important chemical elements whose covalent combinations make up most biological molecules on Earth. Sulfur is used in the amino acids cysteine and methionine. Phosphorus is an essential element in the formation of phospholipids, a class of lipids that are a major component of all cell membranes, as they can form lipid bilayers, which keep ions, proteins, and other molecules where they are needed for cell function, and prevent them from diffusing into areas where they should not be. Phosphate groups are also an essential component of the backbone of nucleic acids and are required to form ATP, the main molecule used as energy powering the cell in all living creatures. Carbonaceous asteroids are rich in CHON elements. These asteroids are the most common type, and frequently collide with Earth as meteorites. Such collisions were especially common early in Earth's history, and these impacts may have been crucial in the formation of the planet's oceans.
  • 9.
    Darwin Methodology - To “see” the Web more and better
    Building an avatar seed, as seen by a Zen master - AI builder
    By Juan Chamero, from Spain as of February 27th 2016
    Galileo Galilei looking at the sky, Wikipedia and many other sources
    “To see more and better” avatar
    Darwin Ontology states that the Web is a big Cyber Ocean that hosts Cyber Creatures named “avatars” that register the Avatars of the Human Being, that is to say our vicissitudes, what happens to us, what we think about everything and anything. The registration units are Home Pages, so within each of them there could be avatars and/or pieces of avatars. This vision enables us to see the whole Web as a dual scenario where we humans live continuously “emitting” messages, aware or unaware of it, that are continuously registered by a sort of multimedia Cyber Ocean.
    This document intends to describe how to devise an avatar seed about Darwin Methodology based on a sample of well-known quotations and inspirations related to human visions of the ALL, namely: Galileo Galilei, Claude Shannon, Alan Turing, Roger Penrose, Albert Einstein, Stephen Hawking, Plato, René Descartes, John von Neumann, Nikola Tesla, Jaime Balmes, the Tao Te Ching, Zen writings about Bodhidharma, the Bible (Genesis and the Apocalypse) and, why not, something about Pope Francis, Saint Augustine of Hippo and Teilhard de Chardin: an atypical intellectual cocktail, isn’t it?
    Note: Umberto Eco was recently added to our list of inspirers as a posthumous homage. We recommend reading his book “How to write a Doctoral Thesis”, which is similar to our Darwin avatar unveiling process performed manually.
  • 10.
    Galileo Galilei
    Eppur si muove! And yet it moves! (attributed to Galileo in 1633, after he was forced to recant his claim that the Earth moves around the Sun)
    Galileo Galilei’s work inspired the Darwin Ontology aim “to see the Web more and better”, just as he turned the telescope “to see the Sky more and better”.
    Claude Shannon
    I just wondered how things were put together
    Information is the resolution of uncertainty
    Two Claude Shannon quotes from brainyquote.com. We humans are in debt to his apparently simple, astonishing and disruptive Theory of Information. We wrongly claim that we are in the Era of Knowledge; however, we still have to do our homework to go a little ahead of Shannon within the Information Era. Darwin does its own homework along that line.
    Alan Turing
    Science is a differential equation; Religion is a boundary condition
    Alan Turing could be considered the father of the Computing Science “avatar” in full as of today, a real genius well endowed in almost everything and also a pioneer of the thinking machines utopia. He suggested that machines may think, a crucial and long-lasting controversial subject: A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.
  • 11.
    Roger Penrose
    There are two other words I do not understand: awareness and intelligence.
    Roger Penrose argues that the present computer is unable to have intelligence because it is an algorithmically deterministic system, against the viewpoint that the rational processes of the mind are completely algorithmic and can thus be duplicated by a sufficiently complex computer. See the controversy with Marvin Minsky, who says exactly the opposite: that humans are, in fact, machines, whose functioning, although complex, is fully explainable by current physics. See also GoogleTechTalks.
    Albert Einstein
    Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning.
    Albert Einstein: what could we meaningfully add to our avatar about science, knowledge, wisdom and consciousness? We only dare to select some of his quotes:
    o It has become appallingly obvious that our technology has exceeded our humanity.
    o The true sign of intelligence is not knowledge but imagination.
    o Logic will get you from A to B. Imagination will take you everywhere.
    o Science without religion is lame, religion without science is blind.
    Coexistence of dualities: Wave–particle duality is the fact that every elementary particle or quantum entity exhibits the properties of not only particles, but also waves. It addresses the inability of the classical concepts "particle" or "wave" to fully describe the behavior of quantum-scale objects. As Einstein wrote: "It seems as though we must use sometimes the one theory and sometimes the other, while at times we may use either. We are faced with a new kind of difficulty. We have two contradictory pictures of reality; separately neither of them fully explains the phenomena of light, but together they do".
  • 12.
    Stephen Hawking
    We are all now connected by the Internet, like neurons in a giant brain.
    The Web Ocean hosting all Human avatars: In Hawking’s words: …..”This has meant that we have entered a new phase of evolution. At first, evolution proceeded by natural selection, from random mutations. This Darwinian phase, lasted about three and a half billion years, and produced us, beings who developed language, to exchange information. But in the last ten thousand years or so, we have been in what might be called, an external transmission phase. In this, the internal record of information, handed down to succeeding generations in DNA, has not changed significantly. But the external record, in books, and other long lasting forms of storage, has grown enormously. Some people would use the term, evolution, only for the internally transmitted genetic material, and would object to it being applied to information handed down externally. But I think that is too narrow a view. We are more than just our genes. We may be no stronger, or inherently more intelligent, than our cave man ancestors. But what distinguishes us from them, is the knowledge that we have accumulated over the last ten thousand years, and particularly, over the last three hundred. I think it is legitimate to take a broader view, and include externally transmitted information, as well as DNA, in the evolution of the human race”……
    Plato
    Wise men speak because they have something to say; Fools because they have to say something
    This quote from Plato is a brief and ancient example of semantic subtlety: two extreme “in mind images” (wise – fool) expressed in a given language (English in this case) through a misleading similarity. The theory of Forms (or theory of Ideas) typically refers to the belief that the material world as it seems to us is not the real world, but only an "image" or "copy" of the real world. In some of Plato's dialogues, this is expressed by Socrates, who spoke of forms in formulating a solution to the problem of universals. The forms, according to Socrates, are archetypes or abstract representations of the many types of things, and properties we feel and see around us, that can only be perceived by reason (Greek: λογική).
  • 13.
    René Descartes
    Cogito ergo sum; Je pense, donc je suis; I think, therefore I am; Pienso luego existo
    Descartes may be considered the father of modern Western philosophy and, for many, also of 17th-century continental rationalism, later advocated by Baruch Spinoza and Gottfried Leibniz. See his Discourse on the Method and its four rules:
    • "The first was never to accept anything for true which I did not clearly know to be such; that is to say, carefully to avoid precipitancy and prejudice, and to comprise nothing more in my judgment than what was presented to my mind so clearly and distinctly as to exclude all ground of doubt.
    • The second, to divide each of the difficulties under examination into as many parts as possible, and as might be necessary for its adequate solution.
    • The third, to conduct my thoughts in such order that, by commencing with objects the simplest and easiest to know, I might ascend by little and little, and, as it were, step by step, to the knowledge of the more complex; assigning in thought a certain order even to those objects which in their own nature do not stand in a relation of antecedence and sequence.
    • And the last, in every case to make enumerations so complete, and reviews so general that I might be assured that nothing was omitted."
    John von Neumann
    With four parameters I can fit an elephant, and with five I can make him wiggle his trunk
    There probably is a God. Many things are easier to explain if there is than if there isn't.
    John von Neumann was the missing piece of the Cyber Era: a genius and a “doer” of everything! The above quotes speak for themselves. About the hidden sides of many scientific milestones: John von Neumann, for many the father of Modern Computing, suggesting to Claude Shannon a name for his new uncertainty function: You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.
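    The “uncertainty function” in this anecdote is what is now called Shannon entropy, H = -Σ p·log2(p), measured in bits. The short sketch below is only an illustration of that standard formula and of Shannon’s phrase “information is the resolution of uncertainty”; the function name and the sample distributions are chosen here and do not come from Darwin.

    ```python
    import math

    def shannon_entropy(probabilities):
        """Entropy in bits of a discrete distribution.
        Outcomes with zero probability contribute nothing."""
        return sum(-p * math.log2(p) for p in probabilities if p > 0)

    fair_coin = shannon_entropy([0.5, 0.5])    # two equally likely outcomes
    four_way = shannon_entropy([0.25] * 4)     # four equally likely outcomes
    certain = shannon_entropy([1.0])           # outcome known in advance

    print(fair_coin, four_way, certain)
    ```

    Learning the result of a fair coin toss resolves one bit of uncertainty, a four-way choice resolves two, and an outcome known in advance resolves none.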
  • 14.
    Nikola Tesla
    Every living being is an engine geared to the wheelwork of the universe. Though seemingly affected only by its immediate surroundings, the sphere of external influence extends to infinite distance.
    Nikola Tesla, perhaps the best modern avatar of the “inventor” and of inventiveness, was a Serbian American electrical engineer, mechanical engineer, physicist, and futurist best known for his contributions to the design of the modern alternating current electricity supply system. See some quotes from his autobiography:
    • Instinct is something which transcends knowledge. We have, undoubtedly, certain finer fibers that enable us to perceive truths when logical deduction, or any other willful effort of the brain, is futile.
    • I do not think there is any thrill that can go through the human heart like that felt by the inventor as he sees some creation of the brain unfolding to success... such emotions make a man forget food, sleep, friends, love, everything.
    • It seems that I have always been ahead of my time. I had to wait nineteen years before Niagara was harnessed by my system, fifteen years before the basic inventions for wireless which I gave to the world in 1893 were applied universally.
    Jaime Balmes
    Entendemos más por intuición que por discurso: la intuición clara y viva es el carácter del genio (We understand more by intuition than by discourse: clear and vivid intuition is the mark of genius)
    Father Jaime Balmes y Urpiá (Catalan: Jaume Llucià Antoni Balmes i Urpià; 28 August 1810 – 9 July 1848) was a Spanish Catholic priest known for his political and philosophical writing. To some extent he could be considered a “Common Sense Philosopher”.
    • La lectura es como el alimento; el provecho no está en proporción de lo que se come, sino de lo que se digiere. (Reading is like food; the benefit is in proportion not to what is eaten but to what is digested.)
    • Me convencí de que dudar de todo es carecer de lo más preciso de la razón humana, que es el sentido común. (I became convinced that to doubt everything is to lack the most essential part of human reason, which is common sense.)
    • Terrible es el error cuando usurpa el nombre de la ciencia. (Error is terrible when it usurps the name of science.)
    Balmes distinguishes between the concept of truth and the concept of certainty. Truth is the expression of the agreement of the ideal order with the thing. Certainty is the mental acceptance of the truth. There are two kinds of certainty: general human certainty (acquired spontaneously and instinctively), and philosophical certainty (the fruit of intellectual reflection).
  • 15.
    Bodhidharma (Zen)
    As long as you look for a Buddha somewhere else, you'll never see that your own mind is the Buddha.
    • If you use your mind to look for a Buddha, you won't see the Buddha.
    • The mind is the root from which all things grow; if you can understand the mind, everything else is included.
    Zen, quantum mechanics, Yin – Yang, Tao Te Ching, mind, awareness, consciousness, ontologies, Tai Chi, Kung Fu and more….: Bodhidharma, the Zen creator, was a Buddhist monk who lived during the 5th or 6th century. He is traditionally credited as the transmitter of Chan (Zen) Buddhism to China, and regarded as its first Chinese patriarch. According to Chinese legend, he also began the physical training of the monks of Shaolin Monastery that led to the creation of Shaolin Kung Fu. Darwin Ontology has something of Zen, which states that the underlying base of reality is change, process and impermanence, relatively in slow motion, and that the observer is part of the system……The strange interactions of fundamental particles with the mind of the observer ('quantum weirdness') have long been of interest to philosophers. There are two opposing views: (i) Quantum weirdness produces the mind, versus (ii) The mind produces quantum weirdness. See log about Buddhism, Quantum Physics and Mind.
    Genesis – The Tower of Babel
    The word is the Verb and the verb is God (Victor Hugo)
    A Semantic Enigma: Is it an enigma or a warning light? Why so many and such different languages? Is the Verb really the creator of Everything? The Tower of Babel (/ˈbæbəl/ or /ˈbeɪbəl/; Hebrew: מִגְדַּל בָּבֶל, Migdal Bāḇēl) is an etiological myth in the Book of Genesis of the Tanakh (also referred to as the Hebrew Bible or the Old Testament) meant to explain the origin of different languages. According to the story, a united humanity of the generations following the Great Flood, speaking a single language and migrating from the east, came to the land of Shinar (Hebrew: שנער). There they agreed to build a city and tower; seeing this, God confounded their speech so that they could no longer understand each other and scattered them around the world.
  • 16.
    Apocalypse (Revelation)
    We are just an advanced breed of monkeys on a minor planet of a very average star. But we can understand the Universe. That makes us something very special (Stephen Hawking)
    Revelation 19:11-21 And I saw heaven opened, and behold, a white horse, and He who sat on it is called Faithful and True, and in righteousness He judges and wages war. His eyes are a flame of fire, and on His head are many diadems; and He has a name written on Him which no one knows except Himself. He is clothed with a robe dipped in blood, and His name is called The Word of God.
    Pope Francis
    Oh, how I would like a poor Church, and for the poor.
    A leading exponent of the sacralization and vulgarization of the word at the same time
    • Oh, how I would like a poor Church, and for the poor.
    • We must restore hope to young people, help the old, be open to the future, and spread love. Be poor among the poor. We need to include the excluded and preach peace.
    • I am always wary of decisions made hastily. I am always wary of the first decision, that is, the first thing that comes to my mind if I have to make a decision. This is usually the wrong thing. I have to wait and assess, looking deep into myself, taking the necessary time.
    • Sometimes negative news does come out, but it is often exaggerated and manipulated to spread scandal. Journalists sometimes risk becoming ill from coprophilia and thus fomenting coprophagia: which is a sin that taints all men and women, that is, the tendency to focus on the negative rather than the positive aspects.
    “The internet …,” writes Pope Francis today, “offers immense possibilities for encounter and solidarity. This is something truly good, a gift from God.” See “Communication at the Service of an Authentic Culture of Encounter”: Pope's Message for World Communications Day.
  • 17.
    Saint Augustine
    The world is a book, and those who do not travel read only a page
    Men go abroad to wonder at the heights of mountains, at the huge waves of the sea, at the long courses of the rivers, at the vast compass of the ocean, at the circular motions of the stars, and they pass by themselves without wondering.
    Augustine of Hippo (/ɔːˈɡʌstᵻn/ or /ˈɔːɡəstɪn/; Latin: Aurelius Augustinus Hipponensis; 13 November 354 – 28 August 430), also known as Saint Augustine, Saint Austin, or Blessed Augustine, was an early Christian theologian and philosopher whose writings influenced the development of Western Christianity and Western philosophy.
    Sayings about Quantum Physics, time and Saint Augustine: In the Confessions of St. Augustine, Book XI, there is a philosophical analysis of time. Though Bertrand Russell was an atheist and says that he has a different philosophy of time than Augustine, in his History of Western Philosophy Russell nevertheless says that Augustine's philosophy of time is deeply profound. Among other conclusions, Augustine states that both the past and the future exist simultaneously, and yet only the now exists. And that in God there is no time.
    Teilhard de Chardin
    The universe as we know it is a joint product of the observer and the observed.
    Relevant concepts concerning Darwin Ontology: noosphere, prolegomena about The All, and Omega Point. Pierre Teilhard de Chardin SJ (French: [pjɛʁ tejaʁ də ʃaʁdɛ̃]; 1 May 1881 – 10 April 1955) was a French philosopher and Jesuit priest who trained as a paleontologist and geologist and took part in the discovery of Peking Man. He conceived the idea of the Omega Point (a maximum level of complexity and consciousness towards which he believed the universe was evolving) and developed Vladimir Vernadsky's concept of noosphere.
  • 18.
    Darwin Ontology (I)
    In any piece of the ALL you may see the ALL
    This e-book depicts a long scouting of the Semantic Web over 11 years from two points of view: from Digital to Mind, along W3C standards, and from Mind to Digital, for many the "Common Sense Way". The outcome of this journey is Darwin, a semantic ontology to "see the Web as semantically structured" through a sort of "Semantic Eyeglasses". These virtual eyeglasses, like the telescope of Galileo Galilei, enable us to map the whole Web as_is and to build Semantic Super Search Engines that work in YGWYN mode in only one query.
    Darwin Ontology (II)
    Everything connected with everything
    The Web is a layer on top of the Internet that for many belongs to the people. In my humble opinion this was not planned but an accident, the consequence of the appearance of a revolutionary technology, as happens along evolution. Before the arrival of the Internet the communications media (newspapers, radio and TV) worked unidirectionally, from a de facto “Established Order” side to the “People’s” side, “broadcasting” programmed pieces of information and knowledge: from sellers to buyers, from rulers to ruled, from teachers to students, from truth holders to truth seekers. The People’s side is explored via Darwin, an AI Ontology that enables us to see the Web more and better, focusing on Social Networks and the Deep Web, for many the hidden Web. As a demo, a Darwin agent makes a “tomography” of the Established side for the theme art history, from the Altamira Caves to Nanoart.
  • 19.
    Umberto Eco
    Umberto Eco quotes by relatably.com
    Last but not least! I included Umberto Eco, the brilliant Italian author and semiologist recently deceased, as a representative of one of the many hidden influencers of Darwin Ontology. Read “Come si fa una tesi di laurea”, how to write a doctoral thesis.
    o But now I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
    o Captain Cook discovered Australia looking for the Terra Incognita. Christopher Columbus thought he was finding India but discovered America. History is full of events that happened because of an imaginary tale.
    Conclusions
    We need a semantic ontology to see the Web more and better: what are the four crucial questions we must ask ourselves?
     What do we have at hand?
      o Data, Information, Knowledge, Wisdom
      o Intelligence
     What should we have to unveil?
      o The ALL
      o The Everything Connectedness
      o The Man Machine utopias
     What should we have to consider in the ontology?
      o To unveil all types of Avatars;
      o To unveil Semantic Logical Trees structures;
      o To unveil all types of Directed Graphs;
      o To unveil and manage K versus K’, namely: Websites versus Users dialogues;
      o To unveil K Thesaurus, namely: formal Knowledge Thesaurus;
      o To unveil K’ Thesaurus, namely: Users Knowledge Thesauruses;
      o To take into account and continuously check that HK is bounded;
      o To take into account and continuously check that HK’ is bounded;
      o To take into account and continuously check the semantic weight of WWDs, Well Written Documents;
      o To take into account and continuously check the Semantic Resonance of accomplishment unveiled concepts;
      o To take into account and continuously check the In mind images uniqueness for all languages;
      o To take into account and continuously check the Concepts Uniqueness for all languages;
      o To take into account and continuously check the Concepts Specificity accomplishment and uniqueness for all languages;
      o To unveil the best sets of Authorities for any subject;
      o To check continuously that the ontology is fully accomplished by the Semantic seeds;
      o To check continuously that Semantic fingerprints are appropriately computed for any subject;
      o To check continuously that Semantic metadata are appropriately computed for any subject;
      o To check continuously all types of intrusions, warning of them and estimating their pollution effects;
  • 20.
    World Authorities Influence Logic Matrix
    Once our major Darwin Authorities Influencers are defined, as a fact, we proceed to briefly depict the semiotic or semantic aspect or facet of the influence of each of them over our ontology. This matrix behaves as a philosophical and scientific support for a complex problem: how to see the Web more and better. For instance, one of the Darwin Conjectures states that the Web space is structured as a dual system of continuously interacting worlds: K Side or Established Knowledge (Website owners, authors and administrators) versus K’ Side or People Side (we, humans as Internet users). Here Albert Einstein, Stephen Hawking, and Bodhidharma are notable influencers. Another example concerns what we defined as “Semantic Resonance”: our Darwin Ontology also states that for a given pair “language – culture” we may unveil millions of “in mind images”. These images are recognized by their “names”. It also states that they could be unveiled by the phenomenon of “Semantic Resonance”: any in mind image could be retrieved via Conventional Search Engines by many different “names”, however only one of these names will be the semantic winner in terms of quantity and quality of references.
     Galileo: the meaning of “To see more and better” and tools to perform it;
     Shannon: the actual Web is not yet semantic; intuitive ideas trying to understand the meaning of “knowledge” by enriching and extending the meaning of “information”;
     Turing: limits of Web unveiling by huge and exhaustive procedures (Big Data);
     Penrose: Human Intelligence is more than an algorithmically deterministic system;
     Einstein: coexistence of dualities; intelligence is much closer to imagination than to knowledge;
     Hawking: cyber avatars; coexistence of dualities; externally transmitted information is as important to evolve as information internally transmitted via DNA and genes;
     Plato: in mind image, forms and avatars; the essence of concepts as semantically specific universals;
     Descartes: The Discourse on the Method in its full and ample sense;
     Von Neumann: entropy, negentropy, and Big Data within ontology concerns;
     Tesla: inventive avatars axes; ideas about Darwin Intelligence Reports buildup;
     Balmes: Logic versus Common Sense (concerning trees, arboreal logic, semantic resonance…);
     Bodhidharma: quantum physics and mind; coexistence of dualities; avatars; awareness;
     Genesis: The Power of Semantics; The Power of Words;
     Apocalypses: The ALL, The EVERYTHING and the END; black holes;
     The Pope: as a typical Darwin avatar archetype;
     Saint Augustine: only the NOW exists; time inexistence utopias;
     Teilhard: The Noosphere;
     Darwin: The Semantic Ocean <=> Semantic Web <=> Noosphere;
     Eco: ideas about how to build trustable IR’s;
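    The “Semantic Resonance” idea stated above (one in mind image, many candidate names, a single winning name by quantity and quality of references) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name semantic_winner, the input format (a list of per-reference quality scores for each name) and the scoring rule (sum of qualities) are not taken from the Darwin documents, and no real search engine is queried.

```python
def semantic_winner(references):
    """Pick the winning name among alternative names of one in-mind image.

    `references` maps each candidate name to a list of quality scores in [0, 1],
    one score per retrieved reference (hypothetical input format).
    Assumed scoring rule: the sum of the qualities, so that both the quantity
    and the quality of the references count.
    """
    if not references:
        raise ValueError("no candidate names")
    scores = {name: sum(qualities) for name, qualities in references.items()}
    # Iterate names alphabetically so that ties are resolved deterministically.
    winner = max(sorted(scores), key=scores.get)
    return winner, scores


# Hypothetical reference samples for three names of the same in-mind image.
sample = {
    "global warming": [0.9, 0.8, 0.7],
    "climate heating": [0.9],
    "planet warming": [0.2, 0.2],
}
winner, scores = semantic_winner(sample)
print(winner)
```

    A name with many weak references can lose to a name with fewer but stronger ones, which is the reason for weighting rather than simply counting hits.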
  • 21.
    Darwin Ontology - Conjectures
    By Juan Chamero, from Spain, as of March 30th 2016
    Conjectures Subjects Overview
    Darwin Ontology enables humans to “see more and better” the Web through the ontological and computational guide of its Conjectures. This vision involves seeing the Web as totally indexed by meanings, approaching as much as possible the Semantic Web utopia, and the detection and retrieval of all types of data, information and knowledge dispersed in it. The subjects of the Conjectures follow:
    1 (a) A world of human “in mind images”;
    2 (b) A world of “words”;
    3 (c) A quantifiable and bounded World;
    4 (d) A world of probabilistic nature;
    5 (e) A nominative world: all its creatures have a name;
    6 (f) A world of semantic vibrations;
    7 (g) A world of avatars;
    8 (h) Avatars library;
    9 (i) Retrieval of hidden intelligence;
    10 (j) Knowledge DIKW;
    11 (k) A world of Arboreal Structures;
    12 (l) Avatars unveiling and IR’s, IdeI’s;
    13 (m) K versus K’ worlds;
    14 (n) K versus K’ Semantic interchange;
    15 (o) Two types of Semantic Particles;
    16 (p) e-membranes and unveiling without perturbing;
    17 (q) Disciplines of Knowledge, WWD’s and WFF’s;
    18 (r) Subjects and Concepts;
    19 (s) Knowledge Authorities;
    20 (t) Human Knowledge, Thesauruses and Derived Concepts;
    21 (u) Thesauruses for K and K’;
    22 (v) Semantic Web;
    Avatars Conjectures: The Internet and particularly the Web enable humans to keep a huge, open and extremely detailed virtual log book of our lives, of our occurrences and activities, and even of our thoughts and “in mind” processes over time. The entries of this log could be assimilated to “avatars” in their different senses, namely from graphic representations of all types of things and entities, including personalities and investitures, to incarnations of deities or facets of them and of ideal creatures. Avatars could also be imagined as “meaningful in mind images” that need to be “explained” to be understood. Conjectures in emerald deal with avatars.
    Semantic Web Conjectures: We may also see the Web as a huge multimedia reservoir structured as a dual and continuously interacting world: one side we name “K side”, assigned to formal creatures registering the “Established Knowledge” at a given moment, and the other we name “K’ side”, assigned to the people as users and at the same time proprietors of the “Knowledge in Formation”. Conjectures in blue deal with the Semantic Web.
    Hinge Conjectures: In order to operate such a huge and complex semantic system we need a few “hinge” Conjectures, in grey, connecting the old Darwin Conjectures (in blue) to the new last 10 conjectures (in emerald) that enable us to “see” the Web as an Ocean of Avatars, a more advanced vision than the Semantic Web.
  • 22.
    The First 12 Conjectures Synopsis
    The Web could be seen as a world of human in mind images; humans agree about their meaning through the specific and appropriate use of words; this universe could be quantified, however probabilistically; it could also be imagined as a huge Cyber Ocean where these nominative creatures, bearing personal names, live; it could also be imagined as a huge Ocean where these nominative creatures behave like wavelets, enabling their detection and recognition by semantic resonance via search engines; in fact a world of avatars as virtual creatures registering “literally” the avatars of our past, present and probable future lives; these avatars are documented and hosted as in conventional libraries, but by pieces of information and knowledge dispersed by billions here and there; the patterns of this dispersion suggest the existence of a hidden intelligence that could be unveiled; up to here we have presented a set of 10 Conjectures necessary to operate with avatars.
    However conventional informatics works de facto under a sort of Cyber pre-agreement: the DIKW Pyramid; from here we may state that formal knowledge tends to structure by itself as a wood of “Semantic Trees” and that knowledge in formation also tends to structure by itself, as more primitive and disordered arboreal forms; knowledge in formation, informal forms of knowledge and complex forms of information are the basic components of Intelligence Reports managed as avatars.
    Darwin Ontology Conjectures
    a) A world of human “in mind images”: Human beings transmit their cognitive legacy through “in mind images” as “concepts” only “seen” through our minds;
    b) A world of “words”: In mind images identify specific pairs “language - culture”, for instance “American - English” and “International – Spanish”, meaning that they can be explained and understood through the pair language – culture they belong to;
    c) World sizes: The total number of these in mind images, as Web space creatures, is estimated at present at 12 to 20 million per pair language – culture;
    d) A world of probabilistic nature: For each pair language – culture in mind images are “unique”, however with a “unicity” spectrum of probabilistic nature, that is to say that all of them as well as their corresponding explanations may differ slightly from person to person and even from situation to situation and from moment to moment;
    e) The names of its creatures: For each pair language – culture in mind images are identified by their “unique” names expressed as precise chains of words, namely: “running”, “meditate”, “son of a single mother”, “EU young people unemployment rate”, “Pope Francis”, “Barack Obama”, etc.;
    f) A world of semantic vibrations: Names are unique in probabilistic terms, “probabilistically talking”, for a determined place and time, for example the Web as_is at a given moment, associated to a sort of “Semantic Resonance”;
    g) A world of avatars: These in mind images we nominate as “avatars” could be mentally seen, perceived and/or represented through text, visual images, sounds, and multimedia of
  • 23.
    any type, and are Cyberspace “virtual creatures” defining our civilization at ANYTIME and ANYWHERE;
    h) Avatars library: Our civilization has, over time, properly agreed on them and besides recorded those agreements in books, essays, comments and, very recently, in semantic documents such as Web pages;
    i) Retrieval of hidden intelligence: Avatars are seen, or look, as if structured following patterns similar to our way of thinking, for example more and less important, more and less complex, however always interrelated hierarchically and by affinity as pertaining to a unique “The All”;
    j) Knowledge DIKW: Hypothetical Conjecture: DIKW Pyramid. This paradox of the parallel lives of us humans and of our avatars looks as if embedded within a common sense model, agreed on and evolving over time, represented by the hierarchical pyramid Data => Information => Knowledge => Wisdom, refined and structured through a growing intelligence;
    k) Arboreal structures: Knowledge, defined as the hierarchical triad Facts (Data) => Information => Skills and Talents, and as our evolutionary guide as well, would structure by itself as arboreal forms, ideally as a wood of “Semantic Trees” of unique roots and thematic ancestry;
    l) Avatars unveiling and IR’s, IdeI’s: Avatars are “seen” by our minds with a diversity of forms and ways proportional to their complexity and to the cultural differences of the observers: humans, groups, and/or collectives (see Conjecture d)). This feature enables us to unveil objective and non-vitiated IR’s, “Intelligence Reports”, semantically depicting as many facets as exist in the avatar at a given moment, something like for instance Vision 1 of the “ism - 1”, Vision 2 of the “ism - 2”, …, Vision n of the “ism - n”;
    m) K versus K’ worlds: General man-machine interaction could be imagined as a continuous dialog and dynamic equilibrium between two sides: the Established Knowledge - Realm K versus the People’s Knowledge – Realm K’;
    n) K versus K’ Semantic interchange: Through the subtle interface between K and K’ only two kinds of semantic particles flow in and out, relative to each side: “Established Concepts” (from K to K’) and “People’s Concepts” (from K’ to K). These particles are “separated” by communications/instances, operators from K and K’ respectively, necessary to make the dialog meaningful;
    o) Two types of Semantic Particles: Documents and messages, the elementary objects of Realms K and K’, are constituted by only two kinds of semantic particles: “Common Words and Expressions” and “Concepts”;
    p) e-membranes and unveiling without perturbing: This digital dialog may also be imagined as performed through “e-membranes”, resembling bio membranes with endoderm, mesoderm and ectoderm, where the inflow and outflow traffic of semantic particles and instances could be “seen” without perturbing the actors of Realms K and K’. Darwin took its name from this Conjecture: Distributed Agents to Retrieve the Web INtelligence, as a Darwin network of e-membranes;
    q) Disciplines of Knowledge, WWD’s and WFF’s: Documents in K side tend to discriminate into “disciplines” of the Established Human Knowledge. For each discipline there exists a
  • 24.
    minority of documents that fit “as much as possible” to their “trees”, being at the same time literarily and conceptually “Well Written”, and a majority of documents that do not. The first ones are considered “authorities”. WWD’s, Well Written Documents, resemble WFF’s, the Well Formed Formulae of Formal Logic;
    r) Subjects and Concepts: Subjects are those specific concepts associated to the nodes of their respective discipline trees as the “semantic paths” that arrive at them from their roots. Concepts “should be” the same for all pairs language - culture. For each node there exists one and only one subject. Once the subjects are known, new and somehow derived concepts appear for each of them; those that “belong” to it with a strong specificity could be defined as its “Associated Concepts”, namely the ones that “at large” define and make precise their respective themes;
    s) Knowledge Authorities: For each subject there exists at a given moment, with a high level of probability, within universal and huge reservoirs like the Web, a “Set of Authorities” dealing with it with a well defined authoritativeness;
    t) Human Knowledge, Thesauruses and Derived Concepts: From “Sets of Authorities” we may develop a sort of industrial process to extract their “Associated Concepts” sets, establishing the following correspondence: for each subject we may find its representative authorities set and from it we may build its Associated Concepts set. All discipline trees of the “Human Knowledge” that have within their nodes their respective authorities sets and their respective Associated Concepts sets constitute the “Web Thesaurus”;
    u) Thesauruses for K and K’: A similar Thesaurus could be defined and unveiled in the K’ Realm as the “People’s Thesaurus”. Similarly to Subjects K, Authorities K and Associated Concepts K, we could define Subjects K’, Authorities K’ and Associated Concepts K’;
    v) Semantic Web: Once the K and K’ sides are known as_they_are, unveiled from retrievable Web documents and messages, the actors on each side are enabled to know as much as possible of the other side. This event will accelerate the human learning process. K and K’ could be considered as fully mapped, and this mapping may be continued and perfected over time.
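    Conjectures r) to t) describe a data structure: a discipline tree whose nodes each carry exactly one subject, a set of authorities and a set of Associated Concepts, with the subject identified by the “semantic path” that reaches it from the root. The following minimal sketch shows one possible in-memory representation. The class and function names (SubjectNode, semantic_paths, thesaurus_size) and the sample data are hypothetical and are not taken from any Darwin implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectNode:
    """One node of a discipline tree: exactly one subject per node (Conjecture r)."""
    subject: str
    authorities: set = field(default_factory=set)          # Conjecture s
    associated_concepts: set = field(default_factory=set)  # Conjecture t
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)
        return child


def semantic_paths(root, prefix=()):
    """Yield (path, node) pairs; the path is the chain of subjects from the root."""
    path = prefix + (root.subject,)
    yield path, root
    for child in root.children:
        yield from semantic_paths(child, path)


def thesaurus_size(root):
    """Return (number of subjects, total number of associated concepts)."""
    nodes = [node for _, node in semantic_paths(root)]
    return len(nodes), sum(len(n.associated_concepts) for n in nodes)


# Hypothetical three-node fragment of an art tree.
art = SubjectNode("art")
visual = art.add_child(SubjectNode("visual arts", associated_concepts={"perspective", "fresco"}))
visual.add_child(SubjectNode("painting", associated_concepts={"oil on canvas"}))
for path, _ in semantic_paths(art):
    print(" > ".join(path))
```

    Under this representation a “People’s Thesaurus” for the K’ side (Conjecture u) would simply be a second tree of the same shape, built from user messages instead of authoritative documents.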
  • 25.
    DARWIN in a nutshell
    Darwin Methodology Briefing
    By Juan Chamero, Principal Architect, from Barcelona as of 2015-01-13
    Upon INTAG proprietary document: darwin_brief_PDF.rar
    Darwin Brief (darwin_brief.pdf): a brief Darwin index giving access to:
     WHAT is Darwin (darwin_methodology.pdf): it is a methodology to “see more and better” the Web and comparable data reservoirs through 11 applications;
     HOW to “see” the Web from Darwin (Darwin_Web_EN.pdf): for experts;
     BIBLIOGRAPHY (Darwin_bibliography.pdf): about Darwin and its creators;
    Conclusions: Are we at the dawn of a new way of thinking?
    And in its turn Darwin Methodology (darwin_methodology.pdf) opens in:
     A CARROUSEL (darwin_carrousel.pdf): imagery about a possible way to explore the meaning of Darwin as a manifestation of a new way of thinking and a new vision of the existing world under the new technologies;
     A KNOWHOW sample (darwin_basicbuildup.pdf): something about HOW Darwin “sees” as meaningfully connected what appears in the Web dispersed here and there and not structured;
     Even though not conscious of it, we humans are de facto and all of a sudden submerged in BIG DATA (darwin_BigData.pdf) scenarios that each time leave us less time to think: ancestrally we passed from a “many” of hundreds and thousands of events and instances to trillions and more, while shrinking our meditation times from months, days and hours to fractions of a second. Darwin moves comfortably in these scenarios.
     As a complement, a brief explanation of the 11 Darwin Applications and of their respective DARWIN ICONS (darwin_icons.pdf). Icons and avatars are a commonplace of our cyber culture, which has incorporated, besides text and image, the audible, the visible and shortly the tactile.
  • 26.
    Darwin Brief
    Juan Chamero, from Buenos Aires, Argentina as of January 1st 2015
    Index
     Darwin Methodology (2 pages)
      o Darwin Carousel (5 pages)
      o Darwin Maps Buildup (3 pages)
      o Darwin Big Data (3 pages)
      o Darwin Icons Meanings (5)
     The Web as seen by Darwin Methodology (3 pages)
     Darwin Bibliography (1 page)
    Recommended reading order: if you are well acquainted with Semantics you may start by reading The Web as seen by Darwin Methodology. If you are not, the suggested order is to read Darwin Methodology, an introductory document that explains how to see the Web as semantic. This document is complemented by three appendixes: Darwin Carousel, a document that tries to explain that perhaps we are at the dawn of a new way of thinking; Darwin Maps Buildup, a document that explains the basic knowhow to unveil non structured information and knowledge from the Web; and Darwin Big Data, a document explaining how Big Data has challenged us and perhaps is somehow changing our way of thinking computationally. We also recommend reading the Darwin Icons Meanings document, which briefly describes each one of its current 11 applications, and finally Darwin Bibliography, an index of links.
  • 27.
    Darwin Methodology
    Distributed Agents to Retrieve the Web Intelligence
    By Juan Chamero, Principal Architect of Darwin Methodology, as of January 6th 2015
    We present Darwin, a methodology to “see more and better” huge reservoirs of data like the Web as if their contents were semantically structured. It implies the detection, retrieval, ordering and synthesis of all pieces of information and knowledge about any subject dispersed here and there within the reservoirs. The synthesis may take the following forms (see Darwin icons meanings):
    o Thesauruses;
    o Maps of Knowledge;
    o Non intrusive e-membranes to communicate among different environments;
    o OOC, Only One Click Semantic Search Engines;
    o Encyclopedias;
    o Non intrusive massive Surveys and Polls about any subject;
    o Intelligence Reports about any subject;
    o Avatars, AI creatures that represent and/or emulate primary powers and trends;
    o Intelligent Web Portals;
    o Big Data synthesis;
    o Autonomous Artificial Communities;
    Are we around the end of Conventional Thinking? Behind Web Semantic: a global cultural discontinuity?
    Apology from Legal Dictionary
    This is an apology and a sample of recent digital history: at the end of our long journey of 15 years trying to unveil the Web we failed to document a sufficiently comprehensive synthesis of the work performed and its findings. Why? As a Zen master, however specialized in Artificial Intelligence, I continued with my Western habit of writing books and essays via “papers”, white and classified ones, trying to explain to others the whole history as a “linear” logical sequence, from prologues to epilogues through abstracts, antecedents, of course the textual cores, appendixes and bibliographies. The investigated theme, Web Semantics, was perhaps too big and vague, and seen from a distance it seems to be highly disruptive as well.
  • 28.
    The Web explodes, registering openly and freely almost everything that happens in our world. Darwin enables us to build tools and methodologies to see the Web more and better, in extreme detail, resembling a super ultra semantic telescope. However we have to pay a toll to use those tools and methodologies: to enter de facto into Big Data scenarios, embedded “all of a sudden” in a world that forces us to behave in “real time” without enough time to “think”. Now the Zen non linear way of thinking comes to my aid.
    Zen exploration?: Are we going towards a “To be aware of everything” way of thinking?: One of the trivial things we “discovered” by exploring the Web as our main source of information and knowledge is that ALL is related to ALL and EVERYBODY to EVERYBODY, those everybody being physical and juridical creatures and avatars, where everything that occurs to them makes sense. It is like taking a census by interviewing people as persons, searching for how they consider something bad or good, liked or disliked. From the beginning, by dialoguing, questioning and why not answering, we may easily detect factors that were initially ignored and that, in order to be honest and precise, should be taken into account. We continue thinking about how to face this new challenge, where nothing is really environment but an interactive part of a unique and always changing ALL, while in the interim reporting our state of awareness along our Web explorations; and reporting not only the rare things we may find but also the findings of our agents under our guidance.
    The bonus of being aware, an example: Darwin processes raw cognitive units named “textons”, huge vectors of 10,000 documents and more per theme; talking about a cognitive universe of 1,000,000 themes, we should in theory process 10,000,000,000 documents! This is an a priori discouraging scenario, isn’t it? However the first exploration of textons presented to the “eyes” of our agents signs of rarity that enabled us to find shortcuts of two and three orders of magnitude. To understand Darwin better we invite you to follow its logbook.
    Darwin Logbook
    Darwin Carousel – A Technology Paradox: Why do we do what we did? (Carousel from Google images)
    Basic building knowhow: Web Thesauruses and trustable Intelligence Reports buildup
    A Big Data Challenge: How to retrieve all concepts of a given language culture (Ways to make sense of Big Data, from Phys.org)
  • 29.
    Darwin HKM Carousel
    A technology paradox
    Juan Chamero, as of January 5th 2015
    Encyclopedias: Paradoxically, within the Digital Revolution, within The Information Society and very recently within the Social Networks, conventional Encyclopedias are dying! Not long ago one of the last of them, The Britannica, announced that its 2010 printed edition would be the last; it dealt with approximately 80,000 subjects, by the way 10% of the available knowledge dispersed in the Web. The last systemic index of the Human Knowledge before these “conventional” ones was the Diderot Encyclopedia (1751 – 1772), edited in French.
    The Diderot Encyclopedia
    Wow! Knowledge is within our minds! And not only knowledge but all types of information and even wisdom! These substances being deep in our minds, we only become acquainted with them indirectly, through registers, documents, works and gestures! So if the Web currently hosts 40,000,000,000 documents we may say that we “have at hand” 40,000,000,000 documented expressions of those ideas! It is the first time we humans have “at hand” a global and meaningful sample of all types of ideas, from genius to stupidity, online and almost in real time besides!
  • 30.
    The power of gestures: if you query Google Images for “demand for explanation” it will render hundreds of versions of this image. Do we need some extra explanation to understand this universal baby gesture? And we also have at hand billions of images about the “everything” of the Web_as_is!
    Demand for Explanation
    How many types of “in mind” ideas do we have? At least three types: pieces of information, pieces of knowledge, and needs. Pieces of information are a continuous need to guide our lives; pieces of knowledge are a crucial need to evolve positively along our lives; needs in general are those to live and to make a living. We acquire knowledge by studying and information by questioning; general needs are acts that become “experience”.
    From dreamstime.com
    In order to study [1] you need “libraries”, “Thesauruses” and “Encyclopedias”; in order to question efficiently [2] you need “semantic search engines”; and in order to optimize your experience [3] you need to associate with similar people. Darwin enables the Web to be used as a study home, makes conventional search engines semantic, and facilitates the open and free organization of people with similar areas of interest.
  • 31.
    A little about some necessary “boring” things: First HKM, Human Knowledge Map, about ICT (2002): we created the first Web Thesaurus about Information Computing and Telecommunications. Initially we started by joining the last semantic index of the ACM, Association for Computing Machinery (2001), with the IFIP UNESCO Informatics standardization for the RW, Rest of the World: an ICT Thesaurus of 2,300 subjects and 54,000 concepts. We see below its upper 5 hierarchical levels.
    Are arboreal structures natural forms of our thinking? It seems that yes, they are. And going a little farther, what about our abstract thinking? It seems that we are used to them there as well. In the figure below, at left, querying Google for “fractal tree” we get a sample image of primitive abstract arboreal trees. At right, querying Google Images for phylogenetic and then for Life Trees, we get a full sample of arboreal trees (real bio trees).
    Abstract trees. Bio trees.
    Google Homage to Ramón y Cajal
    Ramón y Cajal, Nobel Prize in Medicine (1906), neuroscientist, perhaps the father of modern neuroscience, was considered a disastrous student with an extremely revolutionary antiauthoritarian attitude, and even, by his father, a “little short” of brain. One of his metaphors was: cortical pyramidal cells may become more elaborate with time, as a tree grows and extends its branches. I believe that this
  • 32.
    metaphor suggests to us that we may go far walking with small steps, but always expanding our mind awareness by exploring the unknown. Ramón y Cajal was, like the Great Leonardo, a brilliant draftsman: he began drawing more and more complex neural networks and learning from them at an exponential pace. The beginning should be neat and easy for all: semantic seeds. As explained above, Ramón y Cajal started his adventure “to see more and better” the brain by trying to unveil and draft on paper a single neuron. We, as the Darwin Project, were experimenting with different types of seeds and strategies for growing. Our option for building The Art map was the last schema at lower right.
  • 33.
    Of course we did not imagine the seed as having 7,570 nodes like an “adult tree” but as a tiny skeleton of a root initially opening into up to seven clusters, as seen in the image below, namely: performing arts, visual arts, culinary arts, literature, arts history, physical arts, and arts infrastructure.
    The Art Thesaurus: the figure below depicts one of the visions of The Art Thesaurus, resembling a “map” to facilitate human comprehension and the exploration of the existing knowledge as is in the Web at a given (any) moment. It could also be seen as the Upper Levels of an Art Tree index.
    Another technological paradox: we have seen the Ramón y Cajal paradox, revolutionary and/or disruptive innovations as a sort of unsought premium bonus of long scientific exploration efforts. Another one is the Global Warming phenomenon “unveiled” by Wallace Smith Broecker, geophysicist and climate authority, as a “byproduct” of his decades of research on the CO2 concentration in our atmosphere. And from this discovery hundreds of derived and interrelated researches have proliferated, for instance the creation of “Artificial Trees” (see below in thebreakthrough.org).
    Artificial Trees, from thebreakthrough.org
  • 34.
    Darwin Methodology HKM, HumanKnowledge Map basic building knowhow Juan Chamero, as of January 5th 2015 Darwin HKM buildup could be logically imagined along a sequence of four mega steps, namely: a) MS’s data discovery; b) HKM Logical Skeleton buildup; c) HKM unstructured data sample; d) HKM structured. Step 1: By MS’s, Major Subjects data discovery we mean scouting the Web to retrieve all “modal names” of the knowledge branch under study, only their names not their conceptual meaning not their relative ordering within the Logical Tree of the knowledge branch. In our example we mean discovering the 7,570 names of The Art MS’s, Major Subjects. Step 2: By HKM Logical Skeleton buildup we mean to unveil from the Web the semantic ordering of modal names becoming “nodes names” of the Knowledge Tree finding the unique correspondence between a tree node and its modal name: 7,570 unique nodes  7,570 unique modal names for our example. Step 3: By HKM unstructured data sample we mean a huge conceptual but still unstructured data sample of the branch of knowledge under study. This Big Data scenario is not easy to imagine: For each node we need a meaningful sample (for instance 10,000 Web Pages) that enables us to discriminate how structured, noisy, disperse, misleading and diluted from its semantic point of view a given MS is. Step 4: By HKM structured we mean a synthesis of the above mentioned data sample: for each node we should synthesize its “semantic fingerprint”, a set or sets of specific concepts that statistically and ideally are used in the Web_as_is at a given moment and for a given pair language culture to describe the subject inspected. Step 3 Performing: Let’s suppose that somehow we have already successfully performed steps 1 and 2 starting step 3. Using the names list obtained in Step 1 we proceed to buildup textons, one for each name, as for our example 7,570 textons of about 10,000 Web Pages each. 
Once the textons are stripped of code, leaving only meaningful text and images, we should inspect their content document by document, retrieving their potential concepts, 50 per document on average, totaling about 500,000 suspected potential concepts per subject!
Note 01: take into account that a whole HKM has about 800,000 MS’s, Major Subjects.
Step 4 performing: we now proceed to synthesize that huge mass of 500,000 potential concepts per subject by finding the “modal” ones, semantically the best, 50 on average.
Note 02: the sample expanding to around 500,000 suspected potential concepts was designed in order to study how structured, noisy, disperse, misleading and diluted, from a semantic point of view, a given MS is.
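The Step 4 synthesis described above, reducing a large pool of potential concepts to a small “modal” set, can be illustrated with a minimal Python sketch. It is not the Darwin algorithm: it simply assumes that “modal” can be approximated by document frequency (the number of pages of the texton in which a concept appears). The function name, the tie-breaking rule and the toy data are all hypothetical.

```python
from collections import Counter

def modal_concepts(pages, top_k=50):
    """Rank concepts by the number of pages that mention them; keep the top_k.

    Assumption (not from the source): "modal" is approximated by document
    frequency within the texton.
    """
    doc_freq = Counter()
    for concepts in pages:
        doc_freq.update(set(concepts))  # count each concept once per page
    # Highest document frequency first; alphabetical order breaks ties
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_k]]

# Toy texton for "modern theatre": one list of potential concepts per page
# (invented data, three pages instead of 10,000).
toy_texton = [
    ["stage design", "absurdism", "ticket prices"],
    ["absurdism", "stage design", "epic theatre"],
    ["epic theatre", "absurdism"],
]
top_two = modal_concepts(toy_texton, top_k=2)
print(top_two)
```

With a real texton the input would be about 10,000 lists of roughly 50 names each, and top_k would stay at 50 as in the text.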
  • 35.
    We now have our first version of a HKM, in our example for The Art: 7,570 nodes, going from “root” to “leaves” along 13 levels and having on average 50 specific concepts per major subject/node, rounding to about 378,550 concepts plus the 7,570 skeleton subjects, which are in fact the leading concepts of the discipline under study.
    Steps 1 and 2 performing: these steps could be seen as a coupled semantic convolution of names and hierarchies, a sort of rapidly converging e-learning process that starts with a “semantic seed” for each branch of the HK. In the extreme we may also explore the Web without the aid of seeds, starting from zero knowledge. Over our 15 years of work, four prototypes and dozens of semantic seeds built for third parties, we have thoroughly checked that any Web expert can identify the “authoritative” core for any branch of the HK in no more than one day of effort. Specifically, The Art authoritative core of about 200 authorities was identified by a human. Each of them semantically covered more than 50% of the suspected MS’s, and the top 2% of them covered more than 95% of the suspected MS’s. Backed by this departure knowledge we initiated a sort of “anthropic algorithm”: a human expert in multi-agent programming, learning as much as possible by himself and guiding and/or adjusting the agents’ work, a sort of man-machine cooperative that gathers and selects pieces of information and knowledge, jumping from link to link within the base cluster and from it to semantically neighboring clusters, expanding the initial base.
    How to check that something goes wrong along the exploration: this anthropic process evolves very fast, suggesting to us that logic probably attracts logic, tending to empower it and to weaken the illogic. We also have a mechanism for easily checking what is poorly structured from a semantic point of view: the ontologies.
Effectively, any ontology enables us to check something that already exists and to answer questions such as: was it created under the ontology conjectures or not? Nevertheless, ontologies give us no help with the “art” of creating things. For instance, an ontology tells us nothing about how to unveil (or, in the extreme, to invent) a logical skeleton of art; on the contrary, it warns us if something goes wrong.
HKM buildup Schema: the figure above illustrates a core part of the intertwined process of 4 steps. At left, numbered 1 to 4, we depict the four big steps as embedded loops. In the middle we depict a KT, Knowledge Tree, of 15 nodes from root to leaves. A HKM, Human Knowledge Map, resembles a logical forest of 200 trees, one per branch of knowledge, each tree having on average 4,000 nodes (Major Subjects), totaling a sort of World Encyclopedia of 800,000 Major Subjects for a given pair language culture. At right we show a huge matrix used to check whether or not the semantic hierarchies ideally correspond to logical trees: nodes 1 and 2 derive from unique ancestor 0 (root), nodes 3, 4 and 5 derive from unique ancestor 1, and so on and so forth.
  • 36.
    Now you have to imagine that “within” any node are hosted metadata and sets of concepts and images specifically related to its subject. So the mass of the concepts of a given branch of knowledge, usually more than 95%, is hosted in the nodes! The remaining 5% corresponds to the names of the MS’s (main subjects, main themes and topics of the discipline under study). At right we depict the MS’s versus themselves, nodes versus nodes. If these MS’s are structured like a Logical Tree the matrix is practically empty: the unique ancestry of each node enforces the “x” marks as shown. However, in the real Web all types of abnormalities proliferate, like the ones in yellow. Handling them is one of the hardest Darwin tasks: HAZ, Hierarchy Abnormalities Zoning. => Back
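The nodes-versus-nodes check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hierarchy is given as hypothetical (ancestor, node) pairs, the root is node 0 as in the text, and only two kinds of abnormality are flagged (a node with no ancestor and a node with more than one). Cycles and the actual HAZ procedure are not covered.

```python
from collections import defaultdict

def find_abnormalities(edges, root=0):
    """Flag nodes that break the unique-ancestry rule of a logical tree.

    edges: iterable of (ancestor, node) pairs (a hypothetical input format).
    Returns a dict {node: description}; an empty dict means every node
    except the root has exactly one ancestor.
    """
    ancestors = defaultdict(set)
    nodes = {root}
    for ancestor, node in edges:
        ancestors[node].add(ancestor)
        nodes.update((ancestor, node))

    issues = {}
    for node in sorted(nodes):
        count = len(ancestors[node])
        if node == root:
            if count:
                issues[node] = "root has an ancestor"
        elif count == 0:
            issues[node] = "no ancestor"
        elif count > 1:
            issues[node] = "multiple ancestors"
    return issues

# Nodes 1 and 2 derive from root 0; nodes 3, 4 and 5 derive from node 1.
clean_tree = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
# Same tree plus one abnormal link: node 4 also hangs from node 2.
noisy_tree = clean_tree + [(2, 4)]

print(find_abnormalities(clean_tree))
print(find_abnormalities(noisy_tree))
```

In matrix terms, each reported node is a column with zero marks or with more than one mark instead of the single “x” that unique ancestry requires.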
  • 37.
    Some Darwin Challenges: A Big Data Challenge as well
    Juan Chamero, as of January 5th 2015
    Challenge I: Darwin Concepts Unveiling from Textons
    Darwin retrieves suspected potential concepts out of textons, documents that are however semantically “noisy”. Talking of a textons sample of 10,000 Web Pages we mean extracting their 10,000 corresponding semantic profiles (Darwin “fingerprints”).
    Texton: a large enough string of meaningful documents supposedly dealing with the same MS, Major Subject, for instance “modern theatre” within “The Art” branch of knowledge. The location of a document within the string, from its beginning to its end and from left to right, is directly related to its semantic significance. Textons usually have from 1,000 to 10,000 documents (Web Pages <=> URL’s).
    Note 01: this is a strong supposition that must be checked along the Darwin concepts unveiling process. The “raw data” is provided by conventional search engines like Google that “rank” URL’s as per their own criteria. This primary ordering, which Darwin takes into account, is continuously checked and enriched because Darwin performs an exhaustive analysis of how all textons deal with subjects and rank conceptually among them! In plain words, Darwin detects that some sources (URL’s) may provide more and better information than their Google rank would suggest, that is to say, they behave as specialized ones!
    Texton corpuses must be stripped of “no content” information, like all types of coding, before their processing. Textons are the raw data of Knowledge Maps, at large a semantic sample of them, however not yet structured: hierarchically “flat”.
Darwin, as a process following its own Ontology Conjectures, unveils in turn the potential concepts hosted here and there within textons as a function of their intrinsic statistical “rareness”.
Textons unveiling: Texton [pair LC; SMS; n; W] stands for [pair Language Culture; Suspected Main Subject; amount of documents; amount of words], for instance Tx231 [EN USA; “modern theatre”; 12,324; 15,355,409], read as: Texton 231 for the pair EN USA (English – American), dealing with “modern theatre”, having a string of 12,324 Web Page corpuses and 15,355,409 words.
The key is to detect potential concepts or “cepts” as a function of their rareness along the following steps:
  • 38.
    1. Jargon confirmation and coherence tests performed on it, f.i.: EN - USA Art Jargon of ~4,000 terms;
    2. SJD, Semantic Jargon Distribution: statistical presence of Jargon words within textons;
    3. SJD rareness;
    4. First List of potential single-word concepts and/or “cepts” (c’s);
    5. Creation of the n-ads potential c’s Frequencies Database;
    6. 1-ad presence and from 2-ad to 6-ad potential c’s presence distribution within the texton: f.i. the four words {the backwoodsman of Kentucky}, in allusion to Abraham Lincoln, generate from a 1-ad to a 4-ad: [the; the backwoodsman; the backwoodsman of; the backwoodsman of Kentucky];
    7. A HUMAN defines and adjusts the rareness thresholds;
    8. Lists of potential c’s with their justification parameters;
    9. Semantic checking of the potential c’s “names”: searching their “modal names”;
    10. Semantic checking “ex-post” of the MS versus the checked potential c’s: do these c’s semantically represent the initially supposed MS?
    Once this process step is finished for a given Major Branch of the Human Knowledge, for instance “The Art”, we may say that we have unveiled it completely but still unstructured, flat, as a huge logical tree of only one level! We are talking about The Art thesaurus of 7,570 nodes, each one corresponding to a single Major Subject of the Art, discriminated by about 400,000 c’s. Once structured, in the last global Darwin step, these 400,000 c’s show as structured in 13 levels!
    Challenge II: Semantic Synthesis via Textons Processing
    Given a cluster of documents supposedly dealing with the same subject, unveil from it the best fit to its specific set of concepts (ideally its “subject semantic fingerprint”). This rests on one of the strongest Darwin conjectures, which globally states: humans tend to register their ideas statistically following secular rules (see Darwin Ontology), generating de facto “WWD, Well Written Documents”.
So Web pages dealing with the same subject spin around these ideals like semantic vortexes, the innermost being the best documented and the outermost the worst.
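Steps 5 and 6 of Challenge I above (n-ad generation and the n-ads frequencies database) lend themselves to a short Python sketch. It reproduces the “backwoodsman of Kentucky” prefixes from the text and counts every contiguous n-ad, up to 6 words, across a toy corpus. Whitespace tokenization, lowercasing and the function names are assumptions made for illustration; how Darwin actually measures “rareness” is not specified here, so no threshold logic is shown.

```python
from collections import Counter

def prefix_n_ads(phrase, max_n=6):
    """Return the 1-ad, 2-ad, ... prefixes of a phrase, up to max_n words."""
    words = phrase.split()
    longest = min(len(words), max_n)
    return [" ".join(words[:n]) for n in range(1, longest + 1)]

def n_ad_frequencies(documents, max_n=6):
    """Count every contiguous n-ad (1 <= n <= max_n) over all documents.

    Assumption: words are separated by whitespace and compared in lowercase.
    """
    counts = Counter()
    for text in documents:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for start in range(len(words) - n + 1):
                counts[" ".join(words[start:start + n])] += 1
    return counts

prefixes = prefix_n_ads("the backwoodsman of Kentucky")
print(prefixes)

# Two invented "documents" standing in for the pages of a texton.
toy_documents = ["The backwoodsman of Kentucky", "the backwoodsman"]
frequencies = n_ad_frequencies(toy_documents)
print(frequencies["the backwoodsman"], frequencies["of kentucky"])
```

A human-set rareness threshold (step 7) would then be applied on top of a table like `frequencies`.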
  • 39.
    We humans are specially suited to unveil those specific concepts (see Darwin history) as a function of how good a document is with respect to the ideal: from our experience with hundreds of advanced students of Informatics and Systems Engineering, a human can be trained in a couple of hours to detect specific concepts within documents chosen at random about any subject. This “methodological talent” can also be easily transferred to an agent (see How Darwin unveils potential specific concepts out of a document).
    For each Th, Semantic Threshold Level of “rareness” (Th03 in the figure above), Darwin algorithms unveil a specific set of potential/suspected concepts, 46 in the figure, supposedly pointing to the MS, Major Subject, of the texton analyzed. If the texton has 10,000 text corpuses pertaining to their corresponding 10,000 Web pages, we would unveil for instance 500,000 potential concept names, for an average of 50 per page. This is in fact a Big Data scenario where, on average for each MS and for each Threshold, we should define the “best fit” to the “specific concepts set” used statistically worldwide for a given MS and for a given pair “language culture”.
    Going a little deeper into the details: from these 500,000 [URL, concepts] pairs, for a single MS, we must find the best fit to a sort of “modal” “specific concepts set” of it. To perform this task we may need the following structured data: [MS, URL, {code, name}, frequency], where MS stands for Major Subject, URL for the Web address, {code, name} for a set of pairs (code, potential specific concept name), and frequency for the number of appearances of the potential specific concept within the page. Given a MS of a branch of the HK, the challenge is to unveil out of the Web the best fit for the “specific set” of concepts semantically related to it, namely the set of concepts specifically used, probabilistically, in WWD, Well Written Documents.
In numbers, for a branch of knowledge, for instance “The Art” (without considering frequency): 4,000,000,000 names <=> [8,000 subject names per Branch of the HK x 10,000 Web Pages per subject name within each branch x 50 potential specific concepts per Web Page].
Note 02: this Big Data briefing is meant to give an idea of the order of magnitude of the computing needs. It provides us the upper threshold level, almost a “brute force” reductionist approach. However, as occurs in most Big Data processes, we humans learn fast. In the examples above we may pass from a first trial of 4,000,000,000 names processing for a single subject of a HK branch to no more than a few million as we go from one MS to another, for instance from 4,000,000,000 to less than 4,000,000, with an average of 40,000,000. What really happens is that from the designed capture of 10,000 Web Pages per MS we may “discover” among those 10,000 URL’s a small sample of a few hundred “authoritative” URL’s concerning the MS under analysis.
What’s then missing? Only two things: iii) how do we unveil all the subject names of a given HK; iv) how to structure those subject names along a unique logical tree. => Back
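The structured data [MS, URL, {code, name}, frequency] and the order-of-magnitude figure above can be made concrete with a small Python sketch. Everything named here is hypothetical: the `ConceptRecord` class flattens the set of (code, name) pairs into one record per concept, the URLs are placeholders, and `coverage_by_url` is one possible way (an assumption, not the Darwin procedure) to spot candidate “authoritative” URL’s as those that mention a large share of a given modal concept set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConceptRecord:
    """One row of [MS, URL, {code, name}, frequency], flattened per concept."""
    ms: str          # Major Subject
    url: str         # Web address of the page
    code: int        # code of the potential specific concept
    name: str        # name of the potential specific concept
    frequency: int   # appearances of the concept within the page

def names_to_process(subjects, pages_per_subject, concepts_per_page):
    """Upper bound on the concept names handled for one branch of the HK."""
    return subjects * pages_per_subject * concepts_per_page

def coverage_by_url(records, modal_names):
    """Share of the modal concept set that each URL mentions at least once."""
    if not modal_names:
        raise ValueError("modal_names must not be empty")
    found = {}
    for rec in records:
        hits = found.setdefault(rec.url, set())
        if rec.name in modal_names and rec.frequency > 0:
            hits.add(rec.name)
    return {url: len(hits) / len(modal_names) for url, hits in found.items()}

branch_total = names_to_process(8_000, 10_000, 50)
print(f"{branch_total:,}")

# Invented records for the MS "modern theatre"; the URLs are placeholders.
toy_records = [
    ConceptRecord("modern theatre", "https://a.example/", 1, "absurdism", 4),
    ConceptRecord("modern theatre", "https://a.example/", 2, "epic theatre", 1),
    ConceptRecord("modern theatre", "https://b.example/", 1, "absurdism", 2),
    ConceptRecord("modern theatre", "https://b.example/", 3, "ticket prices", 7),
]
coverage = coverage_by_url(toy_records, {"absurdism", "epic theatre"})
print(coverage)
```

Ranking URLs by such a coverage score is one way to shrink the 10,000 captured pages per MS to a few hundred candidates, which is where the reduction from billions to millions of names would come from.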
  • 40.
    Darwin Methodology Applications
    By Juan Chamero, Principal Architect of Darwin Methodology, as of January 11th 2015
    It is a dreamstime.com free use image. Search thru Google Images via query [teacher clip dino] as open search. A thesaurus is a sort of reference book of concepts, and in Darwin Ontology of “in mind ideas”, represented by one or more words of a given pair language culture, usually with synonyms and sometimes with antonyms. Thesauruses may suggest the best suited synonym for a given moment (present), named the “modal name” of the referenced “in mind idea”. Web imagery usually associates thesauruses with dinosaurs.
    It is a vision of the upper levels of a HKM, Human Knowledge Map, of The Art. See a sample extraction of it in sheets 1, 2 and 3 (Theatre Mapping), as of September 2008, from Spain. HKM stands for mapping the whole knowledge or a branch of it semantically, by meaning, resembling Logical Inverted Trees from their “roots” down to their derived “nodes” thru a unique ancestry. Its content and the “intelligence” behind it are detected and unveiled from the Web by Darwin Methodology under the guide of Darwin Ontology, a set of “strong” semantic conjectures about how we humans document our “in mind” ideas.
    Non intrusive e-membranes: a sort of intelligent interface between two or more autonomous applications, each one providing some type of service to the others under a non intrusive operation scenario. See BBC Science as of July 2003. Non intrusive and non perturbing e-membranes are the interfaces needed to communicate two different “worlds” while enabling both to continue working autonomously, within their own hierarchy and rules, and without perturbing each other. The e-membrane designers must take into account that the communicated systems may differ not only in objectives, times and rhythms but also in their semantics.
In the example published by the BBC of London the figure refers to an e-membrane between a conventional procurement SAP system (for an international oil corporation) and a
  • 41.
    “pilot” e-procurement system working in parallel with the conventional one. The purpose of the e-membrane was to learn as much as possible in the least time about real e-procurement instances.
    SSSE, Super Semantic Search Engines: Semantic Direct Search Engines, also YGWYN – IOOC Search Engines, You Get What You Need In only One Click Search Engines. See also “Súper Buscadores Semánticos (i) y (II)”. Initially the main purpose of Darwin was the creation of a Semantic Search Engine that enables users to find in the Web the best information about something they need, in terms of information and/or knowledge, if possible in only one click. The icon for this Darwin application was selected from “How Darwin unveils concepts”, see below. Conventional Search Engines like Google are like a non semantic library where all existing Web documents are classified by their words. Google tells you nothing about the meaning of Web documents. Darwin Methodology unveils the meaning of documents at a given moment by structuring them semantically and building the Web Thesaurus, resembling a World Wide Library, depicted in the figure as a hypercube of as many “floors” as semantic levels the Human Knowledge has.
    Encyclopedias: their building should be one of the first necessary semantic Web applications, but unfortunately its development is almost frozen. There are exceptions like Wikipedia, and projects about specific thematic subjects like Europeana and Wolfram Alpha (see Videos Google: World in hand). For the first time we humans have at hand all the records of our living, our past and now our present, practically in real time! Encyclopedically speaking, we have at hand (of course with the appropriate technology) all the pieces of information and knowledge about anything. With applications like knowledge mapping (the second of our list) we may locate “directly” the best authoritative sources about practically anything. The only remaining task (by now the human touch of direct
  • 42.
    knowledge retrieval) is to “synthesize” and “edit” the content of those sources meaningfully and automatically, namely via a conventional computing process. Darwin Methodology is pursuing this goal. In the interim Darwin may deliver to humans all the content they need, specially suited for editing.
    Surveys and Polls: this icon corresponds to one of the classic applications: stats focused on Surveys and Polls. See all types of visualizations in Google Images as: [surveys & polls results]. See image as applied to Health Science Strategies. Darwin faces it singularly because it has at hand information about anything, without the addressed people, entities and avatars even needing to be aware of it. We may obtain meaningful answers to all types of queries from non intrusive procedures that just “observe”. We may also have at hand “massive” information at any time and over any interval of time about any specific aspect of the research, as well as the possibility of filtering observations thru causal cultural behavior models.
    Intelligence Reports: this icon represents information and knowledge unveiling, and we use it as an avatar of “intelligence”. Darwin stands for Distributed Agents to Retrieve the Web Intelligence, supposedly disperse and hidden. Intelligence: take a look at Intelligence in Google Images and appreciate the dominant imagery we humans have about this subtle concept: light, luminosity, tending to be blue and expanding. In the evolution of data from chaos to information to knowledge to wisdom, the thing or “thing” that drives the evolution from chaos to wisdom is intelligence. Intelligence Reports are documents that enable us to “infer” results and consequences, trivial and sophisticated ones, explicit or hidden, out of a structured description, as detailed as needed, of something existing. Darwin enables us to create trustable Intelligence Reports about ANYTHING as long as we have at hand enough trustable information and knowledge about that ANYTHING.
As in the case of Stats Surveys and Polls, these reports can be produced in non intrusive mode and without disturbing the Web.
It is an icon avatar from the film Avatar. Avatars are creatures and/or entities that represent existing entities, persons, beliefs, truths, etc. See avatares (Spanish), Pope Francis demo, avatar (computing).
  • 43.
    An avatar is an abstract entity and/or a digital or Web creature that intends to represent a given entity or person, physical or juridical, in the best scientific way, taking into account all possible meaning axes of its “character” and facts, even those considered good or bad, sayings of all types from beliefs to conspiracies, and research performed on it, for instance: Pope Francis or Barack Obama, or Organized Crime in the World, etc.
    I-Webs, Intelligent Web Portals: an icon to represent how Web connections evolve over time for the pair information people. See Explaining the Semantic Web and Making sense of the semantic web. Be cautious! Most of these projections that intend to see the Web extrapolated to year 2020 should be considered possible trends, among many, of a phenomenon that evolves exponentially, so fast and unpredictably that it may seriously mislead our forecasts. An I-Web is a Website (or Web Portal) intelligently designed taking into account the available technologies and the available resources of the Website owners and administrators. We may find today excellent Web 1.0 Websites and from poor to awful Web 3.0 Websites. For instance there are thousands of “top ranked” Web 3.0 Websites with their own proprietary data still semantically unstructured. In any buildup the beginning is the beginning, and the beginning in all Web projects should always be its “semantics”, namely the “semantization” of its data and vocabulary. The Web creatures live in different habitats and under different technologies concerning their abilities to communicate with others while performing their daily tasks in order to survive, tagged as Web 1.0, Web 2.0, Web 3.0 and now very recently as Web 4.0.
Concerning Web development we have as options a paraphernalia of software panaceas and tools that are usually applied to Top Websites and Portals and that have strong prerequisites to work successfully, such as Distributed Search, Cloud Services, Mobile Interfaces, Social Networks interaction, Privacy Protection, Top Down and Bottom up design, and Big Data Apps. Implementing advanced applications without satisfying those prerequisites is suicidal. Nevertheless, we may transform any Website into an I-Website without trying to make it Web 3.0 or Web 4.0. First of all, as commented above, we must semantize data, vocabulary and naming as much as possible. Then come updating and/or replacing the programming platform and languages and reviewing the database structures, with a planning horizon of not less than a decade.
Big Data Synthesis: the Semantic Web needs Big Data; it is intrinsic to its nature. Its market is enormous and growing at a pace of 10 percent a year without yet taking into consideration video and audio content. Darwin de facto works within Big Data scenarios and has extensive experience in them, with proprietary procedures and algorithms. The image icon has been
  • 44.
    selected from one of the typical and oldest Big Data applications. See the original image of this impacting Big Data application at CERN Server, Switzerland (LHC, Large Hadron Collider). As Professor Mark Whitehorn says: “Big Data may be misunderstood and overhyped - but the promise of data growth enabling a goldmine of insight is compelling.” Professor Mark Whitehorn, the eminent data scientist, author and occasional Register columnist, explains what big data is and why it is important, and adds: “Data is not large and it is not small / It does not live and it does not die / It does not offer truth and neither does it lie.”
    In our humble opinion BD has always existed. We do not believe that it is a new data dimension but something that at a given moment of our knowledge presents itself as rare, too big and complex. Let’s clarify within our Darwin Methodology: 40 years ago matrix processing was bound to a data volume of 100x100, something that fits in an Excel sheet of 100 columns by 100 rows. Why? Because of the propagation of rounding errors. Imagine now the volumes of data generated by some social networks, of the order of 1,000,000x100,000, for example applied to a behavior study of 1,000,000 persons related to their opinions about 100,000 themes! In order not to over-dimension the resources and tools to face these BD challenges we fundamentally need experience and common sense (see our document about Big Data Challenge).
    Autonomous Artificial Communities: as of today it is easy and not too expensive to create this type of community. They could be a virtual and idealized emulation of a real one whose evolution is to be tested (see artificial islands in Lonely Planet and Laulasi Islands as per Wikipedia). These communities could be populated by persons, avatars, agents and a combination of all of them.
Searching in Google by [artificial communities] as open search you may find Synthetic Microbial Communities, Google AI Communities, Artificial Reefs Communities, Planned Communities, Artificial Foraging Ant Communities, Artificial Plant Communities, Artificial e-Learning Communities, Artificial Fresh Water Protozoan Communities, etc. => Back
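To put the two matrix sizes quoted in the Big Data Synthesis discussion above side by side (100x100 versus 1,000,000x100,000), here is a back-of-the-envelope Python sketch. It assumes a dense matrix with 8 bytes per cell (a 64-bit float); that storage choice is ours, not the source’s, and a real behavior study would more likely use a sparse representation.

```python
def dense_matrix_bytes(rows, cols, bytes_per_cell=8):
    """Memory needed for a dense rows x cols matrix (8 bytes per cell assumed)."""
    return rows * cols * bytes_per_cell

small = dense_matrix_bytes(100, 100)              # the 100x100 case
large = dense_matrix_bytes(1_000_000, 100_000)    # persons x themes

print(f"100x100: {small / 1e3:.0f} kB")
print(f"1,000,000x100,000: {large / 1e9:.0f} GB")
```

Under these assumptions the first matrix needs 80 kB and the second 800 GB, a factor of ten million, which is why dimensioning the resources calls for experience and common sense.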
  • 45.
    The Web as “seen” by Darwin Methodology
    By Juan Chamero, from Buenos Aires, as of 5th of November 2014
    The Word of People
    Humans have millions of ideas in their brains
    Hidden and elusive except by proper use of language
    The figures below illustrate, in very simplified form, how to build via agents a HKM, Human Knowledge Map, from a “semantic seed”. Darwin states that the Web “always” hosts the sum of knowledge, even though unstructured and dispersed over approximately 35,000 million Web pages as of today.
    Darwin Ontology states that we humans keep (save) in our brains a finite universe of “in mind” ideas estimated at 12 to 20 million per pair “language - culture”. This cognitive asset has been coined thru millions of years and documented since the invention of writing. However this asset looks hidden and elusive. The only way we have at hand to retrieve those ideas is thru the “correct use” of language within the right context; for instance the word Rigoletto points “correctly” to the “Opera Rigoletto” by Giuseppe Verdi within the “Performing Arts” context and within the Opera context, staying semantically differentiated from hundreds of meanings and/or frequent uses of the same word, for example a commercial brand or a restaurant.
    Darwin Ontology was conceived to retrieve information and intelligence from big data reservoirs that are somehow semantically structured (probabilistically). However the Web content is only indexed by “words”, not by ideas or concepts, and is considered “semantically unstructured”. In order to “see” its content as if it were semantically structured, Darwin queries it as if it were! It uses as a valid stratagem the following set of suppositions:
    1. ALL: Web Completeness: The All is present in the Web, notwithstanding disperse and hidden;
    2. STRUCTURE: Logical and Probabilistic Completeness: The All is “probabilistically structured” under logic algebraic forms;
    3. CREATURES: Web creatures: from this structure arise the dominant ideas (in mind ideas) humans use to communicate among themselves;
    4. NAMING: Modal names: dominant ideas have specific names, “unique” and dominant for each pair language culture;
    5. DOCUMENT: “The Word of People”: all Web documents are expressed around these ideas just by using their specific names, their synonyms and/or their distortions;
    6. TREE: “The Word of People” structure: dominant ideas are hierarchically structured, tending to evolve and conform as inverted logical trees;
    7. FUZZINESS: the nature of this structure is probabilistic, and so is its math logic;
    8. EVOLUTION: This structure evolves: it evolves fast over time from seeds and/or graphs of words and concepts pertaining to diverse disciplines. New disciplines are continuously created and some others disappear. New branches and new concepts are cognitively detected and assimilated;
    9. ANTHROPIC: Content and Structure could be retrieved: even though the Web space is open, practically unbounded, and continuous, it could be precisely mapped as HKM, Human Knowledge
  • 46.
    Maps notwithstanding its dispersion and invisible “structuration”. This feature enables humans to continue thinking and evolving as “always”: freely, not too structured, neither tied to forms nor to extra-semantic inflexibility;
    10. ANTHROPIC: Web Thesauruses: these maps will catalog the unstructured Web probabilistically, notwithstanding its being open, multilingual, free, and in continuous evolution; even more, they will enable us to “see” any Web Page conceptually and to compute an approximation to what would be its “metadata”.
    Darwin versus a Conventional Search Engine like Google
    SE models | ALL   | Structure | Creatures | Naming   | Document | TREE     | Fuzziness | Evolution | Anthropic | Anthropic
    Google    | Agree | E         | Words     | Apathy   | Apathy   | Apathy   | Apathy    | Agree     | Apathy    | Apathy
    Darwin    | Agree | E         | Ideas     | Specific | Specific | Specific | Specific  | Agree     | Crucial   | Crucial
    This table highlights how Darwin and Google “see” the “Web Ocean” from a semantic point of view focused on its structure and its “creatures”: Darwin sees it as semantically structured (or may see it as if it were, thru a map) while Google sees it as unstructured; Darwin sees the Web as a huge ocean where “ideas” live while Google sees it as a huge ocean of “words” instead.
    In the figure above we see an idealization of “The Art” semantic seed built in 2009 to be the nucleus corpus demo of the projected European Union Search Engine (Theseus). This seed, once its semantic coherence was checked, was grown by a Darwin anthropic algorithm where the massive “Big Data” type computation was performed by agents and the crucial decisions were made by humans: from status [40, 0], 40 branches and 0 specific concepts (almost a zero knowledge condition), to status [7,571, 370,000], 7,571 branches or Major Subjects of “The Art” complemented by 370,000 concepts, about 50 specific concepts and dominant names per branch on average.
Updated as of today, October 21st 2014, when Google renders 497,000,000 Web pages for the exact query “the art”, the same seed would grow approximately to status [8,000, 400,000]. Darwin checked the fulfillment of the 10 Web premises as well as of their associated 10 Darwin Ontology computation conjectures, and started a mega process of 82 steps semantically reviewing hundreds of
  • 47.
    millions of Web pages to finally synthesize The Art Map. These maps may evolve either by themselves or under human control, tending to their structural equilibrium, generally as a perfect and inverted “logical tree”.
    The Web metalinguistic premises could be described as follows: The Web is a complete entity, logically structured (1) and (2). Its “creatures” are abstract entities hosted in an immense and practically unbounded “Web Ocean”: what classic philosophy considers “concepts”. Web pages are human documents, concrete entities (3) thematically dealing with ideas identified by their “dominant names” (4) for each pair “language culture” via a literary knowhow coined throughout millennia. This millennial knowhow (5) consists in describing objects, ideas and topics, and justifying beliefs and truths, by using “minor” and/or derived and preexistent ones, creating new ideas with a minimum of effort. This derivation evolves to an arboreal structure (6) of hierarchical meanings: from mother ideas derive sister ideas, and like-minded ideas tend to group under a common ancestry. The nature of this unconscious and collective structuration is probabilistic (7), evolving continuously (8). Under these premises the best we can do to “see more and better” the knowledge hosted in the Web is to map it at a predetermined level of resolution, for instance a Web Thesaurus certifying that its names are modal with a probability of 99% for the pair “English American”. Techniques like Darwin may in a second step go deeper, trying to make the Web more semantic by building suspected metadata for each of its pages, and even dare to make descriptions and issue opinions, but that would contradict its premises.
In this spirit Darwin does not suggest content (9) but strives to facilitate the search for information and knowledge and make it efficient, saying to humans: here, within this immense Web Ocean, we guide you to find the best Web pages (a few, most times retrieved via only one click) dealing authoritatively and semantically with your “in mind” ideas.
Note: from the beginning of the Web’s creation the spirit of this guidance has been “up to you”: Darwin agents select what they consider the best pages, but inspecting and/or analyzing them, or on the contrary rejecting them, is up to you.
Finally, Darwin enables us to build and retrieve Web Thesauruses and Human Knowledge Maps (10) that, like aged wine, improve and become more essential by themselves. And what is perhaps more important: it permits us to build trustable and high quality Intelligence Reports about almost any subject.
Global Vision: an open Web, created by and for humans, free, with maximum diversity, reasonably structured, slightly imperfect even though striving to excel, making extensive use of form but not reducible to it.
Meaningfulness Appendix (Google exact search):
“The all”, 46,000,000
Completeness, 20,300,000
“The Word of Man”, 14,600,000
“Modal names”, 9,510,000
“The Word of Humans”, 299,000
"Dominant ideas", 101,000
“Logical completeness”, 20,000
"In mind ideas", 18,100
=> Back
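The NAMING premise, that among several names for the same idea only one is dominant at a given moment, can be illustrated with the hit counts listed in the Meaningfulness Appendix above. The Python sketch below makes two assumptions that are ours, not the source’s: that “The Word of Man” and “The Word of Humans” may be treated as competing names for one idea, and that the share of exact-search hits is an acceptable stand-in for the probability of being modal. The function name and the 0.99 threshold check are hypothetical.

```python
def modal_name(hit_counts):
    """Return (name, share) for the variant with the largest hit count.

    Assumption: the share of hits approximates how dominant a name is.
    """
    total = sum(hit_counts.values())
    if total == 0:
        raise ValueError("no hits for any variant")
    name = max(hit_counts, key=hit_counts.get)
    return name, hit_counts[name] / total

# Hit counts taken from the Meaningfulness Appendix (Google exact search).
variants = {"The Word of Man": 14_600_000, "The Word of Humans": 299_000}
name, share = modal_name(variants)
meets_threshold = share >= 0.99   # hypothetical 99% certification level

print(name, round(share, 3), meets_threshold)
```

Here the dominant variant holds about 98% of the hits, so under this simplified reading it would fall just short of a 99% certification level.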
  • 48.
    Darwin Bibliography
    By Juan Chamero, January 2015
    E-books
    o Semantic Web - The Human Knowledge Map, The Web as seen by Darwin Methodology, document published in PDF and EPUB of 500 and 690 pages respectively.
    o The Web of the People, document published in PDF and EPUB of 250 pages.
    o Darwin Human Knowledge Maps, a 75 slides series published by Google books.
    Some Websites and Darwin Links
    o Intag.org, http://www.intag.org, experimental Website that has hosted the first Darwin Web Thesaurus since 2002. Its use was shared by our Darwin Web development team working jointly with some Latin American universities and associated Labs.
    o Juan Chamero, Curriculum Vitae of the Darwin Methodology Founder and Principal Architect, http://www.intag.org/downloads/.
    o Darwin Blog, http://juanchamero.com.ar, a primer about Darwin in Spanish and English.
    o Intag, Intelligent Agents Internet Corp, http://www.intagsolutions.com, corporate Darwin support and rights owner.
    o Darwin Ontology, http://darwin-ontology.org, the first e-book draft, as of 2008, about our ontology and its conjectures. It documents most Darwin conceptual changes along its 15 years of life (2001 – 2010).
    o Aiware, http://aiware.com.ar, experimental i-Web, Intelligent Website, where Darwin Applications are promoted and tested (updated January 1st 2015). See: Big Data Pills, an experimental Big Data series (27).
    => Back
  • 49.
    A picture is worth a thousand words
    Old adage deserving 36,000,000 references in Google, July 3rd 2014

    Our knowledge is based on “concepts”: “in mind” ideas that we create and make evolve, and that remain invisible to others. The main way to make them knowable by others is by “naming” them via words of a given language within a given culture. The same idea may have many names, most times similar and equivalent, but only one is dominant at a given moment. These dominant names, or “modal names”, are detected and retrieved from the Web by Darwin agents.

    The other way to communicate ideas, from ancient times, is by attitudes, gestures and images. However, we are not used to managing these forms of communication with the precision of words. On the contrary, we use them to transmit “meta messages” that help others understand and better see the context where the idea exists.

    We are at the beginning of an era of communication via images in our daily lives. Darwin is making its first experiments with this peculiar way of communication, finding that big enough samples of similar documents share not only a specific set of concepts but also a specific set of image types and traits. Another experimental fact is that the “associated” image sets help users to open their minds and discover, for each of them, more semantic axes.

    The images in the left column were selected by a human trying to explain to others what he understands by “images worth more than a thousand words”. See how chaotic, subjective and selective the result looks. Darwin agents are continuously trained and tuned up to retrieve significant samples of images for each thematic node.

    Note: This demo only includes Darwin image sampling for the main art themes, the ones that head changes in color tonalities of this Vision 1 mapping, namely: 0.1.1.1.1. Painting, 0.1.1.1.2. Drawing, 0.1.1.1.3. Sculpture, ... Users may either search in Google by one of the images of the gallery or query Google Images first to retrieve top related images. See The Art Tree Darwin demo.
  • 50.
    Darwin Methodology
    A secular way to build the Semantic Web and Semantic Search Engines
    Intro to Darwin Art Map
    Juan Chamero, as of January 1st 2014

    Darwin Art Map, Vision 1 (go to demo)
    Darwin Art Map, Vision 2 (go to demo)

    Intro to Darwin Map: Please read carefully the intro that follows, postponing the understanding of the two images above, which will guide you to navigate “The Art” region of the Web through a Darwin demo. You have to be aware of and understand two things first: a) that the Semantic Web does not exist yet; and b) why we need maps. As you go deeper into this intro, glance at the figures from time to time. A tip: try to find an analogy with the bonobo keyboard story below.
  • 51.
    The Semantic Web: For 13 years since the end of the “dot com Era”, more precisely since the “dot com bubble” collapse, we (as Darwin Team) have been trying to find a way to reach the Semantic Web. Of course most of you believe that we are immersed in it, but that is absolutely wrong: the Semantic Web does not exist yet! Most Web data are semantically unstructured and only indexed by words, not by “meaning”, and you all know the difference: meanings are not words but ideas, expressed in each language by precise words or strings of words and/or symbols, culturally agreed over time and resembling ideograms.

    The Semantic Web Illusion: However, current conventional search engines are so powerful that they awaken in us the illusion that the Web is semantic. A real Semantic Web will enable humans to communicate among ourselves (accurately, meaningfully and fast) directly in any language, and to retrieve from the Web, also directly (in only one click), any piece of information, knowledge and intelligence.

    Kanzi, a big ape (bonobo, a pygmy chimpanzee): a semantic bridge towards the Semantic Web. See also “Apes with Apps”, by IEEE Spectrum.

    Darwin Methodology enables humans to “see” the Web as if structured by meanings, based on an anthropic ontology (Darwin Ontology) that takes into account how we humans express our ideas through text, images, sounds, tactile sensations, or ideograms like the Chinese and Japanese ones.

    Google Hummingbird disclosed: Google, the most advanced current conventional search engine, has recently launched its new search algorithm, named Hummingbird because it behaves “precise and fast”, doing the best we humans know about the art of Artificial Intelligence and Computing applied to the Web. However, it is not well suited to communicating ideas because it is still based on words, an inferior semantic category to ideas and concepts.

    Kanzi the bonobo: We are going to introduce an astonishing piece of research about communicating in human style, via ideas, with Kanzi, a bonobo (see figure on top). This research and Darwin applications imply in fact the use of more advanced technologies than the ones used by conventional search engines like Google and Yahoo. They only need time to broaden their reach and extend their use, because they add a new dimension: meaning instead of pieces of it.
  • 52.
    The paradox: the best ways are sometimes hidden in humble scenarios: The differences between Darwin and Yerkish, an artificial language developed by Georgia State University for use by non human primates through a keyboard with “lexigrams” representing ideas, are basically quantitative: Darwin manages about 15 million ideas per language while Yerkish has no more than 600, but both work on ideas. Take into account that the current urban glossary of our children has no more than 1,000 ideograms (expressed by words). As we are going to see in our demo about the Art Map, users at large communicate through a sort of virtual e-membrane keyboard of 7,570 ideograms. Of course, in both cases the bonobo and the humans must acquire a basic skill.

    Experimental Kanzi 128 lexigrams keyboard, see Art for bonobo hope
    See also The Darwin Art Map of 7,571 “keys” (ask for a login)

    What do the SEO actors say? Successful SEO (Search Engine Optimization) professionals and companies agree that pushing Websites to the top of search engine outcomes based fundamentally on popularity (link wise) is an old model. It proved to be a solution when there were not many answers about a theme, but now things are different: answers proliferate by millions and at the same time users know that somewhere “up there” exists at least one document that satisfies their queries. Efforts to locate the right answers are not only justified but from now on become a necessary condition.
    Source: The future of search: A Window to knowledge
  • 53.
    The search by keywords is collapsing: The Peter Principle: The figures below express a synthesis of many visions of the Web's future about this possibility, made since 2008, when the problem of Big Data was still unperceived and unimagined. Only recently have essays and articles begun to appear that dare to forecast the end of keyword searching, a collapse that would affect a very significant and global market, perhaps the biggest one within the Internet. Half seriously and half humorously, the Peter Principle asserts that physical and juridical persons tend to progress (or to be promoted) until they reach their position of maximum incompetence. Many outstanding ICT corporations are close to that critical point: a Web polluted by too many misleading, noisy and nonsense words; a Web of too much alchemy!
    Source: Nova Spivack, a technology futurist (May 2008)
    Source: Peter Principle, by futurepredictions.com
  • 54.
    Other efforts along the same or similar lines (at large, how to build a Web Thesaurus):
    o Google trying to unveil what people say: global pseudo semantic approach;
    o Yahoo trying to enter into semantic search: semi structured semantic experiment;
    o Wolfram Alpha improving in direct search: on highly structured data;
    o WWF trying to structure their communications: semantic approach, only experimental.

    Hummingbird, “precise and fast”, the new Google Search Algorithm
    Google unveils (Nov 6 2013) its Hummingbird algorithm
    Yahoo Lab teaching Semantic Search (July 23rd 2013) in Bangalore; Yahoo Semantic Search Tutorial
  • 55.
    Wolfram Alpha extends to mobile
    A multilingual semantic experiment performed (2008-2010) by CORDIS EU
  • 56.
    Darwin Methodology Mapping: To see more and better the Web
    By Juan Chamero, Darwin Principal Architect, as of 21st of February 2014

    A map is a visual representation of an area: a symbolic depiction highlighting relationships between elements of that space such as objects, regions and themes. For centuries we humans have made use of maps and mapping to see the world more and better. Unfortunately we have not started to use them to see more and better the knowledge still hidden in the Web, simply because the Web has not been semantically mapped yet! Darwin may do that, building precise maps of knowledge to perform three basic and crucial surviving and evolving activities: 1) to learn; 2) to navigate through knowledge for fun; 3) to search for what we need.

    This brief report intends to make us aware of this peculiar loss of historical continuity via a sequence of images from Late Neolithic times to the present, taking into account that as of today the Web is only indexed (mapped) by words, so that we search for what we need by “guessing” (activity 3) and exercise activities 1 and 2 in a very limited way.

    1. Late Neolithic: from Late Neolithic times humans used maps and mapping, depicting them on stones, caves and trees.
    Neolithic mapping, as of Eupedia
  • 57.
    2. Altamira Paintings: These maps were the “knowledge maps” of those times.
    Altamira Paintings, as per Wikipedia

    3. Medieval Times: “T-O Maps” and “Mappa Mundi”: as of circa 1300, depicting the world as seen at that time, the Earth (T for Terrum) within an incommensurable circle (O for Orbit). The image below depicts a tripartite Hereford T-O version with Asia at the top, Europe at the bottom left, Africa at the bottom right and Jerusalem in the center. It measures 158 cm by 133 cm, some 52" in diameter, and is the largest medieval map known to still exist. Looking at the details, some places such as the British Isles are reasonably well described.
    The Hereford Mappa Mundi (~1300), depicting Jerusalem in the center
  • 58.
    4. Today - DNA Mapping: the image below shows us two DNA visions. At left, a gene map of the human leukocyte antigen (HLA) region. The major histocompatibility complex (MHC) gene map corresponds to the genomic coordinates of 29 677 984 (GABBR1) to 33 485 635 (KIFC1) in the human genome build 36.3 of the National Center for Biotechnology Information (NCBI) map viewer. At right, DNA replication, the process of producing two identical replicas from one original DNA molecule. This biological process occurs in all living organisms and is the basis for biological inheritance. DNA is made up of two strands and each strand of the original DNA molecule serves as template for the production of the complementary strand, a process referred to as semi conservative replication.
    A tiny HLA region of the Human Genome
    DNA Replication

    5. Today - Brain Mapping: The figure below depicts a mapping of the main brain functions of both sides, by HiddenTalents.org. These maps are clickable: if you click on “grammar” (left side) you obtain the following information:

    Grammar
    Grammar is the spatial sense of vocabulary. This is especially true of English, which developed a relatively simple grammar system that depends upon spatial order much more than endings or gender. In English, we have grammar in our left brain that knows "Boy chases kangaroo" is different than "Kangaroo chases boy." We could also draw pictures in our right brain to symbolically say the same thing:
  • 59.
    Left brain words = "Boy chases kangaroo" "Kangaroo chases boy."
    Right brain images =

    As a child grows, the brain soaks in whatever sounds it hears, which we call vocabulary and grammar. After age 10, the vocabulary and grammar parts of the brain are mostly finished growing and the thinking parts of the brain in the frontal lobe continue growing, building upon the foundation of grammar and vocabulary learned in childhood.

    Vocabulary => Grammar => Concepts => Creative thinking
    Brain functions mapping, as per HiddenTalents.org
  • 60.
    6. Today - Dashboards (to make decisions): all kinds of organizations, either public or private, make extensive and intensive use of “dashboards” that depict the state of the main variables that govern a given activity and/or equilibrium conditions, namely: businesses, crises, competitive scenarios, Web traffic and uses, people's behavioral trends, and conflicts; at large, forms of Intelligence reports to guide and optimize our decisions.
    Example 1: SquizLabs
    Example 2: a Healthcare dashboard
  • 61.
    Example 3: as per Microsoft aids
    Example 4: a dashboard to locate d-experts

    7. Today - Defense: in a Web where EVERYTHING is connected with EVERYTHING and EVERYBODY is connected with EVERYBODY in extreme detail (opinions, emotions, impressions, and even our gestures), it is perfectly possible to scientifically unravel possible futures of any scenario no matter its complexity. Among the top in reach and complexity are “Defense Mappings”. The image below depicts the China Military Power as of 2009, as per the prestigious Map Collection of the Perry-Castañeda Library at the University of Texas.
  • 62.
    Defense Mapping Example 1: China Military Power as of 2009, UofTexas

    8. Today - Crime Mapping: Maps of crime for any type, place or situation, for example in the figure below a demo for Brent Borough, London UK, by Instantatlas.com.
    Example of Crime Heat Mapping
  • 63.
    9. Today - Thematic - Nat GEO: combining the power of mapping and interactive control, we may see below a “Nat GEO” (National Geographic) educational Website, in this example for learning about “Physical Systems” - Water - Ocean Surface Currents and Light at Night, on clickable maps that let us focus our interest. We invite you to use these “open and free” outstanding educational facilities.
  • 64.
    10. Today - Google Maps: The use of geo maps has extended universally, to all ages, genders and socioeconomic conditions, thanks to the Web and specifically to search engines like Google and Yahoo. Google maps are open and public and as such even their best approximation should be considered of “medium resolution”, where a land square of 15 by 15 meters is represented in one pixel. Taking into account that satellite networks could “inspect” the earth surface at 0.6 square meter resolution per pixel (and deeper than that, but this data is publicly unknown), we may imagine a world structured in levels of something equivalent to “security”, where possibly some organizations like NASA are enabled to monitor the world at “level one”, some others like Nat GEO and Google at “level two”, and you, we and all “common people” at “level three”. In the specific case of geo mapping there are level one providers that probably manage a superior “level zero”, but limited to very specific themes, like for instance TruEarth with its TerraMetric vision and derived products.

    TruEarth® 15-meter imagery is the baseline for global, natural-color, Earth imagery. A complete, 3.6 Terabyte, global mid-resolution dataset, TruEarth® 15-meter imagery is ideal for web mapping, simulation, visualizations and GIS applications. TruEarth® 15-meter imagery provides complete, best-available, substantially cloud-free, global coverage (except Antarctica) of the Earth at 15-meters-per-pixel resolution.
    New Trends in Global Earth Mapping, as per TruEarth

    Epilogue: Darwin may map all these examples by unveiling available data from the Web. As always, the beginning is the beginning: in order to perform 1) learning; 2) knowledge exploring; 3) semantic search, we first need to Map the Web. Darwin does it!
  • 65.
    Semantic Web
    Darwin Semantic Search
    Instructive demo by Juan Chamero as of 1st of January 2014

    Semantic Web: Before introducing you to Semantic Search you first need to know the meaning of Semantic Web and of Web Semantics as well. The only precisely defined term is the first, coined by the Web creator, Tim Berners-Lee, who also uses The Web of Data as a synonym of Semantic Web, in contrast to the current Web of Documents (Web pages). Web Semantics is rather ample and ambiguous: methodologies, tools, agents and programs to “see more and better” the Web and to structure it meaningfully.

    The Web as of today is then a Web of documents, semantically unstructured, only indexed by “words”. As such we may sustain that the Semantic Web doesn’t exist yet! With respect to these crucial themes, the Web of Data was idealized as a huge Web reservoir of “structured by meaning” data, still a utopia. However, it is possible to build a sort of “semantic glasses” to see the Web as it is today but meaningfully structured!

    Darwin Methodology enables us to build not only glasses to “see more and better” the Web but also to structure it gradually. The glasses have a built-in map of the whole Web Semantic structure, enabling us to locate pieces of information and knowledge directly and at the same time to use the whole Web as a Web World Encyclopedia thousands of times larger than conventional ones, with comparable authoritativeness.

    Visualization of the upper piece of the Art Map
  • 66.
    Towards Map Visualization: This demo intends to present a small but meaningful piece of the above mentioned Web Semantic structure, focusing on a single discipline of Human Knowledge: The Art of the World in English. Any human solution has pros and cons: the Web of today was easy to implement unstructured. As a result we have today more than 35,000 million Web documents only classified by words, not by meaning. Semantically this state of chaos is partially alleviated by the aid of powerful but invasive and costly conventional search engines like Google. Another solution is the one we are proposing: a sort of e-membrane to see the Web more and better, as if it were structured, via Knowledge Maps and semantic glasses that guide us to locate pieces of knowledge instead of words. Of course this second alternative requires a minimal know-how of interpreting semantic maps!

    Art Map: The figure above shows a rectangular matrix schema of a piece of our Darwin Art Map. You may imagine it as displayed within an Excel sheet of 23 columns by 326 rows. This “Cartesian coordinates” display is arbitrary, chosen only to depict Art trees meaningfully to the human eye, classified as a structure of hierarchically embedded clusters. Each tonality change delimits clusters, for instance painting from drawing and fashion from cinematography. After all, we humans are used to “seeing” the world through windows!

    Let’s make a first real size exploration of our demo. What we observe is the Darwin Art Map Skeleton, namely a rectangular matrix of 23 columns by 326 rows where each cell points to one of its 7,571 subjects or themes. Most HKM (Human Knowledge Maps) are represented by Inverted Logical Trees, from their roots to leaves through derived branches and nodes. We as users may use these maps for two main purposes: a) to learn as much as possible directly from the map; or b) as a precise pointer to obtain via conventional search engines, like for instance Google and Yahoo, the best information and related knowledge existing in the Web, also directly, in Only One Click. The first purpose is equivalent to studying via World Web Encyclopedias and the second is equivalent to using conventional search engines as if they were SSSE, Super Semantic Search Engines.

    Warning: The visualization and use of these maps fall de facto within Big Data. As we are going to see, The Art in the World has 7,571 “big ideas” and each of these “big ideas” has, on average, 50 related “minor specific ideas”, totaling a micro semantic cosmos of about 400,000 ideas or concepts. Let’s then focus our attention on how to present visually two “orders of magnitude”: 7,571 skeleton nodes and 400,000 concepts.

    Direct and Naive Vision

    The Art Map in full: the map as a rectangular “matrix” of 23 columns by 326 rows was our first visual approach. Each “cell” represents a branch or a node indexed “by path”, from the root in the upper left corner to the leaves, going down and from left to right. Let’s visualize the upper part in blue, which corresponds to the first Visual Arts sub tree: Painting, which starts at “Painting” and ends at “Painting market”, with 211 nodes.

    Step 1: Mouse over any node: the figure below shows the window that opens on mouse over any node, in this case over Painting, depicting its neighborhood: its ancestor node (Classic), 11 son/daughter nodes (descent) and 14 brother/sister or collateral nodes (same level).

    Step 2: Clicking on the node name: in the image below we may see the window that opens when this link is activated over the “music” cell. Could you locate it? Be a little patient. See Step 2 guide.

    Warning: The node content for this vision is incomplete. The Semantic Skeleton needs to be complemented with the 400,000 associated concepts, which are not yet activated. Darwin agents have retrieved them but the process is still under revision by a team of human experts. Notwithstanding, as we will see along this brief, the semantic search operates at near ideal efficiency.
  • 67.
    Step 1 guide: In the figure above the node Painting, one of the main Visual Arts sub trees, has as its “father” the node “Classic”, making reference to the Classic Visual Arts (to differentiate them from the Non Classical ones), and 11 “sons”, as you may easily check. If you go a little deeper the window will show you up to 14 “collateral” or “brother” sub trees belonging to the same Painting Tree. We may also depict in these windows the whole “neighborhood” of any node, including “uncles” and “grandfathers” (this is implemented in our Second Vision). The figure below corresponds to “music”, a little hard to locate within 7,571 options, isn’t it? Imagine within 15 million!
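    The neighborhood shown on mouse over (father, sons, brothers) can be sketched in a few lines. This is only a minimal illustration, not Darwin's implementation: the dictionary layout and the function names are assumptions, and the sample data is the “Lyric soprano” family that this demo enumerates later.

```python
from typing import Dict, List, Optional

# Parent -> children edges for a tiny fragment of the Art Map.
# The node names come from the demo text; the storage layout is hypothetical.
CHILDREN: Dict[str, List[str]] = {
    "Soprano": [
        "Coloratura soprano", "Soubrette", "Lyric soprano",
        "Spinto soprano", "Dramatic soprano", "Wagnerian soprano",
    ],
    "Lyric soprano": ["Light lyric soprano", "Full lyric soprano"],
}


def build_parent_index(children: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the parent -> children edges into a child -> parent lookup."""
    return {kid: parent for parent, kids in children.items() for kid in kids}


def neighborhood(node: str, children: Dict[str, List[str]]) -> dict:
    """Return the ancestor, brothers (same level) and sons of a node."""
    father: Optional[str] = build_parent_index(children).get(node)
    brothers = [n for n in children.get(father, []) if n != node] if father else []
    return {
        "ancestor": father,
        "brothers": brothers,
        "sons": list(children.get(node, [])),
    }


if __name__ == "__main__":
    print(neighborhood("Lyric soprano", CHILDREN))
```

    Extending the lookup to “uncles” and “grandfathers” would only require walking one more step up the same parent index.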
  • 68.
    Step 2 guide:

    i) Fine tune-up: As a second step users may select advanced options by clicking on the name. We have said that each node of the Art Map Skeleton may have associated an average of 50 “minor ideas”, generally from 10 to 200. These ideas or concepts are classified in two clusters: Generic Concepts, of the same nature as the node subject, and Object Concepts, very specific ones such as names, acronyms, dates, codes, etc. This option is currently in the process of revision and activation. However, with the skeleton node information alone it is perfectly possible to obtain through conventional search engines a meaningful query outcome, without thematic ambiguity and with minimal noise, in the FIRST CLICK.

    ii) i-URL’s: Learning a little from the ART MAP itself: when building the Art Map, Darwin agents and algorithms edit for each node a “Semantic Sample”, extracted from “suspected” relevant and authoritative sources (URL’s) on the node matters. These “briefs” (or i-URL’s, Intelligent URL’s) are pieces of knowledge that should be reviewed and approved by humans. Each node of a finished Darwin Map has from 1 to 5 i-URL’s. Users may gain a basic knowledge by reading these briefs and browsing their corresponding URL’s.

    Warning: For your information we have included in this demo, for this vision, a sample of four nodes with all the tune-up and i-URL information complete.

    Tip: This time try to locate “Lyric soprano”, the target neighborhood, via the search box in the figure above. As a first step, a mouse over the “Lyric Soprano” node (violet) opens a window with the following information (see image below), which tells us that Lyric soprano has Soprano as father and two sons: Light lyric soprano and Full lyric soprano. These four nodes have their second step information complete, as you may check; it is shown below.
  • 69.
  • 70.
  • 72.
    Light Lyric 1 and 2, no 3
  • 73.
    Full Lyric 1

    iii) Search by images: For each node of a map (in this demo, 103 of the upper Art Map levels, Vision 2) Darwin agents select 5 images supposedly strongly related to the node theme. These images can be used to add meaning and reliability to the search via a sort of “reverse engineering process”. You as a user may access complementary semantic information by clicking on any of the image sources. We are going to explain why: Darwin Methodology is based on Darwin Ontology, which tells us that when we humans document something, for instance an intentionally written Web Page, we follow at large, statistically, certain forms and protocols of an “Established Order” coined over centuries. On the contrary, or better said complementarily, when someone queries the Web directly through images via a conventional search engine, outcomes guided by “people’s preferences” are obtained instead. We have checked our demo gallery (515 images) using Google Images, with 95% successful matches. It seems that Google captures even our reduced size image versions (300 pixels wide).

    iv) The ART Map in full search box (see the image below): the search box in the upper corner may be used to locate nodes that have certain words within their corpus: for instance “music” is present in 74 nodes, “lyric soprano” in 1, and China in 26, however all forming part of different concepts. The matching nodes are highlighted in red. This box will reach its maximum utility when the whole Art Thesaurus is implemented. Take into account that a Web Thesaurus is a list (Controlled Vocabulary) of all concepts belonging to a given discipline. The Art Map in full will have about 400,000 concepts structured along the Art Map Skeleton discussed here.

    Warning: once the skeletons for all knowledge branches are extracted from the Web as it is, the retrieval of the whole Web Thesaurus, about 15 million concepts per language, is reduced to a Big Data computational problem: in terms of effort, no more than a few weeks of parallel processing. The semantic skeletons are the intelligence that maps information into knowledge!
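    The search box behavior described in point iv) can be approximated as a case-insensitive phrase match over each node's name and corpus. The sketch below is an assumption about how such a lookup might work, not the demo's code; the four sample nodes and their corpus texts are invented for illustration (the real Art Map has 7,571 nodes).

```python
from typing import Dict, List

# Hypothetical node -> corpus sample, invented for this sketch.
NODE_CORPUS: Dict[str, str] = {
    "Lyric soprano": "lyric soprano voice type in opera",
    "Music education": "teaching and learning of music",
    "Music philosophy": "philosophy of music and its meaning",
    "Calligraphy": "visual art of writing, strong traditions in China",
}


def find_nodes(corpus: Dict[str, str], phrase: str) -> List[str]:
    """Return the names of nodes whose name or corpus contains the phrase."""
    needle = phrase.casefold()
    return [
        name
        for name, text in corpus.items()
        if needle in name.casefold() or needle in text.casefold()
    ]


if __name__ == "__main__":
    for query in ("music", "lyric soprano", "China"):
        print(query, "->", find_nodes(NODE_CORPUS, query))
```

    As the text notes, a word match only says that the word occurs in a node; the same word may belong to different concepts in each of the nodes returned.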
  • 74.
    Node neighborhood: we have defined the neighborhood as a “family”, in this case a semantic family of entities that share a culture and a language. Locating its members in our representation scheme is not direct; we need a script. Our rectangular representation permits two classical forms of classification: by track and by level. By track, nodes along the same “track” are listed contiguously until a leaf is found. As we are forced to somehow pack all the tracks (or levels) within a rectangle, the neighborhood order is lost. For this reason we offer in our two visions access both to the sub tree that hangs from a node and to its family.

    A given node neighborhood: Let’s explore again the neighborhood of the target node “Lyric Soprano”. It is deep within the green region corresponding to “Performing Arts”. The figure below shows its corresponding sub region, where “Soprano” is the ancestor node and “Light Lyrics” and “Full Lyrics” are the two sons. Let’s begin with the target node “Lyric Soprano” then. Passing the mouse over it you access its neighborhood, namely (see below):
    ancestor: “Soprano”;
    brothers (5): “Coloratura Soprano”, “Soubrette”, “Spinto Soprano”, “Dramatic Soprano”, and “Wagnerian Soprano”;
    sons (2): “Light Lyric”, “Full Lyric”.

    Some Trials

    Trial 1: Lyric Soprano: the “Wizard” agents of this demo query Google by the following string:
    Darwin => ["arts" "performing arts" theater opera "vocal classification" female soprano "lyric soprano"]
    obtaining 952 references, most of them authoritative and semantically valid. On the contrary, not using our semantic interface and querying Google directly:
    Google => [lyric soprano] as an open search => 817,000 references
    Google => ["lyric soprano"] in exact mode => 217,000 references
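    The Trial 1 string reads like the path from the root of the Art Map down to the target node. A minimal sketch of how such a query could be assembled follows; the quoting rule (quote the root and any multi-word term) is inferred from the trial strings and is an assumption, as are the function name and the list representation of the path. The trials are not perfectly uniform in their quoting, so this is an approximation rather than the Wizard's actual rule.

```python
from typing import List


def build_query(path: List[str]) -> str:
    """Join the node names on a root-to-target path into one query string.

    Assumed rule: the root and every multi-word term are wrapped in
    double quotes (exact match); single-word terms are left bare.
    """
    parts = []
    for depth, term in enumerate(path):
        if depth == 0 or " " in term:
            parts.append(f'"{term}"')
        else:
            parts.append(term)
    return " ".join(parts)


if __name__ == "__main__":
    lyric_soprano_path = [
        "arts", "performing arts", "theater", "opera",
        "vocal classification", "female", "soprano", "lyric soprano",
    ]
    print(build_query(lyric_soprano_path))
```

    The resulting string can then be pasted into a conventional search engine, which is how the trial counts in this section were obtained.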
  • 75.
    Trial 2: Michelangelo:
    Darwin => ["arts" "visual arts" sculpture history "ancient rome" renaissance Michelangelo] => 606,000 references
    Google => [Michelangelo] as an open/exact search => 10,400,000 references

    Trial 3: “Music philosophy”:
    Darwin => ["arts" "performing arts" music "music education" "music philosophy"] => 55,200 references
    Google => [music philosophy] => 230,000,000
    Google => ["music philosophy"] => 675,000

    Trial 1 bis: “Lyric Soprano” (another day, and adding closely related concepts one at a time):
    Darwin => adding Kiri Te Kanawa => 3 references
    Darwin => adding Mirella Freni => 3 references
    Darwin => adding Victoria de los Ángeles => 1 reference

    Warning: the user may test, for each node, the influence of the related concepts. Generally, generic concepts do not contribute much to focusing the search. On the contrary, most object concepts do.

    Simplified Global Vision (see “Calligraphy” on Art Map)

    Art Map Simplified Vision: you may see the seven main clusters and, within each, their derived descents (sub trees), totaling 103 clusters, equivalent to an equal number of books of a World Web Encyclopedia.
  • 76.
    Calligraphy sub tree (clickable)
    Calligraphy Neighborhood (clickable), pointing to Chirography
  • 77.
    Node knowledge synthesis (not fully activated yet), check “image searching”

    Trial 4: Calligraphy and Calligraphy Chirography as an open search
    Darwin => [calligraphy] as ["arts" "visual arts" calligraphy] => 457,000 references
    Darwin => [Calligraphy Chirography] as ["arts" "visual arts" calligraphy "chirography"] => 776 references
    Google => [calligraphy] => 15,700,000 references
    Google => [calligraphy chirography] => 272,000 references
    Darwin/Google (via an image retrieved by a Darwin agent) => 114 references

    Some conclusions: In our experience, using only the semantic skeleton of any knowledge discipline (art, medicine, computing, sports), we access, without semantic ambiguity and without semantic pollution, a more precise and focused universe that is from a dozen to thousands of times smaller. Reinforcing our search via the concepts of the Web Thesaurus, we may locate what we are looking for in only one click within a very specific, authoritative and tiny set of references. Finally, we may take advantage of the image search facilities provided by some conventional search engines like Google to enrich our knowledge with what people see and think about what we are looking for!

    Life cycle of HKM, Human Knowledge Maps: all maps, their skeletons and their conceptual content are living creatures that age and turn obsolete as time passes. Everything, from their arboreal logic to their meanings, changes constantly. For these reasons Darwin Maps may evolve by themselves and, more than that, strive for excellence. Darwin Map versions, backed by statistics and reviewed by human experts, are generated from “semantic seeds” created by humans and grown by Darwin agents and algorithms. Although seeds and growth are controlled by humans, both humans and agents suggest changes to improve the species.
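    Returning to the trial counts reported in this section, the claim of a universe “from a dozen to thousands of times” smaller can be checked with simple arithmetic. The sketch below uses only the reference counts quoted in Trials 1 to 4 (Darwin skeleton query versus Google open search); the variable and function names are illustrative assumptions.

```python
from typing import Dict, Tuple

# (Darwin skeleton query count, Google open search count), as quoted in the trials.
TRIALS: Dict[str, Tuple[int, int]] = {
    "lyric soprano": (952, 817_000),
    "Michelangelo": (606_000, 10_400_000),
    "music philosophy": (55_200, 230_000_000),
    "calligraphy": (457_000, 15_700_000),
    "calligraphy chirography": (776, 272_000),
}


def reduction_factors(trials: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """How many times smaller the Darwin result set is than the open search."""
    return {name: google / darwin for name, (darwin, google) in trials.items()}


if __name__ == "__main__":
    for name, factor in reduction_factors(TRIALS).items():
        print(f"{name}: about {factor:,.0f} times smaller")
```

    With these figures the factors range from roughly 17 (Michelangelo) to roughly 4,167 (music philosophy). Search engine reference counts are estimates that change from day to day, so the ratios should be read as orders of magnitude only.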
  • 78.
    Something of SemanticHistory and Use As per Darwin Team chronics from prologue to epilogue (2000 - 2013) by Juan Chamero as of 1st of January 2014 Something of History Semantic Search: How does a Darwin Semantic Search work? It proceeds rationally, as imagined before the Web Search process were universally controlled by actual Search Engines like Google and Yahoo. Let’s go back to the 90’s of the last century: The Web from its beginning in 1992 was growing so fast that all imaginable efforts to let it grow under control failed meaning to let it grow structured under the point of view of “Knowledge”. Let’s see a little what that means: books in a library are organized and “filed” by “theme”. In order to implement a basic order for each “book” or similar piece of documentation, a “book filing card” must be filled by librarians. Something of history: In its beginning the Web creators intended to implement something similar enabling to order the “Human Knowledge” by “theme”. However Web documents are not books but “pages”, generally unstructured: written without adjusting them to some basic protocols and/or something like “Well Written Document Formulae” with sizes that go from a few ordinary pages to thousands of them many times bigger than many big books and as of today in a number that astonish us: more than 35,000 million of Web Pages growing at a rate of 20% annually! The actual Web is unstructured, non semantic: In brief the Web as of today looks as unstructured as in the 90’s and perhaps more taking into consideration the “dynamic pages” written by millions each second via social networks. Web Pages are only indexed by “words” and all type of content is permitted, well or bad written, open and freely, using any jargon, acronyms, multimedia objects, with or without expressed purposes, nobody controls anything and that’s GOOD!. We are not criticizing it; on the contrary we are admired! 
Currently Google robots are thematically blind: they can know neither the theme of a given document nor even a meaningful dominant theme dealt with in it.
The invisible order is up there: Nevertheless, an invisible, indelible, basic and dominant order is “up there” in the “Web Space”. To “see it” you need a Wizard like the one we are introducing here and a built-in “Web Thesaurus”, a sort of structured index of the whole of Human Knowledge hosted in the Web Space at a given moment.
OK, the Web is unstructured, and so what? Sometimes visual schemes and procedures are better than a thousand words. We have said that the current Web is unstructured and only indexed by words, and you may ask yourself: so what? You probably spend many hours a day exploring the Web, most times successfully and relatively fast thanks to current conventional search engines.
The misleading magic of opioid keywords: In order to succeed you have to be smart enough to query with adequate “keywords”. Keywords are words, or a precise sequence of words, used to query the Web diligently via conventional search engines. Web users have learnt to get what they need in a few clicks, but many times they feel disappointed because the search process took longer than expected and misled them: for example, they set out to buy a theater ticket and ended up buying a T-shirt or playing a war game.
Towards Semantic Search in Only One Click: Darwin guides users to find what they are looking for in a Unique Click without misleading them, guiding them to unveil what most times they have deep inside and
  • 79.
    blurred in their minds, via a friendly and intelligent “man-machine” dialog, instead of “passing the ball to you” and “washing its hands” by saying something like: here you have millions of references, probably many belonging to thousands of themes you are not interested in at all; please choose one from the suggested top. If you are used to exploring the Web guided by these “by words” search engines you will appreciate the difference.
    Wizard Demo
    Knowledge Mapping: This demo operates as a semantic search engine for “The Art” in the world, in English. It is a small yet meaningful piece of the Human Knowledge Map, which would map about 200 disciplines as of today, a discipline being a Major Subject of Human Knowledge, for instance Medicine, Religion, Tourism, Security, Mathematics, The Art, Entertainment, etc. The whole map will then cover 200 disciplines that semantically span the Web as a “forest” of “Inverted Logical Trees”, encompassing about 500,000 “Subjects” or “branches” of Human Knowledge and dealing with an estimated “Knowledge Asset” of about 15 million “concepts” per language. The Art Map is an ancient, robust and meaningful tree within this forest, encompassing 7,570 branches and dealing with a knowledge asset of about 400,000 concepts per language. Finally, concepts correspond to the “in mind” ideas we humans have, with idea understood as per Plato. See Semantic Web.
    There exists an ideal query: Ideally (sorry for the unavoidable redundancy), users should search by either knowing or guessing the “name” of the idea for which they are seeking Web references. In conventional search users try to describe their “in mind” ideas via keywords. What is generally ignored, even by Google experts, is the fact that search engines like Google behave as if they were virtually semantic when users, whether by knowledge, by intuition or guided by a Wizard like the one we are presenting here, query by the right name.
The mechanics of retrieving the right names: Our Darwin Wizard guides users to those “magic names” through a smart man-machine dialogue, acting as a “super librarian” of a Virtual World Library hosted in the Web. In this way users may obtain what they are looking for in Only One Click. This magic is not trivial because many “in mind” ideas share the same magic name: within the knowledge universe of 15 million concepts only about 5 million names are different, meaning that on average any magic name points to three different concepts. For example, “walnut oil” within the context “oil”, in turn within the context “painting media”, makes precise reference to a specific painting medium, defining the sequence [painting media => oil => walnut oil] as a concept of painting within visual arts, and these in turn within the tree root “The Art”. In fact, concepts are “semantic paths” of links built from specific words and/or keywords.
The beginning is the beginning: Users guided by the Darwin Wizard may explore the Web as if it were a conventional search engine. By default they are invited to explore the Web via a Human Knowledge Map, and in this demo via the Art Map. In this mode users may point directly to any locus, as in a map; for example, by prompting the Art Map as a rectangular matrix where The Art themes are displayed like a DNA sequence within a genome, running top-down and from left to right. Each point of this map can be focused and magnified via a “mouse over” mechanism.
  • 80.
    Return to “Art Map”
    In the figure above a particular view of The Art Map skeleton is depicted from its “root” in the upper left corner, with some subjects highlighted, like Rigoletto, Light Lyric, Fantasy Novel, and Paella, running top-down and from left to right.
    First demo search: A common mode of search is “by guessing” via a word or a keyword. The Darwin Wizard first tries to match the guess against the map skeleton. Let’s suppose that a user asks for “Greece”. The Wizard will find the following 9 matches:
    0.1.1.1.1.1.3.1., Greece: “History of Western painting”;
    0.1.1.2.1.1.1., ancient Greece: “History of the Graffiti”;
    0.1.1.4.10.1.1., ancient Greece: “History of Cosmetics”;
    0.1.2.1.1., ancient Greece: “History of Performing Arts”;
    0.1.2.2.1.1.1.5., ancient Greece: “Ancient History of Music”;
    0.1.3.1.7., Greece: “History, Cultures”;
    0.1.4.2.1.1.2.1., Greece: “History of the Martial Arts of Europe”;
    0.1.5.13.2.5.1.3., Greece: “History of the “Pasta” of Greece”;
    0.2.3.5.2., ancient Greece: “History of Writing Systems”;
    These correspond to 9 different concepts, that is to say 9 different “in mind” images. In this example the Art Map Skeleton will show 9 highlighted points.
    Lyric soprano neighborhood
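    The matching step of this first demo search can be illustrated with a minimal Python sketch. It assumes a toy skeleton holding only the nine “Greece” entries listed above plus one unrelated node; the tuple layout and the function name `match_guess` are hypothetical and are not taken from the Darwin Wizard itself.

```python
# Toy slice of the Art Map skeleton: (track code, node name, context).
# The nine "Greece" rows come from the list above; the layout is an assumption.
SKELETON = [
    ("0.1.1.1.1.1.3.1", "Greece", "History of Western painting"),
    ("0.1.1.2.1.1.1", "ancient Greece", "History of the Graffiti"),
    ("0.1.1.4.10.1.1", "ancient Greece", "History of Cosmetics"),
    ("0.1.2.1.1", "ancient Greece", "History of Performing Arts"),
    ("0.1.2.2.1.1.1.5", "ancient Greece", "Ancient History of Music"),
    ("0.1.3.1.7", "Greece", "History, Cultures"),
    ("0.1.4.2.1.1.2.1", "Greece", "History of the Martial Arts of Europe"),
    ("0.1.5.13.2.5.1.3", "Greece", "History of the Pasta of Greece"),
    ("0.2.3.5.2", "ancient Greece", "History of Writing Systems"),
    ("0.1.2.2.2.2.14.2.2.2.3.3", "Lyric Soprano", "Soprano"),
]


def match_guess(guess, skeleton=SKELETON):
    """Return every skeleton node whose name contains the guessed word."""
    needle = guess.strip().lower()
    return [row for row in skeleton if needle in row[1].lower()]


matches = match_guess("Greece")
for code, name, context in matches:
    print(f"{code}  {name}: {context}")
print(len(matches), "highlighted points")
```

    One name pointing to several track codes is exactly the ambiguity the Wizard has to resolve with the user; a real skeleton would hold thousands of nodes and would probably need an index rather than a linear scan.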
  • 81.
    Neighborhood: As a second and complementary step the Darwin Wizard invites users to focus more specifically on their “in mind” ideas, not only via the “main subjects” (7,570) of The Art Map skeleton but via the whole conceptual knowledge asset (400,000). Darwin invites users to explore the target neighborhood; in the example above, pointing to “lyric soprano” implies defining the following sub tree: [Soprano => Lyric Soprano => Light Lyric Soprano, Full Lyric Soprano]
    Target Node Presentation: within the sub tree hanging from “female”, “Lyric Soprano” (yellow on grey background) is the head of the following track from “The Art” root: 0.1.2.2.2.2.14.2.2.2.3.3. In the figure we have eliminated the seven-node prefix. This node has as its ancestor the “Soprano” node, …2.2.2.3, and as “sons”/“daughters” the nodes “Light Lyric” and “Full Lyric”, …2.2.2.3.3.1 and …2.2.2.3.3.2 respectively, leaves of the Art Tree. Eventually the search could extend to the enlarged family, including “uncles”/“aunts”, “Mezzo Soprano” and “Contralto”, …2.2.2.2 and …2.2.2.1 respectively, and “brothers”/“sisters” or “collaterals”, “Soubrette” and “Coloratura Soprano”, …2.2.2.3.2 and …2.2.2.3.1 respectively. The Wizard must show for each node, on the user’s demand, its information content, fundamentally the concepts closely related to the node theme (yellow core in the figure), in the order: target node, ancestor, and sons. Users are also challenged to mark those concepts that, in their own judgment, are closely related to the “in mind” image they have, while using common sense so as not to collapse the query outcome.
    Node content: In this demo we have uploaded only the Semantic Art Map Skeleton, not the content of its 7,571 nodes. We have uploaded the content of only a sample of nodes, sufficient to appreciate the search mechanics in full.
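    The family relations of the Lyric Soprano example follow directly from the dotted track codes, as the following minimal Python sketch suggests. The full codes are derived by prepending the seven-node prefix to the abbreviated codes given above; the dictionary and the function names are illustrative assumptions, not the actual Darwin data model.

```python
# Track codes from the Lyric Soprano example; PREFIX is the seven-node
# prefix that the figure omits.
PREFIX = "0.1.2.2.2.2.14."
NODES = {
    PREFIX + "2.2.2.1": "Contralto",
    PREFIX + "2.2.2.2": "Mezzo Soprano",
    PREFIX + "2.2.2.3": "Soprano",
    PREFIX + "2.2.2.3.1": "Coloratura Soprano",
    PREFIX + "2.2.2.3.2": "Soubrette",
    PREFIX + "2.2.2.3.3": "Lyric Soprano",
    PREFIX + "2.2.2.3.3.1": "Light Lyric",
    PREFIX + "2.2.2.3.3.2": "Full Lyric",
}


def _key(code):
    """Sort codes numerically, component by component."""
    return tuple(int(part) for part in code.split("."))


def parent(code):
    """Code of the ancestor node: the track minus its last component."""
    return code.rsplit(".", 1)[0]


def children(code, nodes=NODES):
    return sorted((c for c in nodes if parent(c) == code), key=_key)


def siblings(code, nodes=NODES):
    return [c for c in children(parent(code), nodes) if c != code]


def neighborhood(code, nodes=NODES):
    """Target, ancestor, sons, siblings and uncles/aunts of a node."""
    ancestor = parent(code)
    return {
        "target": nodes[code],
        "ancestor": nodes.get(ancestor),
        "sons": [nodes[c] for c in children(code, nodes)],
        "siblings": [nodes[c] for c in siblings(code, nodes)],
        "uncles": [nodes[c] for c in siblings(ancestor, nodes)],
    }


for relation, value in neighborhood(PREFIX + "2.2.2.3.3").items():
    print(f"{relation}: {value}")
```

    Because the relations are computed from the codes alone, no extra pointers need to be stored per node in this sketch; whether Darwin stores them explicitly is not stated in the text.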
The full HKM, Human Knowledge Map, with all the components of this “Knowledge Forest”, will have about 200 Human Knowledge Disciplines and 500,000 subject nodes (the yellow cores in the figures seen up to now) filled with 15 million concepts per language, plus 500,000 “node metadata” records, a sort of “semantic fingerprint” of the nodes. The figure below depicts the basic structure of this metadata:
a) The definition of the “node name” as per one or more “Authorities” (URLs);
b) from one to three semantic core clouds, namely:
b1) A set of Generic Concepts closely related to the central node theme;
b2) A set of Objective Concepts such as names, locations, dates, codes, acronyms, etc., closely related to the central node theme;
b3) A set or gallery of images also closely related to the node theme and considered relevant by a special family of Darwin agents.
  • 82.
    Going a little deeper
    Simply by accessing the node the Wizard knows its semantic track, which is information good enough to make an efficient and meaningful query. However, it has options to improve the efficiency and richness of its queries. Let’s go back to the Lyric Soprano example:
    The Art => Performing Arts => Main Performing Arts => Theater => Genres => Opera => Vocal Classification => Range => Female => Soprano => Lyric Soprano
    which corresponds to the code 0.1.2.2.2.2.14.2.2.2.3.3.
    If the user wants to query, the Darwin Wizard codes that track in the following string:
    [“the art” “performing arts” theater opera “lyric soprano”]
    Querying Google, the Darwin Wizard checked the sequence from Buenos Aires as of November 24th, rendering 37,100 Web pages. Darwin computes for us the following series of uncertainty reduction:
    [“the art”, 103,000,000 + “performing arts”, 11,500,000 + theater, 8,190,000 + opera, 5,140,000 + “lyric soprano”, 37,100]
    Before the user attempts his/her first click the Darwin Wizard will invite him/her to adjust as much as possible the semantic vicinity of “Lyric soprano”, for instance:
    [“the art” “performing arts” theater opera “lyric soprano” “Giselle allen”] => 5
    [“the art” “performing arts” theater opera “lyric soprano” “light lyric soprano”] => 7,600
    [“the art” “performing arts” theater opera “lyric soprano” “christina haldane”] => 1
    [“the art” “performing arts” theater opera “lyric soprano” soubrette] => 15,100
    [“the art” “performing arts” theater opera “lyric soprano” “non classical”] => 758
    [“the art” “performing arts” theater opera “lyric soprano” “valerie masterson”] => 154
    [“the art” “performing arts” theater opera “lyric soprano” “Sarah Brightman”] => 4,160
    [“the art” “performing arts” theater opera “lyric soprano” “Joan Baez”] => 2,020
    [“the art” “performing arts” theater opera “lyric soprano” “victoria de los angeles”] => 814
    Appendixes
    Nodes Physiology
    Within a node Darwin hosts its Semantic Metadata, namely:
    1.
i-URL, intelligent URL: a sort of semantic brief of the node theme pointing to a small set of Authorities (at least one), considered statistically “modal”, that “talk” authoritatively about the theme at hand. It also hosts search parameters and markers.
2. Generic Concepts: Darwin extracts them from large “textons”, thematic vectors formed by adding significant samples of the text corpora of “all” Web documents that deal with a specific “theme”, for example “Lyric Soprano”, normally in the order of a thousand to hundreds of thousands.
  • 83.
    3. Object concepts: more than “in mind” ideas, these are specific objects, for instance physical and juridical persons, special events, places, etc., that contribute to identifying “in mind” ideas properly.
    Optionally, the concepts of any node could be “weighted” within their respective subjects by the search engine popularity of the query [node path “concept”]. For example, the weight of “victoria de los angeles” in the example above would be 814.
    Art Tree metric (by level): range of node numbers, with the number of nodes in parentheses
    Level 1: 1 to 17 (17)
    Level 2: 18 to 175 (158)
    Level 3: 176 to 387 (212)
    Level 4: 388 to 995 (608)
    Level 5: 996 to 2,107 (1,112)
    Level 6: 2,108 to 4,309 (2,202)
    Level 7: 4,310 to 5,960 (1,651)
    Level 8: 5,961 to 7,023 (1,063)
    Level 9: 7,024 to 7,437 (414)
    Level 10: 7,438 to 7,542 (105)
    Level 11: 7,543 to 7,566 (24)
    Level 12: 7,567 to 7,570 (4)
    i-URL Layout: we show below a typical i-URL built for the Computing Map
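    Returning to the Lyric Soprano queries and the concept weights described above (not to the i-URL layout), the following minimal Python sketch shows how such query strings could be assembled from a semantic track and how the reported counts translate into reduction factors and weights. The function names are hypothetical; the counts are the ones quoted in the text and are simply stored as constants, since no search engine is called.

```python
def build_query(track_terms, concept=None):
    """Quote multi-word terms and join everything into one search string."""
    terms = list(track_terms) + ([concept] if concept else [])
    return " ".join(f'"{t}"' if " " in t else t for t in terms)


# Terms the Wizard took from the semantic track in the example.
TRACK = ["the art", "performing arts", "theater", "opera", "lyric soprano"]

# Reference counts reported in the text as each term is appended.
COUNTS = [103_000_000, 11_500_000, 8_190_000, 5_140_000, 37_100]


def reduction_factors(counts):
    """Factor by which each added term shrinks the previous outcome."""
    return [prev / cur for prev, cur in zip(counts, counts[1:])]


# Concept weights: counts reported for the query [node path "concept"].
WEIGHTS = {
    "Giselle allen": 5,
    "light lyric soprano": 7_600,
    "christina haldane": 1,
    "soubrette": 15_100,
    "non classical": 758,
    "valerie masterson": 154,
    "Sarah Brightman": 4_160,
    "Joan Baez": 2_020,
    "victoria de los angeles": 814,
}

print(build_query(TRACK))
print(build_query(TRACK, "victoria de los angeles"))
print([round(f, 1) for f in reduction_factors(COUNTS)])
for concept, weight in sorted(WEIGHTS.items(), key=lambda kv: -kv[1]):
    print(f"{weight:>7,}  {concept}")
```

    Storing the weight next to each concept, as suggested in the text, would let the Wizard order the concepts of a node by popularity before presenting them to the user.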
  • 84.
    Logic of Web Search Under Darwin Ontology
    Semantic versus non semantic
    By Juan Chamero, from Buenos Aires, as of March 13th, 2014
    Questions and Answers from Plato and Aristotle to the Digital Era
    Meaning of querying
    Querying by (q): A search process consists of issuing a series of queries q via “meaningful pairs” [s, q], where (q), the only visible component of the pair, is a sort of guess: the best conversion to words that users can build to obtain information about a certain “in mind” image they have, related to the subject (s), the component that paradoxically remains invisible! The ideal search, in fact the ideal meaningful pair [s, q]* or “semantic meaningful pair”, is the one where (q) is the “modal name” of the “in mind” image (s)!
    Comment 1: in fact the q of [s, q]* becomes q*, the modal name of (s), and to avoid confusion we may add that the (s) of [s, q] becomes s*.
    Comment 2: Take into account that in Darwin Ontology a “modal name” is the right name that univocally determines a given “in mind” image or “concept”. In fact, statistically, the s* inferred from (s) when querying by q* matches the user’s “in mind” idea, the reverse also being true: users may infer s* if and only if they query by q*.
    Query outcome: Basically, search outcomes take the form of lists of References, each one with its “Title” or header selected by the search engine under its particular criterion; the URL, Uniform Resource Locator, or address where the Web page is located; and a pse, “search engine piece of information”, attached to it in the form of a paragraph extracted from the content and explicitly related to (q), trying to emulate human highlighting as much as possible: how much of the retrieved content matches q. So the outcome of the “visible” guess (q) is a list of (n) references {R} = {R1, R2, R3, ..., Rn}, where (n) usually goes from a few thousand to hundreds of millions. These references have somehow
  • 85.
    within their “text corpus” words and strings of words in any order, isolated or as t-tuples conveying specific meanings, that match the sequence of (q) words partially or totally, one or many times: in fact just words and sequences of words here and there that somehow match the components of (q). Some search engines like Google allow searching by (q) exactly as it is written, by enclosing the string within quotation marks.
    Are modal names easy to find? No, they are not. Let’s suppose that the same given subject (s) is in the minds of a million people who think, write and speak in the same native language. And let’s imagine a sort of championship: all of them are invited to “find” their “in mind” idea with a given search engine, for instance Google, with only one guess, no matter the number and ordering of the words they use to materialize (q). And as we are talking of a championship, some “entity” should qualify and “rank” the search engine “outcomes”. For this purpose we may create a special “semantic rank algorithm” that, for example, takes into account: a) the Top 50 References, matching their content against a unique list of keywords allegedly related to and fully covering the tournament “in mind” idea; b) the size of the outcomes; c) the thematic homogeneity of the outcomes; etc. At the end of the computation a winning (q) (or a set of semantically similar q’s) will define the “modal name”: this name, becoming q*, will identify the subject s* “statistically” and univocally.
    Semantic resonance: As most current conventional search engines index only by words, search efficiency depends too much on the luck, intuition and, why not, talent of users in pointing to the invisible (s) by guessing the right (q). Experience tells us that search today is a heavy, disappointing and uncertain task.
We are talking about professional search: people who need precise information and knowledge about the most diverse subjects and who frequently devote hours and even weeks to finding something valuable. Why? What happens falls under another phenomenon: the Web “semantic resonance”. We may imagine conventional search engines as radio devices that inspect the Web space through a dial. This dial is huge: it tunes along a spectrum of nearly 15 million “stations” per language, each station being one “in mind” idea of humans. Tuning proceeds by (q) queries, but the Web space is terrifically sensitive from the point of view of semantics: minor and apparently insignificant alterations of q, in its components or in its grammatical structure (a singular instead of a plural, a comma, a verb tense in one of the words of (q)), may mean passing from millions of references to zero. Let’s see something about Pope Francis and Barack Obama:
[q = “A poor church for the poor”]: 321,000 references as per Google exact search;
[q = “A poor church for the poors”]: 1 reference, perhaps because of a possible grammar error;
[q = “Hungry for change”]: 10,400,000 references as per Google exact search;
[q = “Hungry for a change”]: 723,000 references, because the saying points to a very different concept;
  • 86.
    Comment 3: when we talk of 15 million concepts (“in mind” ideas) in English we are by default covering all “instances” of those concepts, namely examples, particular cases, specific occurrences, etc. Surely the total number of “instances” will be far larger than 15 million! To make the Web semantic, a first step will be to structure it by concepts and a second step to structure it by all possible instances.
    How does a conventional search proceed? Given a pair [s, q] we obtain a series of links {R}, of which we inspect only a few, generally within the top of the outcome iceberg, symbolized by {{R}}, for example the second, the fifth and the tenth: {{R}} = {R2, R5, R10}. From these we may improve our (q) => (q1) => (q2) => … and so on, until we get what we need or at least gather enough pieces of information and knowledge to satisfy our needs. This process is not as convergent as expected and, to make things worse, the (s) itself gets blurred, because too much and sometimes misleading information stuns users. Only persistent and well trained users may unveil modal names, the q*’s that may satisfy their needs, in theory, in only one click!
    Conventional Search
    Remember that (s) in the pair [s, q] is subjective and invisible.
As the Web is only indexed by words, users must proceed by issuing a series of (m) guesses, as follows:
[s, q] => a series of (m) 5-tuples {[s(i), q(i)], n(i){Rn(i)}, f(i), puser(i)}
IA {puser(i)} SHOULD converge to satisfy users’ needs and SHOULD represent users’ “in mind” needs => most times only partially and “blurred”
IA {q(i)} SHOULD converge to the Modal Name of subject (s) => almost never
Where:
o (m) stands for the number of guesses needed to find something valuable that properly matches the user’s “in mind” image (s);
o (i) is the number of a specific guess or “Web exploration”: i = (1, 2, 3, …, m);
o n(i) is the number of documents, ranging from a few hundred to hundreds of millions, that match q in exploration i; {Rn(i)} is the list of references from R(1) to R(n(i)); n(i){Rn(i)} is in fact a single tuple, a “monad”: the outcome list (Rn(i)) of n(i) References;
o f stands for “a few” references selected, generally from one to five of the Top 10;
o puser stands for “user pieces of information and knowledge”, extracted either by humans or by agents as raw data to synthesize the whole exploration through (m) steps into a consistent Intelligence Report about the subject (s). Examples of pieces could be links, images, multimedia objects, text corpora, navigation instances, comments, etc.;
o IA stands for “Intelligent Analysis”, gathering and classifying the most diverse pieces of information and knowledge.
IA {puser(i)}: Even in the case of semantic expertise the search proceeds “spiraling and converging” within the Web from ignorance to truth, approaching s* guided by a track of q’s approaching q*: (q1, q2, q3, …, qm), along (m) steps,
  • 87.
    probably not pointing to the best references that deal with subject s* but to its neighborhood, made up of sub-optimal documents.
    o IA {q(i)}: This process of intelligence over the q’s could eventually be performed by a Darwin algorithm via a “literary and semantic synthesis” of the track (q1, q2, q3, …, qm) => q*.
    IA {q(i)} example: Along an exploration of (m) steps we used 10 words as follows:
    q1: w1 w2 w8
    q2: “w3 w4 w5” w6
    q3: w1 w2 “w3 w7”
    q4: “w9 w10”
    ………………..
    qm: w2 w4 “w7 w5”
    q* synthesis algorithm: A string like “w3 w4 w5” tells a search engine like Google to look for documents that contain within their text corpus the exact expression w3 w4 w5 as it is. It is not difficult to create an anthropic algorithm that suggests well written queries to humans, based on the components of the exploring track and their related keywords weighted by their respective “popularity”. The winner could be for instance q*: “w1 w7 w5”, which will point to s*.
    Semantic Search
    [s, q]* => a unique 5-tuple {[s*, q*], n*{Rn*}, f*, puser}
    IA of puser: SHOULD converge to satisfy users’ “in mind” needs => always
    q = q* is the Modal Name of subject (s), by default!
    n* << most n(i) of conventional search
    How to obtain q*: The first thing a Semantic Search Engine should do is help users identify “the best” q*’s to fit their “in mind” images s*. To do that we need all possible “in mind” images of humans mapped in a hierarchical structure, no matter how many they are, assigning to each of them a “name” in each language, all of those names being “unique”. Let’s accept for a while that these unique names exist and are perfectly identifiable and meaningful. We may ask ourselves the following question: is it possible to guide users to identify their particular “in mind” images? Yes, it is! The only identifiers we need are “words”.
The “retrieval algorithm” is rather complex but absolutely feasible: given a “guess” formed by one, two, three or more words suspected to be “members” of the name as matched in our brain (for example [w1], or [w1, w2], or [w1, w2, w3]), the algorithm may “tell us” something like: “Dear user, the pair [w3, w1] of your guess is part of {s1, s2, …, sh}, a list of (h) subjects (themes, topics) probably pertaining to more than one Branch of Knowledge. Please tell us which subject/s you would like to inspect. You are also invited to add one or more keywords in order to make your query more specific. If you prefer, you may either change your guess or navigate any Branch of Knowledge, or part of it, in “from A to Z” mode!”
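A deliberately simplified Python sketch of this dialogue step follows. It assumes a toy index of subject names (taken from examples used elsewhere in this text) mapped to hypothetical branch labels, and it treats a guess as an unordered set of words; the real retrieval algorithm is described above as considerably more complex.

```python
from collections import defaultdict

# Hypothetical toy subject index: subject name -> branch of knowledge.
SUBJECTS = {
    "Lyric Soprano": "Performing Arts",
    "Light Lyric Soprano": "Performing Arts",
    "Full Lyric Soprano": "Performing Arts",
    "Coloratura Soprano": "Performing Arts",
    "Mezzo Soprano": "Performing Arts",
    "Walnut Oil": "Visual Arts",
    "Oil": "Visual Arts",
    "Painting Media": "Visual Arts",
}


def suggest_subjects(guess_words, subjects=SUBJECTS):
    """Group subjects by the part of the guess found in their names.

    Returns a list of (matched_words, [(subject, branch), ...]) keeping
    only the groups that share the largest number of guessed words.
    """
    guess = {w.lower() for w in guess_words}
    groups = defaultdict(list)
    for name, branch in subjects.items():
        hit = frozenset(guess & set(name.lower().split()))
        if hit:
            groups[hit].append((name, branch))
    if not groups:
        return []
    top = max(len(hit) for hit in groups)
    return [(sorted(hit), subs) for hit, subs in groups.items() if len(hit) == top]


for words, candidates in suggest_subjects(["lyric", "soprano", "famous"]):
    print(f"The part {words} of your guess is part of:")
    for subject, branch in candidates:
        print(f"  {subject} ({branch})")
```

With a guess such as ["oil", "soprano"] the sketch returns two groups belonging to two different branches, which is the situation in which the Wizard would ask the user which subject to inspect.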
  • 88.
    Semantic 5-tuple: the s* of the pair [s*, q*] identified as above, provided it is possible to map the whole Web, enables us to satisfy our information needs in only one click once q* is successfully unveiled! In fact the retrieval is performed by a Darwin SSSE, Super Semantic Search Engine, working under Darwin Ontology, which has a built-in Human Knowledge Map (Web Map) and a “Wizard” that acts as a “super librarian” (in essence an Intelligence Retrieval Algorithm managed by an agent). Queried by q*, the search engine brings the monad n*{Rn*} with n* << n(i) (the n(i) of conventional non semantic search). The user selects a few f* out of the search engine outcome, f* being of similar magnitude to any of the f’s of conventional search. Lastly the user selects a unique set of puser, pieces of information and knowledge, without the “noise” usually carried by the multiple puser(i) sets.
    IA of puser: The Intelligence Analysis performed on puser proved to be coherent and straightforward because of the absence of noise and because almost all pieces belong to the same theme!
    Some Reflections
    The Thinker of Rodin
    Reflection 1
    How conventional search engines SHOULD perform open queries
    Any search expert who makes extensive and intensive use of this type of search will come to have strong doubts about possible structural failures and bugs. Let’s imagine, for instance, how we should program an algorithm to retrieve pages that match a given (q). In theory search engines should perform heavy computations to match large q’s. Let’s see, for example, what happens for a q of 4 words, q: [w1 w2 w3 w4]. The query being “open”, the search engine robot must consider all types of valid non repeated sequences of those 4 components, for instance [w3 w1], [w2 w4 w3], [w2], and in general monads, dyads, triads, …, n-ads (n=4 for this example).
We say “non repeated” because sequences such as w2 w2 could “prima facie” be considered nonsensical (or very specific…), whereas appearances of a component “here and there”, separated by spaces, paragraphs, etc. (….w2………w2…………….w2…), found along a document tell us that w2 matches three times, increasing its “appearance density”. The algorithm should take into account that users are trying to guess s* or q* by imagining facets of them via words. The number of k-ad permutations out of n elements is given by n!/(n-k)!, where (!) stands for “factorial” rather than an exclamation mark. Applying this to the example:
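As a quick check of the formula, the k-ads of a 4-word query can be enumerated with a minimal Python sketch. It relies only on the standard library (`itertools.permutations` and `math.factorial`); the names `words` and `k_ads` are illustrative, and the counts it prints can be compared with the hand computation that follows.

```python
from itertools import permutations
from math import factorial

words = ["w1", "w2", "w3", "w4"]
n = len(words)


def k_ads(items, k):
    """All ordered, non repeated sequences of k items (k-permutations)."""
    return list(permutations(items, k))


for k in range(n, 0, -1):
    sequences = k_ads(words, k)
    # The enumeration must agree with the closed formula n!/(n-k)!.
    assert len(sequences) == factorial(n) // factorial(n - k)
    print(f"{k}-ads: {len(sequences)}")

total = sum(len(k_ads(words, k)) for k in range(1, n + 1))
print("total sequences to consider:", total)
```

The total grows very quickly with the number of words, which is the computational burden the text attributes to a fair treatment of open queries.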
  • 89.
    4-ads (tetrads): 4!/(4-4)! = 4!/0! = 24
    3-ads (triads): 4!/(4-3)! = 4!/1! = 24
    2-ads (dyads): 4!/(4-2)! = 4!/2! = 12
    1-ads (monads): 4!/(4-1)! = 4!/3! = 4
    n=4, k=1, 2, 3, 4
    The permutations being as follows:
    k=4: (24) 1234 1243 1324 1342 1423 1432 2134 2143 2314 2341 2413 2431 3124 3142 3214 3241 3412 3421 4123 4132 4213 4231 4312 4321
    k=3: (24) 123 124 132 134 142 143 213 214 231 234 241 243 312 314 321 324 341 342 412 413 421 423 431 432
    k=2: (12) 12 13 14 21 23 24 31 32 34 41 42 43
    k=1: (4) 1 2 3 4
    For example, 2341 => w2 w3 w4 w1.
    This means that, in order to be fair, matching and rank algorithms should take into consideration all possible permutations of all the k-ads and count how many times a given match occurs.
    Reflection 2
    Some problems detected in current SEs
    Nobody knows exactly how current search engines retrieve and “rank” Web pages. We have detected many failures, such as not respecting formal logic, for example the AND/OR operations. When a user asks for w1 AND w2 he or she is supposed to be asking for those pages that have both words within their text corpus, no matter their order and separation or their “appearance density”. However, “many times” some search engines drive users crazy by computing explicit or implicit AND operators as pseudo ORs instead!
    AND: For example, querying Google (as of September 4th 2013, from Buenos Aires) with q: “hungry for a change” (exactly, enclosed within quotation marks) gives 723,000 references. With the query [q = “hungry for a change” Obama], instead of obtaining a significant reduction of references, because we are discarding those pages that contain “hungry for a change” but do not have Obama within their text corpus, Google brings us 5,860,000 references?!
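    What formal logic requires of an AND query can be made concrete with a minimal Python sketch over a hypothetical four-document corpus. It treats every query term as a single word (the exact-phrase case would need positional information) and is not a description of how any real search engine works.

```python
# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "hungry for a change said obama",
    2: "hungry for a change in the kitchen",
    3: "obama speech on health care",
    4: "barack obama hungry for a change",
}


def build_index(docs):
    """Inverted index: word -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def and_query(index, *words):
    """Documents containing every word, regardless of order or distance."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


INDEX = build_index(DOCS)
hungry = and_query(INDEX, "hungry", "change")
hungry_obama = and_query(INDEX, "hungry", "change", "obama")

# Adding a term to an AND query can only keep or shrink the outcome.
print(len(hungry), ">=", len(hungry_obama))
# Word order does not matter for an open AND query.
print(and_query(INDEX, "barack", "obama") == and_query(INDEX, "obama", "barack"))
```

    Under these set semantics an added term never enlarges the outcome and word order never changes it, which are the two properties the examples in this reflection test against Google’s reported counts.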
OR: Let’s now try the open query [q = barack obama], which SHOULD give the same outcome as the reverse [q = obama barack]. However, the numbers are different (as of September 4th 2013, from Buenos Aires):
[q = barack obama], 860,000,000
[q = obama barack], 1,040,000,000
Differences around the clock: most search engines show significant differences in outcomes over the course of a day, across regions of the world, and for some search operators, for instance when Google searches within specific domains via its “site:” operator.
Comment 4: These are bugs possibly related to an erroneous importance and neighborhood influence that the Google rank algorithm assigns to high popularity keywords, like Obama in the example.
  • 90.
    Reflection 3
    What’s in a query?
    The answer (a) to a query (q) implies effort, an expenditure of energy. A “person” (oracle, teacher, assistant, helper, shaman, authority, …) spends it to provide another with information and knowledge in the form of explanations, documents, and sources of information and knowledge that require analysis and synthesis. As we have seen above, (s) is behind any (q), q being the only visible part. What then is the nature of any answer (a)? It was described as a 3-tuple: [n{l(i)}, f(i), puser(i)], meaning the volume, the sample, and the assimilation of the answer. In fact search engines provide users with a torrent of unstructured and disordered information; users usually pick a tiny sample of this torrent and then dig into the pieces of the sample in a sort of individual ad-hoc “data mining”. What is also “hidden” is the net worth of the query, namely the quantum of information and/or knowledge and/or intelligence finally assimilated by the user. As the query was triggered by (s), we may assume that this final net worth can be described in terms of (s) as well.
So the full “learning cycle” will be something of the form:
[s, q] => [d, d, s] => [s’]
Where:
s (in the first pair): the hidden s before the query;
q: the actual query;
d (first term of the triplet): a generalized torrent of “documents”;
d (second term of the triplet): a generalized sample of that torrent of documents;
s (third term of the triplet): a generalized ad-hoc “data mining” over raw data extracted from the sample;
s’: the hidden s after the query, defined as a sort of convolution (s+s) of the prior s with the mined s.
Some sources and semantic axes of this Darwin exploration: Active learning; Active Learning by Querying; Machine learning; e-learning; Junior search engine; University of Twente, Netherlands; Picture based querying; Knowledge and Information; Tools and Techniques for Gathering Marketing Data; Examples of Explanations for kids; Another example of Explanations for kids; The Evolution of Mind Mapping; El Súper Libro de Preguntas y Respuestas de Charlie Brown; FUD, Miedo, Incertidumbre y Duda; Munrudico Visual Images, from La Vanguardia;
[q = “querying and learning”], 3,250,000;
[q = “query to know”], 191,000;
[q = “learning by querying”], 26,800;
[q = “query for learning”], 32,900;
  • 91.
    The Darwin 5-tuple for Kids
    The query 2-tuple is the milestone of searching: users ask with a pair [s, q], where s is the “in mind” image they have of what they “need” in terms of information to improve their knowledge. As the figure below at left suggests, s is invisible, while q, the second element of the pair, points to the hidden idea expressed in the best way users can; the figure at right shows a common way: “by explanations”. There are many ways to express q and the figures below show some: a) primitive (the little child’s gesture suggests more than words, doesn’t it?); b) a more elaborate q via words (a girl telling something to a friend); c) an “average” q expressed via formal audiovisual messages; and d) an audiovisual explanation of something relatively abstract and complex, such as the Yin-Yang.
    i) The query 2-tuple: [s, q]
    Examples of q’s [figure panels: Primitive (“blablabla”); Average; Complex - Abstract. Image sources: Explanations for kids; Munrudico Visual Images]
    The query 3-tuple is the outcome of searching: within the triplet [d, d, s] the first term d stands for the SERP, Search Engine Results Page, the set of references (URLs) that point to documents dealing with query q. As most conventional search engines are not thematic and only index by words, these sets are generally too big and thematically ambiguous. The second term d stands for a sample of supposedly thematically specific and authoritative documents. Here lies the main weakness of conventional search engines. Finally, the third term s stands for an “intelligent and intelligence data gathering” procedure that generally has to be performed by the user: in fact an intelligent collection of pieces of text, multimedia objects, images, metadata, pointers, etc., to work on as intellectual raw data.
  • 92.
    ii) The query 3-tuple: [d, d, s]
    SERP - Authorities - Info + Intelligence gathering
    The comic below depicts the universal Q&A cycle of learning by questioning. Along successive 5-tuples humans make their “in mind” ideas (s) evolve to become (s’).
    iii) The Q&A learning cycle: [s, q] => [d, d, s] => [s’]
    [figure labels: s, Q&A, s’]
  • 93.
    The Art Tree
    Darwin Demo Presentation to the XXXXX Institute
    By Juan Chamero, from Barcelona, as of January 2015
    Street Art Utopia, by David Walker - Juan Tuazon
    The Art Map, as a piece of the HKM, Human Knowledge Map, was built for the EU, European Union, as a demo of the “semantic” talent of Darwin Methodology to retrieve from the Web as_is all the information and intelligence dispersed here and there, from the past to the present. The knowledge creatures you are going to see, semantically structured by Darwin, were “up there” in the Web Ocean, uploaded openly and at will by millions of artists and laymen. That map was uploaded to the presentation laptop, acting as virtual “semantic glasses” for the Google browser.
  • 94.
    1. A vision of the three upper levels
    Take a look at its seven main clusters. Within each cluster the main art subjects are depicted and hyperlinked as in a geo map. The Art Tree is deployed from root to leaves in up to 13 levels.
    2. Mouse over the node “drawing”
  • 95.
    3. A detail of the upper-level “drawing” sub tree
    This is a sub tree of 80 derived nodes. For each node we may access a gallery of images (5 in this demo).
    4. Knowing a little more via Google Images thru Darwin Semantic Glasses
  • 96.
    5. Knowing a little more via Google Images thru Darwin Semantic Glasses
    6. We are invited to search “pen and ink” works within “artist tools”
  • 97.
    7. Knowing a little more by querying the concept “masters of drawing” via Google Web
    8. Going a step deeper by querying the concept “Leonardo da Vinci” via Google Web
  • 98.
    9. Similar to the above, querying by Michelangelo
    10. Let’s go now to the “drawing” neighborhood
  • 99.
    11. Let’s see what a neighborhood is
    By “semantic neighborhood” we mean pertinence or membership to a “semantic family”, in this example “the drawing family”. Drawing is the second son of “classic arts” and brother of “painting”. It has many “brothers”, such as sculpture and architecture, and several “sons” or subordinated subjects such as “history”, “artist tools”, “support media”, etc. Trees and sub trees may also offer access to arbitrarily agreed forms of extended family, including collateral subjects at the level of “uncles” and “aunts” and even closely related, friendly or akin subjects.
    Next we are going to explore how data is structured in a sort of database. One of the problems we face when dealing with “Big Data” applications (and this is one!) is how to offer friendly and efficient interfaces that allow navigation and at the same time provide both overall visions and the minimum detail, as in geo maps. As we will see, an HKM, Human Knowledge Map, in a given language must map more than 15 million “ideas” along more than 600,000 subjects (themes or topics), finally structured as a “knowledge forest” of about 200 disciplines. Take into account that The Art, only a small piece of it though “complete”, has 7,571 subjects and about 500,000 “ideas”.
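    The family vocabulary of a semantic neighborhood maps directly onto a tree. The sketch below uses the node names quoted in this slide; the “performing arts” cluster, the order of the brothers and the function name are assumptions added only so that the example has an “uncle” to show.

```python
# Parent -> children, in left-to-right order.
TREE = {
    "the art": ["classic arts", "performing arts"],  # second cluster is hypothetical
    "classic arts": ["painting", "drawing", "sculpture", "architecture"],
    "drawing": ["history", "artist tools", "support media"],
}

# Reverse lookup: child -> parent.
PARENT = {child: parent for parent, kids in TREE.items() for child in kids}


def neighborhood(node):
    """Return the semantic family of a node: parent, brothers, sons, uncles."""
    parent = PARENT.get(node)
    brothers = [n for n in TREE.get(parent, []) if n != node]
    grandparent = PARENT.get(parent)
    uncles = [n for n in TREE.get(grandparent, []) if n != parent]
    return {
        "parent": parent,
        "brothers": brothers,
        "sons": TREE.get(node, []),
        "uncles": uncles,
    }


family = neighborhood("drawing")
```

    An interface can then decide how far the “extended family” reaches simply by choosing which of these lists to display.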
  • 100.
    12. Let’s see a little deeper inside
    The Art Map content could be saved and deployed, resembling a DNA vector, along a two-dimensional matrix, in this case of 23 columns by 329 rows. The whole HKM, Human Knowledge Map, could be saved and deployed in 23 columns by approximately 30,000 rows (~690,000 subjects, not too much in terms of “Big Data”!). Each “semantic cell” has a specific and unique name and hosts the “semantic fingerprint” of the “subject” pointed to by that name, a brief description of it, and the “authoritative sources” from which the Darwin agents retrieved the description.
    13. Passing the mouse over “painting”
  • 101.
    The deep level of browsing: Imagine yourself browsing The Art tree by track from root to leaves, going from right to left, and in parallel creating the 7,571 cells from the upper left corner of the matrix, going right and down row by row, unfolding the tree in a rectangular matrix. We are now going to browse the map cell by cell, and even within each cell, reaching a semantic universe of ~500,000 concepts!
    14. Let’s inspect the “interior” of a given cell, for instance “lyric soprano”. The search engine tells us that there is a node named “lyric soprano”
    15. Let’s go to the “lyric soprano” cell and its neighborhood
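    Unfolding a tree into a 23-column matrix amounts to walking the tree in a fixed order and numbering the cells row by row. The sketch below assumes a plain left-to-right preorder walk and a toy tree; the actual traversal order used by Darwin and the function names are not given in the source.

```python
import math

COLUMNS = 23  # matrix width quoted in the text


def unfold(tree, root):
    """Preorder walk of the tree, returning node names in cell order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order


def cell_of(index, columns=COLUMNS):
    """Map the i-th node (0-based) to its (row, column) in the matrix."""
    return divmod(index, columns)


def rows_needed(cells, columns=COLUMNS):
    """Number of rows required to hold the given number of cells."""
    return math.ceil(cells / columns)


toy_tree = {
    "the art": ["classic arts"],
    "classic arts": ["painting", "drawing"],
    "drawing": ["artist tools"],
}
cells = unfold(toy_tree, "the art")
art_rows = rows_needed(7_571)     # The Art
hkm_rows = rows_needed(690_000)   # whole HKM estimate
```

    With 23 columns, the 7,571 cells of The Art fill 329 complete rows plus a partial one, which is consistent with the roughly 329 rows quoted above, and 690,000 subjects fill exactly 30,000 rows.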
  • 102.
    16. Mouse over Lyric Soprano again…
    Mousing over the Lyric Soprano cell activates the same search features as in slide 2 and subsequent slides: the sub tree of the node and its neighborhood, complemented by a gallery of images.
    17. Node content: i-URLs and Semantic Fingerprints
  • 103.
    We are entering a new dimension of searching: this “feature” is not only a powerful tool to make search more direct and precise but also a tool to find whatever we need in only one click. It is equivalent to being in a huge Web Library managed by expert and friendly librarians.
    We have said that each node stores something like the demo of its subject. Let’s suppose that for the subject “masters of drawing” there exist 20,000 Web pages dealing with it with a high level of authoritativeness. Darwin Agents, under our Darwin Methodology and guided by our Darwin Ontology, may unveil from these raw clusters of content a weighted set of dominant concepts (modal concepts) that is considered the semantic synthesis of the cluster subject: masters of drawing in this case. This set is the core of the above mentioned “semantic fingerprint”. You may easily guess that adding one of these specific and unique concepts to your query will focus it precisely on the semantic key you are looking for! We are close to the “find a needle in a haystack” utopia. You will now be invited to see how this feature works. Darwin agents will also generate for each subject its corresponding description, expressed as part of its metadata (i-URL). Concepts could be of several types: generic, objective, functional, etc.
    17 bis. List of concepts stored in the “Lyric Soprano” node. You are invited to click, perhaps your first click in this demo: mousing over provides only a semantic overview. In order to be specific and go right to the point you must click: on average no more than once!
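    A “weighted set of dominant concepts” can be illustrated with a simple frequency count. In the sketch below the weight of a concept is the share of cluster documents that mention it; this weighting, the sample concepts and the function name are assumptions, since the source does not say how Darwin agents compute the weights.

```python
from collections import Counter


def fingerprint(cluster_docs, top=3):
    """Weight each concept by the share of cluster documents that mention it."""
    doc_freq = Counter()
    for concepts in cluster_docs:
        doc_freq.update(set(concepts))  # count each concept once per document
    total = len(cluster_docs)
    # Sort by descending count, then by name, so ties are resolved predictably.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [(concept, count / total) for concept, count in ranked[:top]]


# Four made-up documents of a "masters of drawing" cluster,
# each reduced to the suspected concepts found in it.
docs = [
    ["silverpoint", "chiaroscuro", "sfumato"],
    ["silverpoint", "chiaroscuro"],
    ["silverpoint", "hatching"],
    ["silverpoint", "chiaroscuro", "hatching"],
]
modal_concepts = fingerprint(docs)
```

    Adding one of the top concepts to a query is what narrows a search over the cluster to the pages that actually share that concept.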
  • 104.
    18. Examples of specific search in only one click
  • 105.
  • 106.
    DARWIN (BRAIN) TEASER
    THE WEB (micro brain teaser)
    o The Web needs
    o Darwin does
    o A myriad of applications
    Semantic Web: it does not exist yet! We are on the way to it; the Semantic Web is coming;
    The Web as a paradigm of two interacting worlds, K versus K’: established order versus fuzzy, unstable and evolving orders; governments versus governed; those who teach versus those who learn; sellers versus buyers;
    Just words: as of today Web documents are indexed only by words, not by meanings;
    Social Explosion: from a world of 50 million Websites versus 200 million people to a world of 700 million Websites versus 2,900 million people in only two decades;
  • 107.
    The Web of today looks like a huge ocean of non-structured creatures (Web pages) where only a few are structured: in numbers, 2% versus 98%;
    The beginning should be the beginning: to properly know the Web we need to know K first, in as much detail as books and essays in a conventional library;
    K Role: world power and resources management are in K;
    K’ Role: innovation and (sustainable) changes are in K’;
    Big Data: mostly non-structured and public, it lives in K’;
    Human Knowledge as of today has four stages: information, knowledge, intelligence and wisdom; we are entering the Knowledge Era;
  • 108.
    THE WEB NEEDS
    … to be structured, beginning with K. One way is to structure it, either “of a sudden” or gradually. Another way would be to “see it” as structured, like Galileo Galilei did with his telescope. Darwin, which stands for “Distributed Agents to Retrieve the Web Intelligence”, may build a sort of “semantic glasses” to see the Web more and better (as virtually structured). These glasses have two parts: a Map of the Human Knowledge as retrieved from the Web as_is at a given moment, and a “Wizard” that dialogs with users like a “super librarian”. These maps, which may evolve by themselves, hold all conceivable themes we humans share at a given stage of our civilization. If we imagine the Human Genome as a knowledge database that tells intelligent beings how we are, the Human Knowledge Map tells them how we think via how we document. Back
    DARWIN DOES
    o Darwin search guides any user to what he/she needs in Only One Click;
    o Darwin Wizard dialogs concisely with users, assisting them to engineer the best question that points directly to what they need, in 30 seconds on average;
    o Darwin procedures, algorithms and agents are not intrusive. They “behave” only as smart observers. To some extent they could be considered angels;
    o Darwin as a predicting tool: Darwin may gather all pieces of information and knowledge dispersed on the Web about any subject and suggest to humans how those pieces could be structured;
    o Darwin as an unveiling tool: Darwin may inspect any cluster of data, suggesting to humans the best statistical “metadata” about it, in fact enabling us to “see it” more and better as semantically structured;
    o Darwin procedures, algorithms and agents may retrieve from the Web as_is the information and intelligence dispersed at both sides, K and K’, building trustable, interdependent and synchronized K Thesaurus and K’ Thesaurus respectively;
    o Persons, either as Juridical Entities or People, as proprietors or administrators in K or as single users in K’, leave indelible tracks of their behavior at both sides. Both sides being semantically structured, it is possible to make meaningful and trustable causal inferences about their behavior and all types of activity trends; Back
    A MYRIAD OF APPLICATIONS
    o Encyclopedias,
    o Meaningful translations,
    o Trustable surveys and polls,
    o IR, Intelligent Reports,
    o SSSE, Super Semantic Search Engines; OOC, Only One Click,
    o Knowledge Maps,
    o Semantic e-Learning,
    o Knowledge Creation,
    o Better Truths,
  • 109.
    o Free flow of Information and Intelligence between K and K’,
    o Seeing more and better hidden demands,
    o Full equalization of Offer versus Demand scenarios and vice versa,
    o People Behavior Trends,
    o e-membranes building as universal interfaces among multiple sources of knowledge,
    o And much more… Back
    SEMANTIC WEB
    The Semantic Web, for its creator and current W3C Consortium Director Tim Berners-Lee, does not exist yet; it is another Web. Web Semantics is a discipline in formation dealing with the meaning of things and of the manifestations of everything that surrounds us, from concrete things and matters to any degree of subtleness, encompassing the sensorial and the non-sensorial world. Following the track of managing and understanding textual and multimedia corpuses, we are learning to look for information and knowledge thru images, and shortly we are going to enter the world of tactile, olfactory, taste, sound and sixth-sense semantics.
    The figure above shows the cover of Semantic Web, an eBook by Juan Chamero, Principal Architect of the Darwin Methodology. Back.
  • 110.
    THE WEB AS A DUAL WORLD PARADIGM
    The Internet is a technology and a network that enables human communication in both directions simultaneously, for example from Website proprietors and administrators to their users and, reciprocally, from users to them.
    AS OF TODAY: the figure depicts a Dual Web Model, K versus K’, as per Darwin Ontology. The black region above corresponds to the Established Order of the actual world as_is in the Web, or K Region, mostly not structured conceptually and therefore considered semantically “flat”. The green part corresponds to the K’ Region of users, also non-structured and relatively chaotic, as users are connected as individuals, generally without pertinence to any ordered group or pattern. Information and a basic intelligence flow from K to K’ (dense blue arrow) thru an e-membrane (in yellow), which may or may not be intelligent, enabling a predetermined basic information transfer from K’ to K but neither knowledge nor intelligence (light yellow arrow).
    FUTURE: below is shown the Web as it could be within a couple of years. The K side is absolutely structured, for example thru a Human Knowledge Map that will enable Semantic Search of information and knowledge in only one click. The K’ side could also be structured via its respective K’ Thesaurus or Web Users Thesaurus, a necessary condition to make meaningful and trustable inferences of People’s Behavior Patterns. Both sides being structured, the open and free interchange of information and knowledge between both regions would then be possible thru their respective e-membranes (dense yellow arrows in both directions and dense blue arrow from K’ towards K). Back.
  • 111.
    JUST WORDS
    Conventional search engines index only by words. Ideally their robots see a textual content as a series of elementary semantic objects, located between “blank spaces” and other punctuation marks or separators such as (,), (;) and (:), and recognized as “words” in any language, whether well or wrongly written. The interpretation of words or chains of words as “concepts”, “subjects”, “themes” or “topics” is performed by users at their sole criterion. In fact users perform their searches via “keywords”: words or chains of words that, either in exact mode as written or dispersed here and there within the text corpuses, they guess point to their “in mind” idea. We humans document by threading concepts that correspond to our “in mind” ideas but are expressed by words in our jargon, state of mind and humor, depending on our culture and level of education. For this reason our keywords could belong to many cognitive worlds and to many families within them: take for instance “José Pérez”, pointing to a multitude of JP’s, namely a road maintenance worker in Guatemala, a New York company clerk or a NASA nanotechnology researcher. As we humans tend to organize our knowledge “tree-wise”, spreading knowledge subjects semantically and hierarchically along inverted logical trees from roots to leaves, keyword creatures like JP may coexist in dozens of arboreal disciplines and, within them, in thousands of different subjects. Google renders 2,340,000 references for the exact term JP but delivers them as a flat structure: any of them could be the corpus we are looking for. On the contrary, in a Semantically Structured Web it is possible to discriminate thousands of similar JP’s by their different semantic contexts, let’s say something like from 00000001JP to 00345557JP. Back.
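    The difference between a flat word index and a context-aware one can be shown in a few lines. The sketch below is an illustration only: the two sample documents, their subject paths and the way identifiers are numbered are made up, and the identifier format merely imitates the 00000001JP pattern mentioned above.

```python
import re
from collections import defaultdict

SEPARATORS = r"[\s,;:]+"  # blank spaces plus the (,), (;) and (:) separators


def words(text):
    """Split a text into lowercase 'words' the way a word-only indexer would."""
    return [w for w in re.split(SEPARATORS, text.lower()) if w]


# url -> (hypothetical subject path, text)
docs = {
    "u1": ("road maintenance / guatemala", "José Pérez repairs roads"),
    "u2": ("nanotechnology / nasa", "José Pérez, researcher: nanotubes"),
}

flat_index = defaultdict(set)  # word -> URLs, no context at all
context_index = {}             # (subject, term) -> context-qualified identifier

for n, (url, (subject, text)) in enumerate(sorted(docs.items()), start=1):
    for w in words(text):
        flat_index[w].add(url)
    context_index[(subject, "JP")] = f"{n:08d}JP"
```

    In the flat index both pages answer the keyword “pérez” and nothing tells them apart; in the context index each JP has its own identifier tied to the subject where it lives.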
  • 112.
    SOCIAL EXPLOSION
    The Web is expanding at a very fast pace, in terms of daily interacting people from K’ as well as in the volume of their transactions and in their reach and depth, in terms of logical layers added within a man-machine model. In this respect many users interact more and better than prestigious Websites. However, this explosion is still at a very primitive stage from a semantic point of view: users are learning to query efficiently and to express themselves meaningfully, and an informal Q&A system of learning is under way. Each day more people learn this way, perhaps in excess. As a drawback, users’ language is more ambiguous and limited, but in compensation they learn and communicate via images and use and understand via their senses. On the contrary, the K world looks frozen, watching and extremely aware of what is happening in the K’ Region. In K’s vision the K’ World is running out of control and looks irrational and rather chaotic. For some thinkers the Web is entering a sort of new medieval age. While the K’ side learns to order itself, the K side tries to detect in K’ the seeds of a consensual new world order. Back.
  • 113.
    THE WEB OCEAN
    The Web behaves like a huge Ocean where creatures of the most diverse species live. The K World would be represented by the Ocean itself, formed along eons, with creatures whose life cycles and forms strongly depend on the Ocean depth. The K’ World would be represented by us humans, who go to the Ocean to nurture ourselves, to make provision of renewable and non-renewable resources, and for transportation. The Ocean creatures need organic carbon to survive, and they obtain it from the zooplankton, which in turn lives on phytoplankton. The Web Ocean, to “survive”, needs information, a primary fluid defined by Claude Shannon in his Information Theory, and it also needs, like the Ocean, a source of energy, ultimately the one provided by the Sun. Back.
  • 114.
    THE BEGINNING SHOULD BE THE BEGINNING
    In Spanish we say “a truth of Perogrullo” of something so trivial that it is foolish to say it. However, the situation seems an aporia, a state of puzzlement and confusion about a million dollar question: why is the Web still unstructured? From the very beginning its creator, Tim Berners-Lee, imagined and presented it as semantically structured, and today, after more than 20 years of life, it remains unstructured. Our explanation follows: it was born as a K Region tool to be used by quite a few, largely “authorities”. An initial, primitive librarianship bureaucracy weakened over time for many reasons, namely: a) the explosive growth of Websites and Web Portals; b) a speculative human nature prone to deceive users and competitors in poorly controlled contexts; c) the excessive complexity of Web semantic protocols and tools. See Internet History (…, 2008) and Darwin Methodology. So what would have been easy to reorient from its beginning (till 2008) became something near to impossible: Websites are now too many, belonging to all types of domains, languages, countries and cultural habits. In order to correct this failure we envisage two main strategies: i) to start from zero ground, structuring only new Websites, with or without programmed conversion of the existing content; or ii) to build a sort of “semantic glasses”, such as Darwin Semantic Glasses, to “see” the Web as virtually and perfectly structured. Back.
    K ROLE
  • 115.
    Web authorities are Websites and Portals that rank high because of their “popularity”, traffic, prestige, the singular nature of the information and knowledge they provide, the number of links entering and leaving them, or an algorithmic combination of all these factors. See Google’s “PageRank” algorithm. These characteristics correspond to an actual Established Order Model. The K Region thus represents the actual World Power: governments and their agencies, supranational entities, universities, professional colleges, Intermediate Associations and Organizations, NGOs, etc. The information and knowledge of this region is what it is, but close to becoming a fossil at any moment. In K is what is and should be: the World Global Offer about anything. By their intrinsic nature K and its entities are too inertial; they may change, but slowly and gradually. Back.
    K’ ROLE
  • 116.
    BIG DATA
    Big Data is a term in process of formation that refers to the creation, detection and administration of masses of data too big to handle with the current state of the art of computing and database administration. There exists structured Big Data, such as that generated by the particle accelerator at CERN, Switzerland, and non-structured Big Data, like that generated within social networks. Associated terms are: Cloud Computing, Grid Computing, Smart Computing, Chaos Theory, Big Science, Social Data Revolution, Inferential Statistics, Inductive Statistics, etc. And some tools: Watson Super Computing, NoSQL databases, Apache Hadoop, MongoDB, etc. Back.
    HUMAN KNOWLEDGE
  • 117.
    Within our Digital to Mind Paradigm Human Knowledge has four progressive steps, namely: Information, Knowledge, Intelligence and Wisdom. Information as a digitalized fluid was defined by Claude Shannon in his Information Theory in the late forties of the last century, in the aftermath of the Second World War, and very little has been advanced along this line of basic scientific discovery since then. For instance, there is no theory explaining what knowledge is: perhaps a fluid more subtle and at the same time more elaborate than information? However, we intuit what knowledge is and even dare to classify it. Despite ignoring what intelligence is, we humans have started some interesting glimpses at classifying it: understanding what intuition is and what emotional intelligence is, differentiating how the left and right sides of our brains work, and defining intelligence as the art of taking wise decisions when facing complex crossroads. About wisdom we know nothing except to recognize it as one of the essential virtues and to associate it with rationality, equanimity and emotional maturity. As a fact, we are leaving the Information Era and entering the Knowledge Era. Back.
    THE DARWIN HYPERCUBE
    Conventional search engines like Google index by words and are “semantically flat”: they are unable to recognize concepts or even the thematic of the documents inspected. Darwin, on the contrary, may index the whole Web, detecting and recognizing concepts and the thematic of any document inspected. The Web
  • 118.
    Thesaurus built by Darwin could be imagined as a huge hypercube of as many “floors” as “disciplines” (about 200). Darwin agents run thru thematically homogeneous “clusters” or “textons” (about 100,000 documents each), retrieving their “fingerprints” and building de facto their “metadata”, a necessary condition to see them as semantically structured. These fingerprints are like a cognitive synthesis of the inspected document, a sort of vector of weighted concepts, and resemble the book filing cards of conventional libraries. Concepts are hosted in the nodes of the Darwin Logical Trees.
    Darwin Agent exploratory task
    We have embedded below a Flash demo that explains how a Darwin agent inspects clusters. It is set for 5 speeds, from 1 to 5. It will initially appear deactivated. To activate it, click with the right button of your mouse and a menu will display: push the “play” button or equivalent. As the inspection proceeds, running thru the Website Authorities of cluster 11 (once cluster 10 is finished), an associated script elaborates summaries and statistics. Once the inspection of cluster 11 finishes, the Darwin agent will go on to inspect the next cluster, 12. All k’s and their derivatives k_.... are suspected concepts that the Darwin agent detects/unveils within the Web pages’ corpuses. The Darwin agent detects and “measures” the authoritativeness of Websites within each cluster: each Website inspected may or may not be an Authority, and the agent jumps from link to link via a sort of Markovian algorithm. The agent may return many times to a given Authority as a function of its architecture and of its relative weight and importance for the theme inspected. Activate by pushing “play” while mousing over the image. Once the exploration starts it can be repeated by pushing the “volver a reproducir” (replay) key. Thanks! => Back
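    The “sort of Markovian algorithm” by which an agent jumps from link to link can be pictured as a random walk in which the next page depends only on the current one. The sketch below is not the Darwin agent: the link graph is made up, every outgoing link is equally likely, and using the visit count as a stand-in for authoritativeness is an assumption.

```python
import random
from collections import Counter

# Hypothetical link graph of one cluster: page -> pages it links to.
LINKS = {
    "siteA": ["siteB", "siteC"],
    "siteB": ["siteA"],
    "siteC": ["siteA", "siteB"],
    "siteD": ["siteA"],
}


def walk(links, start, steps, seed=0):
    """Random walk over the links, counting how often each page is visited."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    visits = Counter()
    page = start
    for _ in range(steps):
        visits[page] += 1
        page = rng.choice(links[page])  # next state depends only on the current page
    return visits


visits = walk(LINKS, "siteD", steps=1000)
ranking = [page for page, _ in visits.most_common()]
```

    Pages that many links lead to are revisited again and again, which matches the remark that the agent may return many times to a given Authority, while a page with no incoming links, such as siteD here, is seen only when the walk starts there.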
  • 119.
    Present and future of Web Searching
    Are conventional search engines like Google entering a New Age of Search or, on the contrary, a degrading spiral of misleading information and knowledge?
    By Juan Chamero, Darwin Methodology Architect, as of January 1st 2014
    A FUD Vision, by Grist.org
    Darwin Ontology: It is a Classical Ontology that models probabilistically how we humans document our ideas. It differs from Computational Ontologies, which model how humans document ideas under strict computational protocols resembling formal logic “forms”, specially suited to be used by trained people and also by agents. Darwin Ontology deals with any type of document written by humans in any language and on any thematic, from badly and fuzzily written ones to essays and theses written by academics and experts under strict protocols.
    The Human Mind: Darwin assumes that we humans are by far more complex than any conceivable agent and that our logic cannot be constrained to a reductionist game of formal logic. Our brains have the ability to synthesize in an instant thousands of YES and NO tonalities about anything instead of only two, and we also have the talent to transfer these abilities to agents and algorithms. Computational Ontologies are necessary and extremely useful as a complement to human ontologies and essential for “coding” semantic data and structures to make them computable.
    The actual Web as_is: It is estimated that at least 99% of the Web content is not well suited to be inspected by Computational Ontologies, and it is highly probable that this situation will not change substantially in the near future. So to “see” the Web as_is more and better the only way is to structure it semantically, or at least to “see” it as virtually structured via Darwin.
    “The Web Semantics paraphernalia”: what most Internet experts mean by “semantic”, and specifically by “Web Semantics”, will astonish intellectuals, academics and professionals not closely related to Internet technologies. Probably many of them, going a little deeper into what is considered semantics, semantic search and Web semantics, would be surprised to see edge
  • 120.
    technologies within a sort of medieval scenario full of “philosopher’s stones”, pseudo-scientific assertions, taboos and algorithmic mysteries, all mixed up.
    “The Darwin journey”: Personally, as creator of the Darwin Ontology, I started 12 years ago studying what a concept is, from Plato and Aristotle to Spinoza and on to our Digital Era; how it differs, if at all, from an idea; how humans write and document concepts; and how along centuries they have learnt to structure them hierarchically as structured knowledge. In addition I went deep into the above mentioned paraphernalia: the meaning of words, common words, expressions, quotations and sayings and, concerning Internet content, the differences, if any, among themes, subjects, topics, thesauruses, dictionaries, glossaries, jargons, keywords, metadata, tags, etc. From this long journey I realized that the Web is a well structured semantic universe, almost ignored up to now because its real semantic structure remains subtly hidden!
    FUD: Are we entering a scenario of FUD, Fear, Uncertainty and Doubt? Or perhaps a sort of nonsensical technocratic discourse where dubious acronyms and neologisms usurp the place of universal and eternal concepts? It seems that any Internet innovation, whether minor or significant, must be accompanied by new features impossible to describe with conventional concepts. For instance, we were all induced to believe from its very beginning in 1991 that the Web was intrinsically semantic.
    W3C: The Web as of today is unstructured. Tim Berners-Lee, the Web creator, was and still is convinced that the Web is “potentially” semantic and that in the long run it will tend to be. But the real truth is that it is still “flat”, unstructured from a semantic point of view. The search engines accepted this reality from the very beginning as something immovable. Tim Berners-Lee was also the founder, and is until now the Director, of the W3C Consortium, the Web’s leading nonprofit institution for designing and developing standards, languages and tools to manage and improve the Web. To some extent they accepted the Web as_is as a data reservoir rather difficult to make fully semantic because, for them, its conversion to semantics involves the heavy task of rewriting the whole content following strict protocols by language and/or building for each document its corresponding “metadata”.
    The Web Community: In parallel to these rather hidden and not yet declared criteria, the leading actors of the IT&C industry agreed that semantic means “meaning”, and as they had worked from the very beginning indexing the Web by words they also accepted, by extension, from 2001 onwards, that any methodology, script, tool, program, search engine or algorithm relating words and chains of words to each other means “semantics”. They de facto ignore how Human Knowledge structures itself independently of the Web’s existence, since the beginnings of our civilization. Certain advanced Web applications, for instance relations between keyword clusters and groups of people, were “tagged” as semantic.
    Google pragmatism: Along this pragmatic track Google added features at a fast pace, defining what semantics is and stating that they provide semantic search and that, by adding some advanced apps like “The Google Knowledge Graph”, they become de facto an SSSE, Super Semantic Search
  • 121.
    Engine, and that perhaps now, by adding interrelations between users and some “semantic mass media networks” of the Web, a new level of service could be attained, for example an SSSSE, Sensorial Super Semantic Search Engine, and so on and so forth.
    The New Google: Google under the “Hummingbird” and “Knowledge Graph” undertakings will operate with enhanced glamour but semantically at the same level as before. As an analogy, it is like a sort of Human Needs Care World System that attends Human Needs via an OTC, Over The Counter, Q&A dialogue system. Its Knowledge Database is a huge store of “pointers” (more than 35,000,000,000) to isolated pieces of information and knowledge (within the analogy: prescriptions, treatments, diagnoses, cases, “medicines”) stored here and there, but it still ignores everything about health care itself. Notwithstanding, it has a valuable asset: it knows, and will know better and in extreme detail, the demand in terms of users’ needs. For any OTC user demand Google may provide isolated pieces of information that only at random and eventually may make sense for the user: too much effort to provide, in the long run, FUD instead of certainty.
    What’s missing: In the long run Google will know its semantic limitations a little better than before; in this analogy it needs to know the “other half” of the Human Needs Care system: the Human Needs Offer, semantically in as much detail as needed to fully cover any conceivable demand. This job has to be performed from “zero ground” by making the whole Web evolve to semantics. The Human Needs Offer has a name: The Human Knowledge Map.
    Are conventional search engines entering a New Age of Search or, on the contrary, a degrading spiral of misleading information and knowledge? That is one of the big questions of this Digital Era, perhaps rooted deep within the following conundrum: Digital to Mind or Mind to Digital? By Mind to Digital we usually mean going to Digital under our control as a way to improve our mind and our lives, while by Digital to Mind we usually mean going towards a superior quasi-robotic mind following technology innovations. Digital to Mind approaches are characterized by a) contempt for, ignorance of and disregard for the past and for the validity, weight and reason of human evolution; b) high and talented creativity; c) sometimes dangerous and blind forms of reductionism.
    The Web content: The Web content prima facie looks semantically chaotic: 35,000 million documents dealing with more than 500,000 themes and about 15 million concepts per language. Its logic is highly fuzzy, almost impossible for robots to unveil unless processed probabilistically as Darwin does.
    The Web as a dual K versus K’ model: Let’s imagine the Web space as dual, Websites on one side and users on the other, continuously interacting like an Oriental Yin-Yang. In the Darwin Ontology these are known as the K side or “Established Order side” and the K’ side or “People side”. You as a human could also behave dually, as a user on the K’ side and as an “owner”, if you are the administrator or owner of a Website, as an Established Order entity on the K side; however, never at the same time, and with different behavior.
  • 122.
    K versus K’ in numbers: When you use a search engine to query the Web as a user you are looking for something you need. These needs are expressed by messages, as of today mostly via “words”, but take into consideration that they are “K’ side messages”: messages of people demanding some type of help. Let’s approach the problem in numbers (Web facts):
    o In terms of traffic: the K side as of today is a huge ocean of nearly 35,000 million documents hosted in 350 million domains and expressed by 650 million Websites. The K’ side has 2,400 million users that query the K side 1,500 billion times a year (2012).
    o In terms of “power”: 100 million privileged people on the K side versus 2,400 million users on the K’ side, interacting thru 1,500 billion queries a year.
    HKM, Human Knowledge Map Feasibility: Darwin satisfies the five IF’s, namely:
    o IF: in the K side there exists everything we humans need in terms of information and knowledge;
    o IF: this asset is structured and classified by all possible “themes” of Human Knowledge (not of Human Needs!);
    o IF: in the K’ side there also exists, “virtually” (for example as a map somehow stored and made available in the Web space), an asset structured and classified by all possible “subjects” of Human Needs;
    o IF: Human Knowledge is supposed to accommodate somehow specifically and/or univocally to Human Needs;
    o IF: we provide users with an aid to express their needs properly and accordingly provide them with all the pieces of information and knowledge needed to satisfy those needs;
    then we may say that we have solved the generalized Q&A, Questions and Answers, problem: Satisfying Human Needs in Only One Click. => Back
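    The figures quoted under “K versus K’ in numbers” can be turned into a few ratios. The short calculation below only restates those figures; it assumes that “1,500 billion” means 1.5 x 10^12 (short scale), and the variable names are ours.

```python
# Figures as quoted in the text (2012).
documents = 35_000_000_000            # "nearly 35,000 million documents"
domains = 350_000_000
websites = 650_000_000
users = 2_400_000_000
queries_per_year = 1_500_000_000_000  # "1,500 billion", read as 1.5 x 10^12
privileged = 100_000_000              # "privileged people" on the K side

docs_per_domain = documents / domains          # 100.0
docs_per_website = documents / websites        # about 53.8
queries_per_user = queries_per_year / users    # 625.0 per year
users_per_privileged = users / privileged      # 24.0
```

    Under that reading each user sends about 625 queries a year, fewer than two a day, and there are 24 users on the K’ side for every privileged person on the K side.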
DM - Megaalgorithm A Darwin Classified
By Juan Chamero, as of Feb 2015
Purpose: to map a single Major Discipline of the HK, Human Knowledge.
Objects
CeptsDB, cepts Database: it contains all suspected Darwin cepts (plus the suspected minority of main subjects, all mixed up). It contains semantic noise, redundancies and many types of ambiguity. Its volume strongly depends on the discipline to unveil, for example about 900,000 suspected cepts for “The Art”; at the end of the process half of this volume will be eliminated, tagged as “probably wrong”. Records have the form [c, u], where c is the suspected “cept name” and u the URL of the Web page under inspection where the suspected cept is present. The ds, “discipline sample”, is the set of documents (Web pages) that deal with the discipline to unveil, as per the Web as_is as of today!
Example: “The Art”: exact search in G: 125,000,000 documents (as of 23rd Jan 2014), then ds: 125,000,000 Web pages.
Note 01: humans using DM under the Darwin Ontology “know” how to extract suspected cepts from ds documents. In the example, about 900,000 suspected cepts will be extracted from 125,000,000 Web pages.
Note 02: we may imagine the semantic subspace (c, u) as a huge Boolean Matrix of 900,000 rows by 125,000,000 columns, only for this example!
Note 03: we may apply some Big Data procedures and tools to “reduce” this subspace, for instance just counting existing content by row and by column. Counting by row gives us a measure of the suspected incidence of each cept; counting by column gives a measure of the suspected incidence of a weighted combination of “authoritativeness”, “specificity” and “representativeness” of each Web page of the sample.
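The row and column counting of Note 03 can be sketched on a toy list of [c, u] records. This is a minimal illustration, not the DM procedure itself: the function name and the sample records are hypothetical, and the column count here is a raw count, without the weighting by authoritativeness, specificity and representativeness mentioned in the text.

```python
from collections import Counter


def incidence_counts(pairs):
    """Count entries per row (cept) and per column (URL) of the sparse
    Boolean matrix described by a list of [c, u] records."""
    unique = set(pairs)                       # a Boolean cell is set at most once
    by_cept = Counter(c for c, _ in unique)   # row counts
    by_url = Counter(u for _, u in unique)    # column counts
    return by_cept, by_url


# Hypothetical records: cept name, URL identifier.
sample = [
    ("lyric soprano", "u1"),
    ("lyric soprano", "u2"),
    ("fresco", "u2"),
    ("fresco", "u2"),      # redundant record, counted once
    ("haiku", "u3"),
]
by_cept, by_url = incidence_counts(sample)
```

Storing only the pairs that are actually present is what makes a matrix of this size manageable: the 900,000 by 125,000,000 cells are never materialized.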
This ds represents the “zero ground” semantic mapping of the discipline and, extended to the whole Web, a first raw “zero ground” Human Knowledge Map where, in a given language, all the suspected human “ideas” are represented: a map that holds all ideas in a single cluster, without discriminating them by any type of hierarchy (flat). The next step is then to unveil the hidden “semantic skeleton” of each discipline: a sort of arboreal structure of its “main subjects”, from hundreds to thousands of them, ideally tending to inverted logical “trees” of branches and nodes that go downwards from root to leaves and bear meaningful “modal names”.
NodesNamesDB, Nodes Names Database: our next Big Data task will be the unveiling of “nodes names”. Our CeptsDB supposedly holds all suspected names of all the “in mind” ideas humans have for a single discipline and a given language. From previous research we estimate that on average 1 out of 40 entries of the database is a “main subject name” besides being a concept. As an example, for a CeptsDB of 400,000 names, 10,000 may correspond to suspected main subjects. Our DM, Darwin
Methodology handles this problem via a specialized algorithm, created and tuned for each discipline under a derived ontology (Darwin Ontology) that tells us how we humans use and discriminate “main subjects”. Main subjects are concepts, but not all concepts become main subjects. WWD, Well Written Documents, are, or tend to be, monothematic; accordingly humans tend to tag them semantically via their titles and subtitles, either explicitly or implicitly, and also via metadata where it exists. In addition, and statistically, main subjects tend to be as popular as all their derived subjects together, and as each derived subject is associated with a particular and exclusive set of concepts, we may suppose that main subjects have on average the highest popularities within the ds, discipline sample. The structure of this database is made of pairs [n, p], where n stands for the suspected node name and p for its “popularity”. Pairs with p lower than a pre-established threshold <p> are excluded. DM algorithms suggest to humans the suspected nodes names subspace, let’s say a list of 10,000 names sorted alphabetically and by p. Another discrimination tune-up performs the same task over weighted subspaces, for example considering only “authoritative” documents according to certain criteria: traffic, hub power, thematic spectrum bandwidth, etc.
dsA, Authoritativeness of the ds, discipline sample: this is a vector of pairs [u, a], where a is a numerical value associated with the authoritativeness of the document located at address u. In fact it is a subspace of all the u’s related to the discipline under inspection. Then:
o from [c, u] we infer the ds (u) space U;
o from any [u, a] built for an authoritativeness threshold <a> we infer the ds (u) subspace U<a>.
This task could be performed with any degree of restriction, from U covering all URLs of the sample (no restriction at all) down to a hypothetical U<a> with only one “valid” URL, a hypothetical Webopedia.
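The two thresholds just described, <p> on popularity and <a> on authoritativeness, can be pictured with the following sketch. All function names, scores and sample values are hypothetical; in particular, sorting by decreasing p and then alphabetically is our assumption about how the suggested list might be ordered.

```python
def filter_nodes(node_pairs, p_min):
    """Keep the [n, p] pairs whose popularity p reaches the threshold <p>,
    sorted by decreasing popularity and then alphabetically."""
    kept = [(n, p) for n, p in node_pairs if p >= p_min]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def authoritative_subspace(authority_pairs, a_min):
    """Return U<a>: the URLs of the [u, a] pairs whose value a reaches <a>."""
    return {u for u, a in authority_pairs if a >= a_min}


def restrict(cept_pairs, urls):
    """Restrict the [c, u] records to the columns of a URL subspace."""
    return [(c, u) for c, u in cept_pairs if u in urls]


# Toy data, invented for the illustration.
nodes = [("visual arts", 900), ("opera", 400), ("ballet", 400), ("obscure term", 3)]
authorities = [("u1", 0.9), ("u2", 0.2), ("u3", 0.6)]
records = [("opera", "u1"), ("opera", "u2"), ("ballet", "u3")]

suggested = filter_nodes(nodes, p_min=100)
u_a = authoritative_subspace(authorities, a_min=0.5)
reduced = restrict(records, u_a)
```

Raising `a_min` shrinks U<a> towards the single-URL extreme mentioned in the text, while `a_min = 0` with non-negative scores keeps the whole space U.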
Semantic Seed, Semantic Skeleton buildup: for each discipline humans create “semantic seeds”, something like the upper thematic level that opens from its root. For “The Art” it would be something like the seven big branches derived from it, namely: visual arts, performing arts, literature, art history, art infrastructure, culinary art and combat arts. By the way, this discipline has 7,570 nodes and about 400,000 concepts distributed along a tree of 13 levels. In the near future this initial task could be transferred to an agent. The DM strategy is to make this seed grow. As of February 2014, DM proceeds along a four-level growing deployment process resembling an explosion of the type 1:10:100:1,000:10,000: from the root to the seed (1:10), from 10 to 100 (upper level), from 100 to 1,000 (medium level) and finally from 1,000 to 10,000 (the tree basement, all leaves).
1:10: the human part of the DM “anthropic algorithms”; a human expert or a “Committee of Experts” creates what they state to be the summit of the discipline. This does not mean that DM accepts it as a truth, not even as the best truth, only as a strong supposition to be checked as much as possible.
Remember that we have as our “best truth repository” our CeptsDB, a really huge data structure: virtually a Boolean Matrix in the order of 1,000,000 rows by 100,000,000 columns for each discipline of the HKM, Human Knowledge Map. Much of it could be considered redundant, wasteful and noisy, so at the end of the DM processes we arrive at something like 400,000 by 60,000,000; but we are able to significantly reduce the number of columns by playing with “authoritativeness”, arriving perhaps at denser, more homogeneous and more coherent cognitive matrices in the order of 100,000 rows (concepts) by 10,000,000 columns (relatively significant Web pages). Human experts do not have “ex-ante” access to the DM agents’ work. Agents check the seed against as many layers of the space U as they have at hand, and against the whole Web as if CeptsDB and its derived databases did not exist. From this check our DM suggests to the human experts a set of possible summit compositions, ranked with a proprietary algorithm.
10:100: this is a crucial crossroad, the first decision Darwin agents must face by “themselves”, following the creation criteria and settings of the DM proprietors. For each node of the summit they have to hang down a set of possible derived main subjects, without any human guide! What if we repeat, at a lower scale, the “zero ground” CeptsDB creation, now restricted not to the discipline as a whole but to the specific main subject of the father/mother node? Take into account that we could have guessed something similar to the human experts’ summit as a reasonable “semantic seed”. Of course, as we go deeper in unveiling the tree we lose the influence of human intervention, but we gain in AI, Artificial Intelligence, coherence, somehow enabling a non-human objectivity. Instead of checking what human experts guessed at 1:10, now at 10:100 Darwin agents suggest to humans the cognitive upper level of the discipline under study.
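The level-by-level growth from seed to leaves (10:100, 100:1,000, 1,000:10,000) can be sketched as a breadth-first expansion. This is only a structural illustration under our own assumptions: `suggest_children` is a hypothetical stand-in for whatever the Darwin agents do at each node, and the proprietary ranking is not modelled.

```python
def grow(seed, suggest_children, levels=3):
    """Grow a tree level by level from a human-made seed.

    seed: names of the summit nodes (the outcome of the 1:10 step).
    suggest_children: hypothetical callable standing in for the agents;
        it returns the candidate main subjects that hang under a node.
    Returns {node: parent}; summit nodes have the parent "root".
    """
    parent = {name: "root" for name in seed}
    frontier = list(seed)
    for _ in range(levels):              # 10:100, 100:1,000, 1,000:10,000
        next_frontier = []
        for node in frontier:
            for child in suggest_children(node):
                if child not in parent:  # keep only the first placement
                    parent[child] = node
                    next_frontier.append(child)
        frontier = next_frontier
    return parent


# Toy run: ten summit nodes, each node receiving ten children.
toy_seed = [f"s{i}" for i in range(10)]
toy_tree = grow(toy_seed, lambda node: [f"{node}.{i}" for i in range(10)])
```

With a uniform branching of ten, the toy run yields 10 + 100 + 1,000 + 10,000 nodes, the idealized explosion of the text; a real discipline would branch unevenly.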
From our experience (three prototypes and ten semantic seeds), this step may expose a possible failure of the semantic seed, for instance because most human experts were formed along decades within an authoritative knowledge atmosphere and consequently carry strong and most times hidden prejudices. In this respect the Web_as_is is terrifically dynamic: disciplines and sub-disciplines become obsolete, new ones appear, and they change their meaning and even their semantic pertinence, for example nanotechnology hosted under and within biology, physics and engineering, or fashion and culinary art within art, etc.
100:1,000 and 1,000:10,000: these are two rapidly convergent steps that apply the same criteria as the previous ones. In order to test the coherence of the whole, our DM checks the skeleton against the Darwin Ontology Conjectures. To explain the checking procedure we need to introduce here the Skeleton Database.
SkeletonDB, Skeleton Database: along a procedure as described above we structure it as a 5-ad “quintuple” [n, {c}, {a}, {u}, h] where:
n: name of the suspected main subject;
{c}: cepts set associated to main subject name n;
{a}: authorities pairs (URL, rank) for main subject name n;
{u}: authorities selected to edit the Semantic Fingerprint of the main subject name, associated with a brief description of the main subject name;
h: hierarchical code of the node as a tree unit.
The filling of this skeleton begins by building {c} for each (n) as the logical sum of all column vectors [c, u] associated with name n. Let’s explain this step: take the main subject “Lyric soprano” for art. In our CeptsDB the row “lyric soprano” is mentioned in 23 Websites u1, u2, u3, …, u23: the reduce Darwin algorithm then proceeds to “add logically” all the cepts existing in those 23 Websites.
Note 04: this step is not trivial: the Boolean matrices we are talking about are not all the same. The initial and big one, [c, u], is the mother of many others used by our DM. For The Art the big one had about 900,000 rows by 125,000,000 columns. This matrix shrinks to about 400,000 by 50,000,000 as explained above. The derived one needed to fill the SkeletonDB is rather different: for each subject name (7,251 for The Art mapping) we have to select a sample of URLs, let’s say 10,000 on average, and for each URL we have to retrieve its corresponding {c} set by using the big Boolean matrix.
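The quintuple and the “logical sum” that fills {c} can be sketched as follows. The class, field and function names are ours, the format of the hierarchical code h is invented, and the toy records stand in for the real CeptsDB; reading the logical sum as a set union over the URLs where the name appears is our interpretation of the step.

```python
from dataclasses import dataclass, field


@dataclass
class SkeletonNode:
    n: str                                   # name of the suspected main subject
    c: set = field(default_factory=set)      # {c}: associated cepts
    a: list = field(default_factory=list)    # {a}: (URL, rank) authority pairs
    u: list = field(default_factory=list)    # {u}: authorities chosen for the fingerprint
    h: str = ""                              # hierarchical code (format is hypothetical)


def fill_cepts(node, cept_pairs):
    """Set node.c to the union of the cepts found on every URL where the
    node name itself appears as a cept (the name is therefore included)."""
    urls = {u for c, u in cept_pairs if c == node.n}
    node.c = {c for c, u in cept_pairs if u in urls}
    return node


# Toy [c, u] records: "lyric soprano" appears on u1 and u2 only.
toy_pairs = [
    ("lyric soprano", "u1"), ("aria", "u1"),
    ("lyric soprano", "u2"), ("bel canto", "u2"),
    ("fresco", "u3"),
]
soprano = fill_cepts(SkeletonNode("lyric soprano", h="1.2"), toy_pairs)
```

In the text’s example the union would run over the 23 Websites u1 … u23 instead of the two toy URLs used here.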
Aiware Methodology
Juan Chamero, jach_spain@yahoo.es, as of April 24th 2013
Source: Tree of Knowledge, as per John F. Kennedy for Performing Arts
Introduction
Aiware Methodology, which we are going to define briefly here, is based on and directly derived from Darwin Methodology, which imagines the knowledge hosted in the Web as structured in the form of trees, arborescences and arbustive structures in formation. Trees and arborescences are imagined inverted, going from their roots to their leaves, deepening within our minds like symmetric avatars of the real world. The HK, Human Knowledge, as a whole can then be seen as a forest of trees, arborescences and arbustive structures in formation. Aiware Methodology explores the Web either to detect and retrieve existing “pieces of K” or to create new ones through a four-step methodology named ikAK:
ikAK, aiqueieika, {i, k, A, K} where:
i: stands for ideas in mind;
k: stands for keywords;
A: stands for Authorities (semantic);
K: stands for Knowledge;
As a methodology it also means: [intelligence to know Amap and Asap about Knowledge]. The result is the end of the acronym: eventually the whole K or a piece of it, for instance a HKM, Human Knowledge Map, or an IR, Intelligence Report; at large, a piece of K.
i, the first step: in a human-to-human relation between Aiware representatives and prospective customers, Aiware representatives have to “infer” from their prospective customers their “in mind” ideas about what to get as the final outcome of Aiware’s services. These ideas must be precisely described.
k, the second step: Aiware experts must create a primal set of “keywords” to detect and retrieve, amap and asap, A’s (Authorities) and, accordingly, K. This set is the semantic arsenal to unveil A’s and K amap and asap: the more the k’s become concepts, the greater their unveiling potential.
A, the third step: within the Darwin Ontology the entities that certify semantic validity are the A’s. As Aiware works under this ontology, Darwin agents and algorithms under human control explore the whole Web searching for the A’s that best “fit” the in-mind ideas of Aiware’s customers.
K, the fourth step: here K stands for human knowledge. Darwin agents and algorithms working under human supervision detect, collect and classify more than the necessary raw information and intelligence, in order to build meaningful pieces of the required knowledge.
How the ikAK Procedure works
A prospective customer, either personally or as the representative of a group, has a need in the form of an “in mind” idea. The idea is finally unveiled and discussed, and it will head the Aiware proposal. As the work is going to be performed via the Web, all the information and intelligence necessary to satisfy the prospective customer’s need must be detected and retrieved from it: a huge data Ocean actually holding more than 30,000 million documents (Web pages).
Source: “Underground Art”, London Metro
Web connectedness: the figure above depicts that everything leads to everything: the Web is a space where everything is connected and where one thing leads to another, so it does not matter where a journey searching for something starts: at large we (or our agents) may arrive at any given target going from hyperlink to hyperlink.
A little more about ikAK steps:
Figure: i step, k step, A step, K step (IHMC, Nanomecánica, Cartoonstock.com, Fathaur Tree)
(i, k) interaction: initially the k step may iterate against the i step as many times as necessary, until the semantic quality of the k’s attains a reasonable level of meaningfulness, resembling more and more concepts instead of simple keywords. This task could be performed entirely by agents, or by humans aided by agents.
(i, k, A) interaction: next Aiware proceeds to step A, identifying Authorities. This is a core Darwin step performed by special algorithms and agents that scout the Web through a sort of “random walks” along preselected authoritative Websites pointed to by the former arsenal of k’s. These random walks are controlled by a “Markovian memory factor” that emulates the real-time activation of human memory during explorations. This scouting enlarges the A sample meaningfully and an exponential process of self-learning starts: agents find more and more potential A’s that nevertheless should be checked. Along a three-level iteration process (i, k, A) Darwin arrives, via special semantic algorithms, at a two-dimensional semantic matrix of A’s versus k’s, where many of the A’s lie within specific common root domains and many of the k’s may potentially belong to, be subordinated to or be derived from common subjects; many of them may look as if embedded within others.
(k, A, C) interaction and draft: as a third dimension, a piece of content C appears for each pair (A, k): Darwin agents emulate human content capture for each (A, k) option.
Finally a raw document of Information and Intelligence, discriminated by the triad (k, A, C), is presented to human consideration as a Final Report Draft. If you are a journalist, this triad would provide you, as an editor, with all you need to build a meaningful Final Report, namely an Intelligence Report, a Survey Report, a Main Behavior Trends Report, etc. Talking about some figures: A’s, k’s and C’s could be in the range of thousands, with A’s and k’s ranked by their relative importance (A’s could be grouped by domain root).
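The scouting by “random walks” of step A can be pictured with a toy walk over a link graph. The text does not define the “Markovian memory factor”, so the sketch below substitutes one possible reading: with probability `memory` the walker returns to a page it has already visited instead of following a new hyperlink. The graph, the parameter names and this interpretation are all assumptions of ours.

```python
import random


def scout(links, start, steps, memory=0.3, seed=0):
    """Random walk over a link graph.

    links: dict mapping a page to the list of pages it links to (toy data).
    memory: probability of returning to an already visited page instead of
        following a hyperlink; our stand-in for the "memory factor".
    Returns the number of visits per page.
    """
    rng = random.Random(seed)          # seeded so that the walk is reproducible
    visits = {start: 1}
    history = [start]
    page = start
    for _ in range(steps):
        outlinks = links.get(page, [])
        if not outlinks or rng.random() < memory:
            page = rng.choice(history)      # recall a page seen before
        else:
            page = rng.choice(outlinks)     # follow a hyperlink
        history.append(page)
        visits[page] = visits.get(page, 0) + 1
    return visits


# Hypothetical link graph among three authoritative pages.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_visits = scout(toy_links, "a", steps=200)
```

Pages visited most often would be the first candidates for the A sample, still to be checked as the text requires.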
BIG DATA Semantics Primer
Tesauro Básico de Big Data (Basic Big Data Thesaurus)
Juan Chamero, as of January 1st 2014
Source: DARPA Topological Data Analysis, from Big Data, Wikipedia
Presentation
This section could be titled “Big Data in a hurry”, or “Big Data a las apuradas” in Spanish. It should be considered a sort of accelerated e-learning experience. Non-English-speaking people may use the translation facilities of the site, still basic but good enough to understand what the matter is. Let me introduce myself: I’m Juan Chamero, technically an expert in AI, Artificial Intelligence, and High Complexity Systems, and a Zen master. Given this duality, this section will have more of Zen than of science, both pearls of human culture: Zen from the “Far East” and science from our “Western Culture”. Technically, we are going to present a raw and brief Big Data Thesaurus. Pieces of information and knowledge will be prompted without much ado or explanation, like “semantic pills”, nevertheless apt to be understood and used at large, provided we devoted ten times more space and time to them. As voluntary Zen “practitioners”, try to open your minds, do not oppose anything, and let everything flow, comprehensible or not. It is fundamental to be fully aware, trusting in the syncretism and cognitive threading power of our brain. To be fully aware, the let-it-flow spirit, is Zen; imagery is both science and Zen; and the continuous generation of hypotheses is science. When I was much younger I had the opportunity to participate in a singular experience. At that time IBM was recruiting and training its first classes of Systems Engineers, with people coming from all over the world, with scarce to null knowledge of English and from the broadest spectrum of backgrounds.
We were all immersed in an intensive one-year course, in English, on logic, mathematics, physics, operations research, economics and business administration, complemented with seminars on epistemology, philosophy, sociology and politics. Nobody died in the attempt! Warning: for practical reasons all semantic pills will be written in English.
Presentación (Presentation)
This section could be called “Big Data in a hurry” in English or “Big Data a las apuradas” in Spanish. It is a pilot experience, in “Spanglish”, of accelerated learning. The automatic translation facility offered by the site, still very rudimentary but sufficient to understand what it is about, is always available to users. Allow me to introduce myself: I am Juan Chamero, on the scientific and technical side an expert in Artificial Intelligence and High Complexity Systems, and on the humanistic side a Zen master. In this respect this section draws more on Zen than on science, both pearls of human culture: Zen from the culture of the so-called “Far East” and science from what we know today as the “Western World”. Technically, we are going to present a very brief Thesaurus on Big Data. Fragments of information and knowledge will be presented without much explanation, yet apt to be understood and used if they were presented in ten times more space and time. As voluntary Zen practitioners, try to open your minds, reject absolutely nothing, and try to intuit or even guess explanations, applications and uses of what is being shown. The fundamental thing is to be very attentive and to trust that, as these fragments go by, we are threading together in our brain knowledge about Big Data. Discipline, attentiveness and non-rejection are Zen; imagery is both Zen and science; and the continuous generation of hypotheses is science. In my youth I had the opportunity to take part in a similar experience at IBM, which at that time, many decades ago, was training its experts in the nascent discipline of Systems Engineering.
For a year, with little or no knowledge of English, participants from different countries and with professional, technical and scientific backgrounds of the most diverse kinds were immersed in intensive courses, in English, on logic, mathematics, physics, operations research, economics, sociology and business administration, complemented with seminars on philosophy, epistemology, sociology and politics. None of us died in the attempt! Notice: for practical reasons the fragments or “semantic pills” of the cognitive experience will be in English.
Semantic Pill 1
Tips: we are going to start this series with a sort of “e-potpourri”. Please be patient! The lists of terms at the bottom of each pill are pieces of our Basic Big Data Thesaurus, alphabetically classified from numerals and from A to Z. Each pill deals with a small bunch of them, so selecting one term at a time to generate the pill would probably be like picking themes at random. Take into account that I’m acquiring experience: my first impression when facing this first bunch was literally written as follows: SEO activity; what’s an avatar?; learning to ACT?; what’s DP?; what’s the meaning of Web Services?; 3V as a metric, limited to 3?; what has Amazon to do with Big Data?; what’s the relation between Web Services and Big Data? My own experience prompted those questions: of course I know what DP is, but I never imagined in ICT something like “learning to ACT”, deepening in avatar meanings, or what massive retail has to do with Big Data.
o 3Vs model, http://www.ascilite.org.au/conferences/singapore07/procs/atkinson.pdf, virtuality, veracity, values;
o In the figure B would be a real person and A depicted as a probable polar extreme avatar;
o A/B testing, a SEO tool;
o ACT in Real Time, a DP approach to Decisions Theory, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9423, Learning to ACT via DP;
o Amazon Web Services, see AWS, http://aws.amazon.com/big-data/;
o Amazon, see AWS, Amazon Web Services;
o Appropriate technology;
o Asynchronous DP;
o Asynchronous Dynamic Programming;
o AWS;
Semantic Pill 2
Tips: could you identify all the Big Data Landscape actors?; do you know the acronyms built in, as for example DaaS, Data as a Service?; do you know the difference between “Apps” and “Applications”?; could you discriminate between structured and non-structured data?; what about the differences between ordinary and emotional intelligence?; could you imagine some examples?; take a look at Business Intelligence, as of Wikipedia; think a little about the type of data that BI applications handle: does unstructured data prevail? or perhaps structured? or perhaps it depends on the case.
o BD, see Big Data;
o Better decision making, within the context of Emotional Intelligence;
o BI, see Business Intelligence;
Semantic Pill 3
Tips: read carefully the Obama Government Initiative; is Big Data a tool for governance?; take a historic journey to the McNamara initiatives enhancing strategy over tactics; take a look at The Prince, by Machiavelli; see the article below about the six principles of building a solid Big Data: you will realize that the article only talks about one of the tools actually used to manage Big Data (Hadoop); however, do not despise these principles because they hold a lot of wisdom (see next tip); read the Information Week report below:
Establishing big data governance policies is critically important because the amount of data involved not only threatens to overwhelm IT organizations, but how that data gets used will actually determine the success of such projects. Specifically, organizations need to understand what type of data is valuable enough to keep, as opposed to data that is expendable. Unfortunately, a recent survey of 3,500 IT leaders conducted by TEKsystems, a provider of IT services, found that 66% of IT leaders and 53% of IT professionals said their data is stored in disparate systems and that they need new platforms to accommodate these increased data management needs.
o Big data governance, http://www.informationweek.com/software/information-management/big-data-governance-moves-up-the-it-agenda/d/d-id/1111875?, 81% IT leaders say their companies do not know how to cope with this item;
o Big data integration;
o Big Data Research and Development Initiative, Obama Government Initiative;
o Big Data Six Principles, http://www.metascale.com/resources/blogs/151-6-principles-for-building-your-big-data-talent#.UstBTtJDsZk
o Big Data, see BD;
Semantic Pill 4
Tips: physical and mental socialization of work, creation and innovation; we invite you to see the article below about the blackboard metaphor, a technique used to face complex problems cooperatively; some of its uses are: breaking complex cryptographic codes, computer vision, speech recognition, Command and Control Systems, Surveillance Systems, Workflow Processing, case-based reasoning, symbolic learning and Data Fusion; we recommend you invest as much time as you can in studying these creatures, the cellular automata: you will learn a bit more and better what AI and Big Data are, and you will enter the world of one of the visionaries of the last century, for many the father of computing as we know it today: Von Neumann; this pill is going to take longer than the previous ones; we recommend you explore the DamFoundation.org Website to see a real and perhaps the most outstanding Big Data application in the world; if you want to know a little about the computational logic and math related to Big Data, take a look at the complexity of a data collection algorithm that at large discriminates data into big clusters (“clustering”); finally, this pill ends with “cloud computing”, a concept in formation, not yet well defined and closely related to a concept we have already seen: “Web Services” (see “DaaS” in Pill 2).
The Evolution of Big Data at CERN, by DamFoundation.org
o Blackboard metaphor, http://c2.com/cgi/wiki?BlackboardMetaphor, a method of working among many people when dealing with problems of high complexity;
o Brand monitoring, mainly via Social Media;
o Business intelligence, see BI;
o Cellular automaton, related to how Web creatures behave semantically, http://en.wikipedia.org/wiki/Cellular_automaton
o Cloud computing, http://en.wikipedia.org/wiki/Cloud_computing;
o CERN data stream, http://cds.cern.ch/record/1430825, data collection algorithm, it’s a sort of Darwin cave men, trying to discriminate homogeneous clusters,
o The Evolution of Big Data at CERN, http://damfoundation.org/category/big-data-2/;
Semantic Pill 5
Tips: community is one of the leading global terms of today; it deserves 1,400,000,000 references in Google, the same order of magnitude as “city” (1,600,000,000); community will derive into Big Data applications very soon (take a look at Types of Communities and you may imagine our world as a giant creature of coexisting and superposed types of communities). You must see the Human Connectome Project Website, where the brains of groups of unrelated and related people are scanned in real time, in streams of several terabytes; “Connectomics” could be a new branch of neural research within the scope of Big Data: everything is big and extremely complex; we find again the concept of “computing clusters”, huge sets of supposedly homogeneous data, both structured and non-structured; a very recent possible area of Big Data are Corporate Portals, which behave as virtual Web Corporations servicing at large communities of physical and juridical persons (owners, customers, employees, providers, directors, third parties, competitors, associates, partners) like a living creature. Crime combat and prevention, together with other semantic “modal” associations such as “crime and delinquency”, “organized crime” and “transnational organized crime”, should be carefully registered; mapping everything and visualization are closely related to Big Data, and there is a strong correlation between them and the vital needs of us humans in order to survive and evolve: good and evil are always moving, but perhaps, on average, evil moves more and faster than good: for instance crime forms (criminals and diseases) move fast when stopped, changing “modus operandi”, regions of activity, strategies, actors, etc.; crowdsourcing is another Big Problem that also involves Big Data: people who live well should be aware 24x7, via almost enforced global visualizations, of people who live badly next door, elbow to elbow with us.
Gallery example of the Human Connectome Project
o Community engagement, http://en.wikipedia.org/wiki/Community_engagement;
o Computer cluster,
o Connectomics, related to neural science and to the “Human Connectome Project”; we include this as the limit of BD possibilities;
o Corporate portals, http://www.corporateportals.eu/what_is_a_corporate_portal.htm;
o Crime combat,
o Crime prevention,
o Crowdsourcing, see
Semantic Pill 6
Tips: curation, more specifically “data curation”, is a concept not much used until now in the IT arena; seeing data as the always changing tracks of human life (the river is never the same), it is a crucial asset: without data no history, and without history no meaning; this point brings back old questions: does it make sense to keep and save all data?; is a synthesis of data equivalent to its raw data? You should know something about DARPA, the creator and first proprietor of the Internet, at least something about its history since 1958 and about Darpa and the Internet Revolution (PDF); by the way, DARPA has always managed Big Data (by reading these pills we have to get acquainted with the idea that Big Data is a relative concept within the “state of the art of the technology”). DARPA again introduces a concept necessary to understand Big Data: Data Topology (see our Home); you should read something such as Data Topology for Dummies: we found one for beginners, because this subject requires much mathematical abstraction and imagination. See below a paragraph of the article:
Suppose we have conducted 1000 experiments with a set of 100 various measurements in each. Then each experiment is a string of 100 numbers or simply a vector of dimension 100. The result is a collection of disconnected 1000 points (aka point cloud) in the 100-dimensional Euclidean space. It is impossible to visualize this data as any representation that one can see is limited to dimension 3…..
Source: Data Curation Life Cycle, as of UTexas
o Curation data, see data curation, http://www.lib.utexas.edu/services/digital/dpoc/dpoc_data_lifecycle_management.html, Curation Life Cycle;
o Curation storage;
o Curation tools;
o DARPA, http://www.darpa.mil/, the Defense Advanced Research Projects Agency, from USA, it was the Internet creator and donor;
o Darpa Topological Data Analysis, topology of existent BD (2009), http://www.carlisle.army.mil/DIME/documents/StratPlan091.pdf, We have to carefully read this milestone!;
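The point cloud of the quoted paragraph (1000 experiments, 100 measurements each) cannot be drawn, but some of its shape can still be computed. The sketch below, with synthetic data and names of our own choosing, counts the connected components obtained when points closer than a distance `eps` are linked, which is about the simplest topological summary one can take of such a cloud; it is an illustration of the idea, not the DARPA method.

```python
import math
import random


def point_cloud(n_points=1000, dim=100, seed=0):
    """Synthetic stand-in for 1000 experiments of 100 measurements each."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]


def components(points, eps):
    """Number of connected components when points closer than eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)   # merge the two groups
    return len({find(i) for i in range(len(points))})


# Two well separated pairs of points in the plane.
toy_points = [[0, 0], [0, 1], [10, 10], [10, 11]]
```

On the toy points the count goes from 4 components at a very small scale, to 2 at an intermediate one, to 1 at a large one; tracking how such counts change with the scale is the starting idea of topological data analysis.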
Semantic Pill 7
Tips: “textons” is a term coined by our Darwin Methodology, making reference to huge files of supposedly homogeneous content, for instance the HTML sources of 100,000 Web pages supposedly dealing with the same subject, chained as a single txt file; DAS is a technology to connect a storage (S) directly (D) to computers via buses: it has to be studied together with SAN and NAS, technologies that connect storages (S) to computers via networks (N); dashboards must be semantically studied within their “content of use” context, for instance BI, Business Intelligence; let’s go a little deeper into data curation: "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time." University of Illinois' Graduate School of Library and Information Science. Finally, this tip closes with Data Defined Storage, a sort of Data Centric architecture that focuses on semantics, minimizing the aspects of media, type, time and place of data generation. The thesis is that if you have all your data well structured (data and metadata) you may find an appropriate and optimal solution to your data management problems.
Darwin Semantic Skeleton highlighting a particular semantic neighborhood spectrum
o Darwin textons, extra large text files processed by Darwin Methodology;
o DAS, see Direct Attached Storage, http://en.wikipedia.org/wiki/Direct-attached_storage;
o Dashboard enterprise, a sort of soft package associated to a server;
o Dashboard, simple BI, Business Intelligence monitoring unit, http://www.maia-intelligence.com/pdf/Confluent-SCIT-Sep-2010.pdf;
o Data curation, techniques and tools for useful BD validity and long lasting preservation, http://en.wikipedia.org/wiki/Data_curation;
o Data defined storage, http://en.wikipedia.org/wiki/Data_Defined_Storage, an approach to something like “Semantic Database”;
o DDS, see Data Defined Storage;
  • 139.
    Semantic Pill 8 Tips: Development Informatics is a new field (2009) of Informatics applied to social problems and equivalent to ICT4D, Information and Communication Technologies for (4) Development (D). The subtle difference lies in the interpretation of “Informatics” versus “ICT”, which stands for Information and Communications Technologies: the first term reflects a European Cosmovision, the second an American one; Distributed Parallel Architecture is something you need to know in order to manage large data sets: you may find here a research paper (2012) about it that compares three Big Data models and tools, namely MapReduce (Google), Hadoop (Apache) and HBase (Apache); DP, Dynamic Programming, deserves special consideration as an old and ever-present intelligent approach to “Operations Research” and particularly to “Decision Problems”. Its common sense approach is as follows: in order to solve a complex problem made of overlapping sub problems, we solve its parts and then solve the whole puzzle by combining those solutions. At least two strategies follow from this reasoning: the “brute force” or naïve one, which solves a sub problem again every time it is needed, and a more intelligent one, which solves each sub problem only once and reuses the result. This second approach is fundamental when the number of overlapping sub problems grows exponentially as a function of the data set volume, for example in genomics. See some interesting examples for beginners here. Graph that illustrates “Finding the shortest path in a graph” using optimal substructure; a straight line indicates a single edge; a wavy line indicates a shortest path between the two vertices it connects (other nodes on these paths are not shown); the bold line is the overall shortest path from start to goal.
From Dynamic Programming, Wikipedia
o Development informatics,
o Direct Attached Storage,
o Disease prevention,
o Distributed parallel architecture, http://www.revistaie.ase.ro/content/62/12%20-%20Boja.pdf, specific for BD apps;
o DP, see Dynamic Programming;
o Dynamic Programming, see DP, http://en.wikipedia.org/wiki/Dynamic_programming, one of the most used Operations Research tools again!;
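The two Dynamic Programming strategies described in this pill can be compared with a minimal sketch. It uses the Fibonacci numbers only as a convenient stand-in for a problem with overlapping sub problems; the function names and the call counter are illustrative assumptions, not part of the Darwin Methodology.

```python
def fib_naive(n, counter):
    """Brute force: the same sub problem is solved again each time it is needed."""
    counter[0] += 1                      # count every sub problem evaluation
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_dp(n, counter, memo):
    """Dynamic programming: each sub problem is solved once and then reused."""
    if n in memo:                        # already solved: reuse the stored answer
        return memo[n]
    counter[0] += 1                      # count only sub problems actually solved
    result = n if n < 2 else fib_dp(n - 1, counter, memo) + fib_dp(n - 2, counter, memo)
    memo[n] = result
    return result


if __name__ == "__main__":
    naive_count, dp_count = [0], [0]
    print(fib_naive(20, naive_count), "evaluations:", naive_count[0])
    print(fib_dp(20, dp_count, {}), "evaluations:", dp_count[0])
```

The naive version needs tens of thousands of evaluations for n = 20 while the memoized one needs one per distinct sub problem, which is the difference the pill points at when sub problems overlap heavily.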
  • 140.
    Semantic Pill 9 Tips: Emotional Intelligence regains its place within our “Tech Era”, emphasizing the communication power of gestures and attitudes, see Edwin Friedman; we, as spontaneous Web users, are continuously building a new “semiotics” of which semantics is one branch; EaaS is a new and still rather ambiguous term: some companies like Hewlett Packard use it in relation to Cloud Services that in turn look like a “bazaar”: as a sample, please take a look at its Piksel audiovideo; eBay Principal Architect Tom Fastner, speaking at the Teradata Partners Conference held in Dallas on 23rd October 2012: In monitoring their 100 million customers' interactions - from every button they click to every product they buy - eBay creates 12TB of data per day which is continually added to a 4 petabyte table containing 4tn rows of data. As the data is queried both by automatic monitoring systems and employees looking to find more meaning from it, data throughput reaches 100 petabytes (102,400TB) per day. Finally, EU4ALL is an EU initiative for Accessing Lifelong Learning for Higher Education, mainly sponsored by UNED from Spain and the Open University from the UK. Its relationship with BD is indirect, through its connection with ICT4D, Information and Communication Technologies for Development, already seen;
o EaaS, Everything as a Service;
o eBay, http://www.v3.co.uk/v3-uk/news/2302017/ebay-using-big-data-analytics-to-drive-up-price-listings, how eBay handles BD;
o EIP, Enterprise Information Portal, see EP, Enterprise Portals;
o Emotional Intelligence;
o Enterprise portals, see also EIP, they try to offer what’s evolving by itself, namely “business Portals”, http://en.wikipedia.org/wiki/Enterprise_portal;
o EU4ALL;
o European Union for Assisted Life Long Learning, see EU4ALL;
  • 141.
    Semantic Pill 10 Tips: we are soon going to see an industry of icons and gadgets related to Big Data, for instance an icon to go to Hadoop; Gartner Group is a well known IT research and consulting firm specialized in intelligence reports about business trends: you should know its two branded tools, the “hype cycle” to evaluate the life of a technology and the “magic quadrant” (or MQ) to evaluate markets. It has an interesting and authoritative IT Glossary that says about Big Data: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. O’Reilly announces the STRATA Conference next February in Santa Clara, California, saying that the future belongs to those who understand Big Data, and presents an article about “genomics” saying: The amount of data being produced by sequencing, mapping, and analyzing genomes propels genomics into the realm of Big Data. Genomics produces huge volumes of data; each human genome has 20,000-25,000 genes comprised of 3 million base pairs. This amounts to 100 gigabytes of data, equivalent to 102,400 photos. Sequencing multiple human genomes would quickly add up to hundreds of petabytes of data, and the data created by analysis of gene interactions multiplies those further.
Windows gadgets Are we entering into a Big Data Gadgets mania?, by Avancos
o Gadgets, http://en.wikipedia.org/wiki/Gadget;
o Gartner Group;
o genomics, http://strata.oreilly.com/2013/08/genomics-and-the-role-of-big-data-in-personalizing-the-healthcare-experience.html, genomics and Big Data;
  • 142.
    Semantic Pill 11 Tips: Google gadgets, as seen above, have been deprecated by Google; however, gadgets will always have a significant place within the ICT Community and specifically within Big Data, see figure below; GIS stands for Geographic Information System and has always been a typical application of the “Big Data of any time”, always capturing, structuring, retrieving and managing data at the limit of the available technologies, for instance GIS Tools for Hadoop; something similar occurs with GPS, Global Positioning System (see ephemeris and its world); by the way, along our explorations guided by this semantic seed we meet from time to time interesting authorities related to our main subject: see Content about GPS by Big Data Insight Group; HaaS, Hardware as a Service, does not make much sense alone, perhaps less than SaaS, Software as a Service, because for ICT it would be more of the same. However, HaaS and SaaS do make sense in relation to Big Data because Big Data is a challenge for everybody: vendors, buyers and users (see HaaS now by the University of California at Berkeley); finally we arrive at Hadoop from Apache. If you are a beginner we advise you to start by reading its introduction, What is Apache Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules;
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data;
Hadoop YARN: A framework for job scheduling and cluster resource management;
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Ericsson said that Data Traffic doubles this year and trend continues, by gadgets.ndtc.com
o GIS;
o Google gadgets, http://en.wikipedia.org/wiki/Google_Gadgets;
o Google Trends;
o GPS databases, http://en.wikipedia.org/wiki/Global_Positioning_System;
o HaaS, Hardware as a Service;
o Hadoop Distributed File System, see HDFS;
  • 143.
    Semantic Pill 12 Tips: HDFS architecture has been mentioned above as part of Hadoop: YOYOClouds defines it as: HDFS is a block-structured file system: individual files are broken into blocks of a fixed size. These blocks are stored across a cluster of one or more machines with data storage capacity. Individual machines in the cluster are referred to as DataNodes. A file can be made of several blocks, and they are not necessarily stored on the same machine; the target machines which hold each block are chosen randomly on a block-by-block basis. Thus access to a file may require the cooperation of multiple machines, but supports file sizes far larger than a single-machine DFS; individual files can require more space than a single hard drive could hold. If several machines must be involved in the serving of a file, then a file could be rendered unavailable by the loss of any one of those machines. HDFS combats this problem by replicating each block across a number of machines (3, by default).
HCP stands for Human Connectome Project, a project of the NIH, National Institutes of Health, on Neuroscience Research to build a “brain map of healthy humans” in order to “see more and better” its connectivity architecture and functionality and to shed light on brain disorders such as dyslexia, autism, Alzheimer’s disease and schizophrenia; House the World is a global project to “house the world”, in the sense of housing for all, focusing on durability and affordability, see New Architectural Design, finding solutions for the extremely poor; ICT4D has been considered in several previous pills; however, this discipline may open our minds to see (Mobile and Development) how advanced technologies could be more usable and efficient for the poor and for people with disabilities than for the rich and healthy; the discontinuation of iGoogle is a demonstration of the ephemeral nature of Internet projects: big Internet masses behave as if they had no “inertia”, with the subtleness of the “nothing”, and when something stops growing it starts to die!
HDFS architecture, by YOYO Clouds
o HCP, see connectomics;
o HDFS, an open Apache BD management;
o House the World, http://housetheworld.org/open-develpment-model/crowdsourcing/?gclid=CO302LCPi7sCFe3m7AodLkMADw, a BD philosophic approach;
o Human Connectome Project,
o IaaS, Infrastructure as a Service;
o ICT4D, http://en.wikipedia.org/wiki/Information_and_communication_technologies_for_development;
o iGoogle, discontinued, personal Web pages, open and free gadgets library;
o Information and communication technology for development, see ICT4D;
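The HDFS description quoted in this pill (fixed-size blocks, random placement on DataNodes, 3 replicas by default) can be illustrated with a toy sketch. This is not the Hadoop API: the function names, the node names and the sizes are hypothetical, chosen only to show why replication keeps a file available when a machine is lost.

```python
import random


def split_into_blocks(file_size, block_size):
    """Return the sizes of the fixed-size blocks a file is broken into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, datanodes, replication=3, seed=0):
    """Choose, block by block, `replication` distinct DataNodes at random."""
    rng = random.Random(seed)            # seeded only to make the sketch repeatable
    return {b: rng.sample(datanodes, replication) for b in range(num_blocks)}


def file_available(placement, failed_nodes):
    """A file is readable if every block has at least one replica on a live node."""
    return all(any(node not in failed_nodes for node in nodes)
               for nodes in placement.values())


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]      # hypothetical DataNode names
    blocks = split_into_blocks(file_size=300, block_size=128)
    placement = place_blocks(len(blocks), nodes)
    print(blocks, placement, file_available(placement, {"dn1"}))
```

With a single replica per block the loss of one machine can make the whole file unreadable, while with three replicas on distinct machines any one or two failures are tolerated in this toy model.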
  • 144.
    Semantic Pill 13 Tips: LSST, which stands for Large Synoptic Survey Telescope, is the “widest, fastest, deepest eye of the new digital age”: The 8.4-meter LSST will survey the entire visible sky deeply in multiple colors every week with its three-billion pixel digital camera, probing the mysteries of Dark Matter and Dark Energy, and opening a movie-like window on objects that change or move rapidly: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. LFE, Learn From Everyone, is a knowledge sharing initiative launched by young Chinese, from Ministry of Tofu, China: it proposes something as old as our civilization: knowledge and wisdom are in any place, at any moment, in any culture and even in any creature; Legal citations and claims are closely related to new areas such as The Practice of Law in the age of Big Data; Linked Data is a term coined by Tim Berners-Lee, the creator of the Web, referring to uses of structured data and closely related to LOD, Linked Open Data, a Community Project in turn related to Data Commons and Open Knowledge: all these initiatives are oriented to build a Semantic Web, as structured as possible;
Example of a piece of LinkedData, as per Wikipedia
o Large Synoptic Survey Telescope, see LSST, http://www.lsst.org/lsst/;
o Learn from everyone, http://www.ministryoftofu.com/2013/08/learn-from-everyone-a-knowledge-sharing-initiative-launched-by-young-chinese/;
o Legal citations;
o Legal claims;
o Linked Data, http://en.wikipedia.org/wiki/Linked_Open_Data;
o Linked Open Data, see also Linked Data;
o LOD;
o Listen to Everything, it has many acceptations, see this http://www.infowars.com/mit-future-smartphones-will-listen-to-everything-all-the-time/;
o LSST, see Large Synoptic Survey Telescope;
  • 145.
    Semantic Pill 14 Tips: MapReduce is a methodology to process large data sets via parallel and distributed tools and algorithms. Conceptually the idea is not new: most sorting techniques applied in conventional computing used similar procedures, especially when challenged with large data sets. It has two main procedures, namely: 1) map, or mapping, which filters, masks and sorts data sets and opens them into streams, and 2) reduce, which synthesizes and summarizes those streams. MapReduce also refers to a similar methodology used by Google. Hadoop is one of the implementations of this idea. See also the mother idea-paradigm “divide and conquer algorithms”; below, a schema of an MDP, Markovian Decision Process, of an “entity”, either real or artificial, is depicted, with three possible “states” S0, S1 and S2 and only two possible “actions”, a0 and a1, that change the state no matter what the current state is. For this sort of automaton to represent “alive” entities there should exist an associated probability P(a)[s, s’] that the state changes from s to s’ at time (t+1) by executing action (a). In order to evolve, or at least to have a reason for existence, we should associate to this automaton a “Reward” function R(a)[s, s’] received when the state changes from s to s’ due to action (a). These rewards are associated to learning. Together they define the 4-tuple [S A P R]: State, Action, Probability, and Reward.
Mapreduce Google Rank Examples from admin-magazine.com
Source: MDP from Wikipedia
o mapreduce, a BD model of parallel processing, http://en.wikipedia.org/wiki/MapReduce, see also hadoop;
o Markovian Decision Process, see MDP and the old Richard Bellman algorithms;
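The two MapReduce procedures described in this pill can be sketched on a single machine with a word count, the usual introductory example. This is only an in-memory illustration of the map, group and reduce steps; it does not use Hadoop or Google's implementation, and the function names are assumptions made for the sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one document into a stream of (key, value) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the mapped values by key, as done between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize all values that share a key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values)
                for key, values in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(word_count(["big data", "big web"]))
```

In a real cluster each map call and each reduce call could run on a different machine, which is where the “divide and conquer” character of the methodology shows.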
  • 146.
    Semantic Pill 15 Tips: Massively Parallel Processing is perhaps a term more specific to Big Data because it refers to a family of parallel procedures related to the art of computing, for example Grid Computing, Cloud Computing, Computer Cluster, Infiniband: at large, a maremagnum of names and acronyms. However, we recommend that you read, study and review these “common sense” laws to take into account when dealing with parallel processing at big scale: Amdahl’s Law, Gustafson’s Law, the Karp-Flatt Metric (which could be considered a law), and Moore’s Law; you may also be acquainted with the McKinsey Reports; if so, take a look at this one: Big data: The next frontier for innovation, competition, and productivity (May 2011, extrapolate it as of January 2014!): MGI (McKinsey Global Institute) studied big data in five domains—healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US healthcare expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal-location data could capture $600 billion in consumer surplus. The research offers seven key insights.
McKinsey Forecast (2025): Developed versus developing economies impacts: 3D Printing among 12 new technologies
o Massive Parallel processing, http://en.wikipedia.org/wiki/Massively_parallel_(computing);
o McKinsey reports, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation;
o MDP, synthesized as R(a)[s, s’]: Reward on (a) action, from state s going to state s’;
o META Group, see Gartner, it was acquired by Gartner;
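Two of the “common sense” laws recommended in this pill, Amdahl's and Gustafson's, are short enough to state as code. The sketch below uses their standard textbook forms; the parameter names and the 95 percent parallel fraction are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl: speedup of a fixed-size job when only part of it parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Gustafson: scaled speedup when the problem grows with the machine."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2), round(gustafson_speedup(0.95, n), 2))
```

With 5 percent of serial work, Amdahl's speedup can never exceed 20 however many processors are added, while Gustafson's scaled speedup keeps growing because the workload is assumed to grow with the machine.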
  • 147.
    Semantic Pill 16 Tips: WeatherSignal: Big Data Meets Forecasting, from a Scientific American blog, talks about the impact of Big Data on forecasts in different areas, from weather to health issues, and states the following controversial position: The philosophy of Big Data is that insights can be drawn from a large volume of ‘dirty’ (or ‘noisy’) data, rather than simply relying on a small number of precise observations – a subject covered in detail by Viktor Mayer-Schönberger and Kenneth Cukier in their recent book ‘Big Data’. One good example of the success of the ‘Big Data’ approach can be seen in Google’s Flu Trends which uses Google searches to track the spread of flu outbreaks worldwide. It is also important to remember that Big Data when used on its own can only provide probabilistic insights based on correlation; The true benefit of Big Data is that it drives correlative insights, which are achieved through the comparison of independent datasets. It is this that buttresses the Big Data philosophy of ‘more data is better data’; you do not necessarily know what use the data you are collecting will have until you can investigate and compare it with other datasets. MIKE2.0 is an Open Source, collaborative, private undertaking trying to build and lead a sort of Information Management community; MINE, Maximal Information-based Nonparametric Exploration, deals with the visualization of datasets, basically of “pairs” represented as a Cartesian X, Y map: in order to “see more and better” these maps you first need to know MIC, the Maximal Information Coefficient, which measures the strength of linear or nonlinear associations between X and Y. MIC belongs to a statistical class experimentally used for Detecting Novel Associations in Large Data Sets (Jun 2012): Imagine a dataset with hundreds of variables, which may contain important, undiscovered relationships. There are tens of thousands of variable pairs—far too many to examine manually.
If you do not already know what kinds of relationships to search for, how do you efficiently identify the important ones? Datasets of this size are increasingly common in fields as varied as genomics, physics, political science, and economics, making this question an important and growing challenge. One way to begin exploring a large dataset is to search for pairs of variables that are closely associated. To do this, we could calculate some measure of dependence for each pair, rank the pairs by their scores, and examine the top-scoring pairs. For this strategy to work, the statistic we use to measure dependence should have two heuristic properties: generality and equitability.
Source: Scientific American, Smartphone Weather Signal Dashboard
o Meteorology forecasts, Scientific American, Big Data meets Forecasting, http://blogs.scientificamerican.com/guest-blog/2013/10/11/weathersignal-big-data-meets-forecasting/;
o MIKE2.0, http://mike2.openmethodology.org/;
o MINE, Maximal Information-based Nonparametric Exploration, http://www.exploredata.net/;
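The exploration strategy quoted in this pill (score every pair of variables with a measure of dependence, rank the pairs, examine the top ones) can be sketched as follows. The score used here is a plain mutual information estimate on a fixed grid of bins, chosen as a simple stand-in; it is not MIC itself, which searches over many grids and normalizes the result. All names and the synthetic columns are assumptions made for the illustration.

```python
import math
import random
from itertools import combinations


def mutual_information(xs, ys, bins=4):
    """Estimate mutual information (in bits) on a fixed equal-width grid."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0          # avoid zero width on constant data
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    joint, px, py = {}, {}, {}
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def rank_pairs(columns):
    """Score every pair of variables and return them from strongest to weakest."""
    scores = [((a, b), mutual_information(columns[a], columns[b]))
              for a, b in combinations(sorted(columns), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def demo_columns():
    """Synthetic data: y depends on x nonlinearly, z is unrelated noise."""
    rng = random.Random(0)
    x = [i / 100 - 1 for i in range(201)]
    return {"x": x, "y": [v * v for v in x], "z": [rng.random() for _ in x]}


if __name__ == "__main__":
    for pair, score in rank_pairs(demo_columns()):
        print(pair, round(score, 3))
```

The pair (x, y) comes out on top even though y = x² has almost no linear correlation with x on a symmetric interval, which hints at what the quoted text calls generality.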
  • 148.
    Semantic Pill 17 Tips: Go Mobile massively: UIC, the University of Illinois at Chicago, through its Office of Technology Management has launched a program to encourage and help students to create their own Apps for Development and Deployment using iOS, iPhone OS, and Android OS; MSL, Multilinear Subspace Learning, is a family of ideas and applications to “see more and better” large multidimensional objects: most objects are multidimensional (relative to our visual capacity, limited to 3D). Most big data sets are multidimensional, with objects sparsely distributed, highly redundant and noisy, and for these reasons dimensionality reduction techniques are used to map high-dimensional data to a low-dimensional space while retaining as much information as possible. We recommend reviewing Matlab, you are going to need it, and specifically the Matlab Tensor Box; http://www.csc.com/cscworld/publications/81769/81773-supercomputing_the_climate_nasa_s_big_data_mission, a CSC article about the NCCS Discover supercomputing cluster, which ranks among the top 100 supercomputers in the world, plays a central role in NASA’s earth science mission and is the main system used for processing jobs that require significant computing resources. To put some goals in numbers: Discover can compute in one day three simulated days in the life of the Earth at one of the highest resolutions ever attained — about 3.5-kilometer global resolution, or about 3.6 billion grid cells.
The center’s current “stretch” goal is to generate in one day a computation that covers 365 days at 1-km global resolution;
NASA Brings “Big Data” to the Cloud, by the USRA, Universities Space Research Association
o mobile app development and deployment, example of a “Mobile rush” Initiative of the University of Illinois, Chicago, http://otm.uic.edu/node/4371;
o MPP, see Massive Parallel Processing;
o MSL, see Multilinear Subspace Learning;
o Multilinear subspace learning, see MSL;
o Nasa Center for Climate Simulation, see NCCS
o NASA, NASA Big Data Mission, http://www.csc.com/cscworld/publications/81769/81773-supercomputing_the_climate_nasa_s_big_data_mission;
  • 149.
    Semantic Pill 18 Tips: Infectious Diseases following natural disasters, another big scenario, from NLM, the National Library of Medicine, NIH, National Institutes of Health: this article triggers in our mind something “metadata related”, that is, we found in it something valuable for our Web Semantic eLearning process: the “tag” MeSH, which stands for Medical Subject Headings and belongs to the NLM Controlled Vocabulary of the PubMed Thesaurus, something like a Web Thesaurus! The article says (abstract): Natural disasters may lead to infectious disease outbreaks when they result in substantial population displacement and exacerbate synergic risk factors (change in the environment, in human conditions and in the vulnerability to existing pathogens) for disease transmission. We reviewed risk factors and potential infectious diseases resulting from prolonged secondary effects of major natural disasters that occurred from 2000 to 2011. Among Natural Disasters some are closely related to visible Earth changes such as the Weather, see the image below in Natural Disaster and Extreme Weather, a collection of environmental articles published by The Guardian (UK) like Australia links 'angry summer' to climate change – at last!; the navigation paradox deals with collision risk in the navigation of any object, either real or virtual: aircraft, ships, cars. A paradox may or may not be a valid argument, but it helps as a tool of analysis and sharpens our critical sense. For example, when talking about the incidence of navigation risks as a function of technology and of people’s awareness of it (how well they use it), we may easily arrive at, or be driven to, contradictions such as that the best technology, used with an excellent level of awareness, may at large create risk scenarios.
Extreme Weather and Global Warming are linked, The Guardian
o Natural disasters, http://www.ncbi.nlm.nih.gov/pubmed/22149618;
o Navigation paradox, http://en.wikipedia.org/wiki/Navigation_paradox;
o NCCS, see also CSC NCCS, supercomputing program to “supercomputing the climate” from NASA;
  • 150.
    Semantic Pill 19 Tips: Are we entering a real “Big Science” or are we, as ever, striving hard to push the frontier of the unknown? Data Driven Discovery talks about a digital copy of the universe, encrypted (see LSST, Large Synoptic Survey Telescope, in previous pills): “The data volumes we [will get] out of LSST are so large that the limitation on our ability to do science isn’t the ability to collect the data, it’s the ability to understand the systematic uncertainties in the data,” said Andrew Connolly, an astronomer at the University of Washington. Nonlinear System Identification deals with identifying the different types of real life systems under study, namely industrial processes, control systems, economy, life sciences, medicine, social systems and networks, etc., because most of them are nonlinear, linearity being a form of idealization that lets us study them under the laws and tools of math and logic. This article goes a little further, idealizing data itself by affirming: “Finding the unexpected in a higher-dimensional space is impossible using the human brain.” We recommend going a little deeper into the four main types of NLS: 1) Volterra series models, 2) block structured models, 3) neural network models, and 4) NARMAX models; “The figure that illustrates this pill shows 4 types of equilibrium you should distinguish in order to understand Big Data better because one usual approach is to describe the solutions globally (via nullclines). What happens around an equilibrium point remains a mystery so far. Here we propose then to discuss this problem. The main idea is to approximate a nonlinear system by a linear one (around the equilibrium point). Of course, we do hope that the behavior of the solutions of the linear system will be the same as the nonlinear one.
This is the case most of the time (not all the time!).” Open Data (see the list of over 200 local, regional and national open data catalogues) is a global movement and, semantically, a universal idea in formation. From its very beginning, at the dawn of our civilization, data was closely related to any type of asset and had the nature of a “capital” by itself (see Merton Thesis), and its “opening” will go hand in hand with the development of our societies. For this reason what is really crucial is the Web’s contribution to this opening (see Datacatalogs); finally, PaaS, Platform as a Service, is one of the three pillars of Cloud Computing Services, together with SaaS, Software as a Service, and IaaS, Infrastructure as a Service. Users of this service may create, control and set up their own software on the contracted platform. Remember that PaaS does not make much sense alone but within an almost enforced trilogy [PaaS, SaaS, IaaS];
KRIT algorithm, Monfort University (UK)
  • 151.
    Strange Attractor Visualization, from Chaoscope
o New Big Science, https://www.simonsfoundation.org/quanta/20131002-a-digital-copy-of-the-universe-encrypted/;
o Nonlinear system identification, system identification applied to nonlinear system (generally of high complexity), http://en.wikipedia.org/wiki/Nonlinear_system_identification;
o Open Data AR, http://datospublicos.gob.ar/;
o Open Data UK, http://data.gov.uk/;
o Open Data US, http://www.data.gov/;
o Open Data, see also Open Data Initiative, http://en.wikipedia.org/wiki/Open_data, Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control;
o PaaS, Platform as a Service;
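The idea quoted in this pill, approximating a nonlinear system by a linear one around an equilibrium point, is usually applied by looking at the 2x2 Jacobian matrix of the system at that point. The sketch below classifies the equilibrium from the trace and determinant of that matrix. It uses the common textbook categories (saddle, node, spiral, center), which may or may not coincide with the four types in the figure; the function name is an assumption.

```python
def classify_equilibrium(jacobian):
    """Classify the equilibrium of the linearized system x' = J x for a 2x2 J."""
    (a, b), (c, d) = jacobian
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det       # sign decides real vs complex eigenvalues
    if det < 0:
        return "saddle"                  # real eigenvalues of opposite sign
    if det == 0:
        return "degenerate"              # a zero eigenvalue: linearization says little
    if disc >= 0:
        kind = "node"                    # real eigenvalues of the same sign
    elif trace == 0:
        # Purely imaginary eigenvalues: the linear system is a center, but the
        # nonlinear one may behave differently ("not all the time!").
        return "center"
    else:
        kind = "spiral"                  # complex eigenvalues with nonzero real part
    return ("stable " if trace < 0 else "unstable ") + kind


if __name__ == "__main__":
    print(classify_equilibrium([[0, 1], [-1, 0]]))
    print(classify_equilibrium([[-1, -2], [2, -1]]))
```

The center and degenerate cases are exactly those where the behavior of the linear system is not guaranteed to carry over to the nonlinear one.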
  • 152.
    Semantic Pill 20 Tips: Pig and Pig Latin are a platform and a language to create MapReduce programs within Hadoop; the Q-Learning algorithm is a “Reinforcement learning” process in which an agent gathers as much positive (rewarding) experience as possible. The two figures below show how to approach an AI, Artificial Intelligence, Q-learning algorithm. The house has 6 “rooms”: five, numbered 0 to 4, inside the house and a sixth, outside, treated as an open room. The challenge is to acquire positive experience to train an agent to get out of the house (Finish) starting from room 2. We may “go” at random starting from “state” 2 and build a “table” or matrix of “rewards”: for example 0 if going “wrong” and 100 if going “right”. We may extend this tiny and basic model to any degree of complexity, trying to find the right way from any place within any labyrinth; Quality of research data and Quality of research are sometimes forgotten items. This paper deals with a doubly polemic theme: quality, and how research, in general terms, is really performed. The analysis was extended to three disciplinary domains applied by the European Science Foundation: Physical Sciences and Engineering, Social Sciences and Humanities, and Life Sciences. Finally Social Sciences and Humanities and Life Sciences were summarized as one. The attitude in Physical Sciences and Engineering would seem to be that quality control of data can best be effectuated through citation of datasets and quality-related comments on those datasets which are made available through Open Access data publications. No need is expressed for codes of conduct, training in data management, or peer review of data that is published together with articles. In Life Sciences there is first and foremost a need for a code of conduct for dealing with data. Training in data management fits in with this.
A direct judgment on quality can be given through peer review of the data that is published together with articles and through quality-related comments, a derived judgment through data publications and citations. Open access to data does not score highly. Interestingly enough, Life Sciences are ahead of the other disciplines as regards open access to articles. Warning: we have to take into account that quality of data is essential and a necessary condition not only for Big Data but for everything: we behave based on data. Finally, RTDP has many meanings, for example Real Time Data Platform and Real Time Dynamic Programming; within the latter, and related to the semantics under study in this pill, appears the theme “Learning to Act using Real Time Dynamic Programming”, of which we reproduce here part of the abstract: Researchers have argued that DP provides an appropriate basis for real time control as well as for learning when the system under control is incompletely known. RTDP is a DP based algorithm by which an embedded system can improve its performance with experience. It is a generalization of Korf’s Learning-Real-Time-A* algorithm to problems involving uncertainty;
  • 153.
    Source: Q-learning from mnemstudio.org
o Pig language, a language to “mapreduce” and “hadoop” management, http://en.wikipedia.org/wiki/Pig_(programming_tool);
o Q-learning algorithm, http://en.wikipedia.org/wiki/Q-learning, it’s a “Reinforcement learning” process, http://artint.info/html/ArtInt_265.html, see here how an agent gathers positive (rewarding) experience (see learning and reinforcement learning);
o Quality of research data, http://www.dlib.org/dlib/january11/waaijers/01waaijers.html;
o Quality of research, a forgotten theme: scarce and deficient data and lack of interest;
o Real Time DP, see RTDP;
o Reinforcement learning, see Q-learning;
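The house example of this pill can be turned into a minimal Q-learning sketch. The text gives the six rooms, the start in room 2 and the rewards 0 and 100, but not which rooms are connected, so the door layout below is a hypothetical one chosen for illustration. The discount factor of 0.8 and the learning rate of 1 (reasonable here because moves are deterministic) are assumptions as well.

```python
import random

GOAL = 5  # the "outside" room (Finish)

# Hypothetical door layout: room -> rooms reachable through one door.
DOORS = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4]}


def reward(next_room):
    """100 for stepping outside ("right"), 0 for any other move."""
    return 100 if next_room == GOAL else 0


def train(episodes=500, gamma=0.8, seed=0):
    """Learn Q(room, next_room) by wandering at random until the goal is reached."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, neighbours in DOORS.items() for a in neighbours}
    for _ in range(episodes):
        state = rng.randrange(len(DOORS))            # random starting room
        while state != GOAL:
            nxt = rng.choice(DOORS[state])           # explore at random
            best_next = max(q[(nxt, a)] for a in DOORS[nxt])
            q[(state, nxt)] = reward(nxt) + gamma * best_next
            state = nxt
    return q


def greedy_path(q, start, max_steps=10):
    """Follow the highest-valued door from each room until the goal is reached."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        room = path[-1]
        path.append(max(DOORS[room], key=lambda a: q[(room, a)]))
    return path


if __name__ == "__main__":
    print(greedy_path(train(), start=2))
```

After training, following the best door from room 2 leads outside in three moves under this layout; a different layout would only require changing `DOORS`.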
  • 154.
    Semantic Pill 21 Tips: Remote Sensing: for example, “Geospatial Analytics for Big Spatiotemporal Data: Algorithms, Applications, and Challenges” says: We are living in the era of `Big Data.' Spatiotemporal data, whether captured through remote sensors (e.g., remote sensing imagery, Atmospheric Radiation Measurement (ARM) data) or large scale simulations (e.g., climate data) has always been `Big.' However, recent advances in instrumentation and computation making the spatiotemporal data even bigger, putting several constraints on data analytics capabilities. In addition, large-scale (spatiotemporal) data generated by social media outlets is proving to be highly useful in disaster mapping and national security applications. Spatial computation needs to be transformed to meet the challenges posed by the big spatiotemporal data. The Big ones: the ESG, Earth System Grid, could be considered one of the World’s Big Data scientific portals; Sampling, and particularly representative sampling, is pure statistics and an all-time universal problem: how to wisely sample a universe to get the information we need. However, Big Data universes and situations may present new challenges, especially when considering unstructured data that is relatively unknown, noisy, erratic and most times unpredictable, like the data we may find in Social Data: “The Pitfalls of using online and social data in Big Data analysis”: In her draft paper, Big Data: Pitfalls, Methods and Concepts for an Emergent Field, UNC professor and Princeton CITP fellow Zeynep Tufekci (@zeynep) compares the methodological challenges of developing socially-based big data insights using Twitter to biological testing on Drosophila flies, better known as fruit flies. Drosophila flies are usually chosen because they’re relatively easy to use in lab settings, easy to breed, have rapid and “stereotypical” life cycles, and the adults are pretty small. The problem? They’re not necessarily representative of non-lab (read: real-life) scenarios.
Tufekci posits that the dominance of Twitter as the “model organism” for social media in big data analyses similarly skews analysis. Sampling was, is and will be fundamental. Now within the “Big Data” move we have to be more careful than “before” (one year from now!) concerning this problem, The figure below depicts 4 ways of a “zonal sampling” each one coherent but with 4 probable different outcomes; Roadway Traffic Control is an old Big Data experience and now there is a proliferation of integral solutions, fundamentally to avoid congestions and whether possible keep the circulating community communicated: see the T-system, Big Data in Traffic and Big Data in the Automotive Industry: As you know, cars can’t speak. If they could, they would be able to provide a wealth of information that would be invaluable to drivers, repair shops and automakers alike. To gain access to this data – and help the car talk – more and more vehicles are being fitted with sensors and connectivity solutions. According to a study by management consultants Oliver Wyman, 80 percent of all autos sold in 2016 will be connected. That would equate to approximately 210 million talking cars cruising round our streets. Compared to 45 million autos in 2011, that is a projected annual growth rate of over 36 percent. Connected cars could provide a steady stream of data on vehicle movements, condition, wear and tear of parts, and ambient conditions. Extracting meaning from this mass of mixed data is no easy task. The challenge is transmitting the information, analyzing it and redistributing it to the relevant recipients – all at high speed. It is a challenge that T-Systems can master.
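The growth figure quoted above (45 million connected autos in 2011, about 210 million in 2016) can be checked with a compound annual growth rate. The following is only a minimal sketch; the function name `annual_growth_rate` is ours, not something taken from the quoted study.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values that are `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the text: 45 million autos (2011) and 210 million (2016).
rate = annual_growth_rate(45e6, 210e6, 2016 - 2011)
print(f"Projected annual growth rate: {rate:.1%}")
```

Under these assumptions the rate comes out a little above 36 percent, which agrees with the "over 36 percent" stated in the quote.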
  • 155.
    SaaS stands for Software as a Service; it has meaning as a stand-alone Software Delivery Model or as part of the Cloud Computing trilogy [SaaS, IaaS, PaaS];
    Source: Habitat Maps for the EU (MESH)
    o Remote Sensing, the era of big geospatial data, http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/raju-bigspatial.pdf;
    o Representative samples, classical: http://en.wikipedia.org/wiki/Sampling_(statistics); care must be taken when considering Social Data: http://sloanreview.mit.edu/article/the-pitfalls-of-using-online-and-social-data-in-big-data-analysis/;
    o Roadway traffic control, see https://www.thalesgroup.com/en/worldwide/transportation/road-traffic-management;
    o RTDP, Real Time Dynamic Programming;
    o SaaS, Software as a Service;
  • 156.
    Semantic Pill 22 Tips: SBA, which stands for Search Based Applications, refers to “semantic applications” where the core of the architecture rests on a “semantic search engine”. Data should ideally be fully structured from a semantic point of view; however, it is possible to build via Artificial Intelligence a sort of “Semantic Glasses” to see the whole Web, or a part or region of it, as semantically structured. See Darwin Semantic Glasses in the e-book “Semantic Web”.
    SDSS, Sloan Digital Sky Survey, is a project for mapping the universe that has obtained deep, multi-color images covering more than a quarter of the sky and created 3-dimensional maps containing more than 930,000 galaxies and more than 120,000 quasars. You may click on the figure below (in the Website) to go to an enlarged view of it, with the Earth in the center and each point representing a galaxy that typically contains 100 billion stars!
    SML, Social Media Listening, together with Social Media Monitoring and Social Media Measurement, deals with paying attention to what “people” say about anything, fundamentally having in mind a Web Ontology that “sees” the Web as a dual, continuously interacting model: the “Established Order” (the law, governments and governors, all types of institutions, authorities, teachers, masters and those who teach, manufacturers and sellers of products and services, ...) on one side and the people on the other (the governed, users, those who learn, students, buyers, applicants, ...). See SMM, Social Media Monitoring, from Huffingtonpost.com, which textually affirms that “it pays to listen”: I prefer to think of it as a technology solution to what my mother told me when monitoring my growing up: God gave you two ears and one mouth for a reason;
    Galaxy Mapping, from Sdss.org
    o SBA, Search Based Application;
    o SDSS, the world project to map the Universe, http://www.sdss.org/;
    o Search Based Applications, see SBA;
    o Sloan Digital Sky Survey, see SDSS;
    o SML, Social Media Listening, a neologism derived from SMM, Social Media Monitoring;
    o SMM, http://www.huffingtonpost.com/robert-ball/social-media-monitoring-i_b_833702.html, it pays to listen.
  • 157.
    Semantic Pill 23 Tips: Social Genome is a new term coined in relation to a Wal-Mart/Facebook project. Huffingtonpost.com says in this respect: WalmartLabs defines the "Social Genome" as "a giant knowledge base that captures entities and relationships of the social world." Wal-Mart has spent the last few years building this in-house Social Genome, part public data, part private data, "and a lot of social media." Tweets, Facebook messages, blog posts, YouTube videos, it's all streaming into Wal-Mart. Streaming in so fast, that WalmartLabs created something they call Muppet, a solution for processing Fast Data, using large clusters of machines. The Labs describes the Social Genome as their "crown jewel."
    Social Media Listening has been dealt with extensively in the previous pill. We add that it will soon become an art and a science. In the figure below, the necessary “feedback” loop, threading a sort of embedded feedback through every step, is missing; also missing is something like a wise, guiding invisible hand well acquainted with how we humans document our opinions and messages, perhaps a Human Documentation Ontology;
    SSD, mentioned here only as it relates to Big Data: see, for example, “Kingston Introduces New Enterprise SSD to Support Big Data and Virtualization Initiatives” (up to 480 GB);
    Source: SML from anchormedia.com
    o Social Genome, a Wal-Mart/Facebook project related to BD;
    o Social listening, see SML;
    o Social media listening;
    o Social Media Monitoring, see SMM;
    o solid state drive, see SSD;
    o SSD, Solid State Drive;
  • 158.
    Semantic Pill 24 Tips: Strategic Planning is an old and ever-living concept: we may see what the classic Harvard Business Review says about it in relation to today's Big Data in “Big Data: The Management Revolution” (Oct 2012): Business executives sometimes ask us, “Isn’t ‘big data’ just another way of saying ‘analytics’?” It’s true that they’re related: The big data movement, like analytics before it, seeks to glean intelligence from data and translate that into business advantage. However, there are three key differences, fundamentally “Volume” and “Velocity” (2 of the 3 Vs);
    System Identification and related concepts have been presented in previous pills; however, we recommend reading Perspectives on System Identification, by Lennart Ljung, a CiteSeer paper. Its abstract says: System identification is the art and science of building mathematical models of dynamic systems from observed input-output data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. We recommend going deeper into its methodological core, as follows: 1) the globally estimated model (m); 2) a True Description of the object to be modeled (S); 3) the model class (M) to which the model belongs; 4) the Complexity (C) of the model class; 5) all the Information available about the object to be modeled: observed data and everything that helps to describe it; 6) Validation, which involves generalization: validation data sets (Z); 7) Model Fit (F), which explains how well our model (m) adapts to a dataset (Z): F(m, Z);
    By Technology Forecasting we refer to something really new, because up to now we humans were used to generating “futuribili”, visions of the future, without even imagining the technology that would make them possible. Up to now there was a general belief: once we humans are convinced that something is possible, no matter the efforts and resources needed, we start “ex post” to think about the “how to”. Let’s see then some techniques to guide our imagination towards that “how to”: the Delphi Method, one of the most used, which follows the model of learning through Q&A rounds among experts; Forecast by Analogy, as a form of reinforcing suppositions based on credible analogies; and all types of projections obtained by different extrapolation criteria (for example, based on Growth Curves);
    Tensors (and Matrices) are something that you should know to some extent, and this (52-page PDF document) could be a good introduction to the Knowledge Discovery and Data Mining tools and techniques that you are going to need in Big Data. Tensors and Tensor Calculus are essential in some disciplines, such as those related to quantum theory: Quantum Mechanics and Quantum Computing. The figure below depicts a tensor visualization of the Cauchy Stress Tensor: Tensors are geometric objects that describe linear relations between vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps. Vectors and scalars themselves are also tensors. A tensor can be represented as a multi-dimensional array of numerical values. The order (also degree) of a tensor is the dimensionality of the array needed to represent it, or equivalently, the number of indices needed to label a component of that array. For example, a linear map can be represented by a matrix, a 2-dimensional array, and therefore is a 2nd-order tensor. A vector can be represented as a 1-dimensional array and is a 1st-order tensor. Scalars are single numbers and are thus 0th-order tensors.
    Tensor Toolbox: see the Matlab Tensor Toolbox in previous pills;
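The definition of tensor order quoted above (the number of indices needed to label a component) can be illustrated with plain nested lists. This is a hedged sketch: `tensor_order` and `apply_linear_map` are illustrative helper names of ours, the lists are assumed to be non-empty and rectangular, and the stress values are made-up numbers.

```python
def tensor_order(value) -> int:
    """Count the indices needed to reach a single number in a nested list."""
    order = 0
    while isinstance(value, list):
        order += 1
        value = value[0]  # assumes non-empty, rectangular nesting
    return order


def apply_linear_map(matrix, vector):
    """Apply a 2nd-order tensor (matrix) to a 1st-order tensor (vector)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


scalar = 3.0                       # 0th-order tensor
vector = [1.0, 2.0, 3.0]           # 1st-order tensor
# A symmetric 3x3 array, the shape used for a Cauchy stress tensor
# (illustrative values only).
stress = [[10.0, 2.0, 0.0],
          [2.0, 5.0, 1.0],
          [0.0, 1.0, 8.0]]         # 2nd-order tensor

for name, t in [("scalar", scalar), ("vector", vector), ("stress", stress)]:
    print(name, "has order", tensor_order(t))

traction = apply_linear_map(stress, [1.0, 0.0, 0.0])
print("stress applied to the unit x direction:", traction)
```

Applying the stress array to a unit direction is just the "linear map" case mentioned in the quote: a 2nd-order tensor sending one vector to another.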
  • 159.
    Tensor, as per Wikipedia
    o Strategic Planning CP, (Control Panel);
    o System identification, an old systems concept: building a model of a system by studying its behavior;
    o Technology forecasting;
    o Tensor, its association with Big Data and Data Mining looks like a revival of geometry and math, http://users.cs.fiu.edu/~taoli/kdd09-workshop/DMMT09-proceedings.pdf;
    o Tensor toolbox,
  • 160.
    Semantic Pill 25 Tips: TDA, Topological Data Analysis, is something that should be carefully studied, at least conceptually if you are not strong in math, as it is fundamental in Data Mining, Visualization and Semantics and is now embedded in Big Data: The main problems are: 1. how one infers high-dimensional structure from low-dimensional representations; and 2. how one assembles discrete points into global structure. The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points. The main method used by topological data analysis consists of three steps: a) Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter; b) Analyze these topological complexes via algebraic topology, specifically via the theory of persistent homology; c) Encode the persistent homology of a data set in the form of a parameterized version of a Betti number, which is called a barcode. New concepts to add to this Mini Thesaurus: Simplicial Complexes; Persistent homology; Betti number; Barcode;
    “To see more and better”: as an exact term it has 1,340,000 references in Google, appearing like a “meme” or goal of research and innovation. It is also the “motto” of our Darwin Methodology: to build tools to “see more and better” the Web, like the Darwin Semantic Glasses.
    Tuples: a tuple is an ordered list of elements, and a Tuple Space is a space of tuples to be used sometime, somewhere and somehow: A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern.
    Vector Processing refers to processing by vectors instead of processing single data items, or “scalars”, one at a time. This technique can be used not only as a possible architecture to build supercomputers but also as a programming technique. Our Darwin Methodology processes “by textons”, which resemble vectors of Web documents.
    Watkins Q-Learning Algorithm points to the Watkins thesis (1989): Learning from Delayed Rewards, a seminal work of 220 pages. The thesis faces a crucial question behavioral scientists ask themselves: how might animals learn optimal policies from their experience? And, going a little deeper: is it possible to give a systematic analysis of possible computational methods of learning efficient behavior?
    Weather Forecast has been reviewed in previous pills; however, we suggest reading Big Data Reshapes Weather Channel Predictions, an article about The Weather Company from InformationWeek.com:
  • 161.
    "Weather is the original big data application," says Bryson Koehler, executive VP and CIO at the Weather Company. "When mainframes first came about, one of the first applications was a weather forecasting model." Flash forward to today and the Weather Company ingests some 20 terabytes of data per day to spin out what Koehler bills as the world's most accurate forecasts. To stay ahead of its competition, the Weather Company is in the process of rolling out a new platform built on Basho's Riak NoSQL database and running globally in the Amazon Web Services (AWS) cloud.
    Source: DARPA Topological Data Analysis, from Big Data, Wikipedia
    o TDA, see DARPA;
    o Topological Data Analysis, see TDA;
    o To see more and better, G: 1,340,000 as exact term; it looks like a universal R&D goal;
    o Tuple space*, a form of associative memory;
    o Tuple space, http://en.wikipedia.org/wiki/Tuple_space;
    o Vector Processing, see “Darwin textons”;
    o Wal-Mart, http://www.bigdata-startups.com/BigData-startup/walmart-making-big-data-part-dna/, it’s a pioneer in BD, see social genome;
    o Watkins Q-learning algorithm, a specific Q-learning algorithm;
    o Weather forecasts;
  • 162.
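The tuple space described in Semantic Pill 25 (producers post tuples, consumers retrieve tuples matching a pattern) can be sketched in a few lines. This is only an illustration under our own assumptions: the class `TupleSpace`, its methods `out` and `take`, and the use of `None` as a wildcard are hypothetical names and conventions, not part of any product mentioned in the text.

```python
import threading


class TupleSpace:
    """A tiny in-memory tuple space shared by producer and consumer threads."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Producer side: post a tuple into the space."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # A tuple matches when lengths agree and every non-None field is equal.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def take(self, pattern, timeout=None):
        """Consumer side: remove and return the first match, or None on timeout."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


space = TupleSpace()


def producer():
    for i in range(3):
        space.out(("reading", i, i * i))


worker = threading.Thread(target=producer)
worker.start()
results = [space.take(("reading", None, None), timeout=1.0) for _ in range(3)]
worker.join()
print(results)
```

The consumer never addresses a tuple by position or key, only by content pattern, which is the "associative memory" aspect the quoted definition refers to.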
    Semantic Pill 26 From Gartner's Newsroom
    2012 (DEC 2011)
    Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage. Current trends in smart devices and growing Internet connectivity are creating significant increases in the volume of data available, but the complexity, variety and velocity with which it is delivered combine to amplify the problem substantially beyond the simple issues of volume implied by the popular term "big data." Collecting and analyzing the data is not enough; it must be presented in a timely fashion so that decisions are made as a direct consequence that have a material impact on the productivity, profitability or efficiency of the organization. Most organizations are ill prepared to address both the technical and management challenges posed by big data; as a direct result, few will be able to effectively exploit this trend for competitive advantage.
    2013
    See 10 Strategic Technology Trends for 2013
    2014 (NOV/DEC 2013)
    Predicts 2014: Apps, Personal Cloud and Data Analytics Will Drive New Consumer Interactions (22 November 2013). Mobile apps have become the official channel to drive content and services to consumers. Using big data collated via apps can drastically improve value to consumers. Businesses that develop data tracking and analytics will improve delivery to customers, increasing customer loyalty and acquisition.
    Predicts 2014: Big Data (20 November 2013). Gartner's 2014 predictions explore how the developing maturity and awareness of big data impacts analytics, resources, data center infrastructure and consumer privacy. Enterprises must adapt to this quickly changing landscape to establish an analytical competitive advantage.
    Predicts 2014: Cloud Computing Affects All Aspects of IT (4 December 2013). Gartner's 2014 cloud computing predictions shed light on the evolution of the concept as it continues its path toward becoming more and more integral to IT. IT organizations will need to monitor developments in order to adapt their cloud strategies to the realities of tomorrow.
  • 163.
  • 164.
    Concerning Data Size Matters => Back
  • 165.
    Darwin Methodology - HKM Demo
    Darwin: it stands for Distributed Agents to Retrieve the Web Intelligence Synthesis for Partners
    As of October 9th 2014
    “In Mind Idea”: personal, unique and “invisible”
    Human Knowledge Map, HKM, in numbers
    Knowledge Wood: it depicts a “wood” of about 200 Branches (a wood of “HK Trees”), for all pairs (culture, language). It opens as:
    o A Human Encyclopedia of about 600,000 MS, Major Subjects <=> Major Concepts <=> ~3,000 MS per Branch of a HK Tree
    o An “In Mind Ideas” Universe of about 20,000,000 CS, Common Subjects (topics?) <=> minor, common concepts <=> ~35 CS per MS
    o As of today justified by a World Data Reservoir of about 40,000,000,000 Web pages
    Knowledge Sample: (1.2% of the whole Web)
    o Branch: The Art
    o Pair (culture, language): (British, American English)
    o Major Subjects: 7,571
    o Concepts: ~400,000 CS (common concepts about The Art)
    o Web Universe of sample: ~100,000,000 pages
    First HKM Version:
    o Effort: ~300 man-months (scientific, academic and professional top levels)
    o Time: 6 months (“alpha test”); 3 more months for its “beta test”;
    o Languages: English and Spanish
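The HKM figures above can be cross-checked with simple arithmetic. The sketch below only recomputes ratios from the numbers stated on this slide; the variable names are ours, and reading the "1.2%" as a share of Major Subjects rather than of Web pages is an assumption, not something the slide states.

```python
# Figures stated on the slide (approximate, as given).
branches = 200
ms_per_branch = 3_000
cs_per_ms = 35
web_pages = 40_000_000_000

sample_ms = 7_571          # Major Subjects found for The Art
sample_cs = 400_000        # Common Subjects found for The Art
sample_pages = 100_000_000

major_subjects = branches * ms_per_branch      # compare with "about 600,000"
common_subjects = major_subjects * cs_per_ms   # compare with "about 20,000,000"

ms_share = sample_ms / major_subjects          # sample share of Major Subjects
page_share = sample_pages / web_pages          # sample share of Web pages
cs_per_ms_in_sample = sample_cs / sample_ms

print(f"Major Subjects in the wood: {major_subjects:,}")
print(f"Common Subjects in the wood: {common_subjects:,}")
print(f"Sample share of Major Subjects: {ms_share:.2%}")
print(f"Sample share of Web pages: {page_share:.2%}")
print(f"Common Subjects per Major Subject in the sample: {cs_per_ms_in_sample:.0f}")
```

Under these assumptions the Major Subject share is close to the quoted 1.2%, while the page share is smaller, so the sample size seems to be expressed in terms of subjects rather than pages.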
  • 166.
    Forms of Knowing
    o Q&A: by querying sources of information, knowledge and wisdom. History: Humans and God Oracles; Shamans and gurus; Ancient & Wise people; Libraries; Temples; Spirit workings; Spiritual Orientations;
    o Experiencing: for a living, for fun, to survive; work and all types of Working Activities; Gaming; Entertainments; Arts & Crafts; Adventures and risky activities; Innovations;
    o Studying: fundamentally the “Established Knowledge”. Being an active part of “The Education” by exercising the pair “Teaching - Learning”, one at a time or both concurrently (depending on the culture); master-disciple relationships;
    Darwin performs all three because it works on a Web Thesaurus that “sees” the Web “more and better”, as if it were semantically indexed with ideal metadata approaching the “best probabilistic truth” at any moment.
    The “anthropic” component of Darwin: the HK seeds: 12 ways of drafting semantic seeds (example)
    Retrieving from “zero ground”: Darwin Methodology may retrieve any MS, Major Subject, from the Web from “zero ground” in terms of knowledge, provided that the Web Universe dealing with it is statistically significant. However, it could be costly in terms of trials.
  • 167.
    Seeds: Darwin agents (robots) explore the Web recursively starting from a seed, a sort of primal and basic tree that depicts only the “suspected” SMS, Super Major Subjects, of the discipline to retrieve; when the whole tree is retrieved, these will probably become part of its upper levels. These seeds are provided by human experts. The seeds are grown via a Semantic Ontology (the Darwin Ontology), whose axioms are continuously checked. For each seed Darwin computes its semantic consistency and suggests changes in content and form to humans, for instance changes within the neighborhood of certain critical “knowledge nodes”.
    The Web may hide unexpected “best truths”: during the years 2009 - 2010 Darwin was used to retrieve a relatively heavy and complex discipline: The Art. As at that time we did not have the possibility of creating a trustworthy and good enough seed, we started from “zero ground”, meaning “total ignorance”. After trying four seeds we arrived at a rather atypical and unexpected result: three SMS, Super Major Subjects, appeared, namely: Culinary Art, Physical Arts and almost half of the Art Infrastructure (see Vision 2). We discussed this “discovery” of the Darwin agents with Art Experts and they agreed: what Darwin found was perhaps less “select” than the classic idea of “The Art” that experts may have in their minds, but evidently it was broader, more popular and closer to the modal and statistical truth.
    Concepts versus keywords: Partial Visualization of The Art Tree skeleton
    Darwin works with concepts and may query by concepts: the figure above depicts a partial and to some extent reductionist visualization of four minor subjects of The Art: Rigoletto, Light Lyric, Fantasy novel and Paella. All of these are considered “keywords”. Concepts, on the contrary, are unique representations of “in mind ideas” that could (in fact should…) be instantiated, and they are semantically and logically defined as a chain of links. For instance, the concept Rigoletto here points precisely to the opera Rigoletto by Verdi, coded as [0.1.2.2.2.2.14.1.6.10.5, Rigoletto], that is, embedded as the end link of a “tree track”. In Darwin, Rigoletto is seen as a CON-CEPT, where the track [0.1.2.2.2.2.14.1.6.10.5] defines its CONtext and the keyword Rigoletto by itself is a CEPT, a given name within this CONtext.
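The CON-CEPT coding just described, a tree track plus a keyword, maps naturally onto a small data structure. The following is a minimal sketch under our own assumptions: the class `Concept` and its `parse` method are hypothetical names, and the second track used for comparison is invented purely to show that the same keyword under another context is a different concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    context: tuple  # the "tree track": path of node numbers from the root
    cept: str       # the keyword, a given name within that context

    @classmethod
    def parse(cls, coded: str) -> "Concept":
        """Parse a string such as '[0.1.2, Name]' into a Concept."""
        track, name = coded.strip("[]").split(",", 1)
        return cls(tuple(int(n) for n in track.split(".")), name.strip())


# The coding quoted in the text for Verdi's opera.
rigoletto = Concept.parse("[0.1.2.2.2.2.14.1.6.10.5, Rigoletto]")

# Hypothetical track, invented for illustration only: same keyword, other context.
other = Concept.parse("[0.3.1, Rigoletto]")

print(rigoletto.context, rigoletto.cept)
print("same keyword:", rigoletto.cept == other.cept)
print("same concept:", rigoletto == other)
```

Comparing whole objects rather than keywords is the point of the sketch: two entries are the same concept only when both CONtext and CEPT agree.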
  • 168.
    Why does Darwin Methodology succeed in seeing the Web more and better?
    i) Because it works within semantically strong HCI, Human Computer Interaction, scenarios
    ii) Because it follows the best Information Technology utopias of the Golden Age around and after World War II (the 1940-1965 period in which they flourished, coincident with the “baby boomer” generation).
    Claude Elwood Shannon - Labyrinth
    Darwin Ontology: in line with Claude Shannon's findings, it states that Human Documents are generally built by intelligently combining two and only two types of semantic particles: 1) Common Words and Expressions and 2) Concepts. Darwin Ontology paves the way to understanding the sequence: data => information => knowledge => wisdom
    INTELLIGENCE
    Math and psychological thinking have much in common: within the Web universe, and concerning the “man-machine” interrelation, mathematicians, engineers, psychologists and epistemologists have much to say. As an example:
    o Miller’s Law, psychologist, 1920 - 2012;
    o Fitts’ Law, psychologist, 1912 - 1965;
    o Hick-Hyman Law, psychologists, 1912 - 1974;
    o Power Law of Practice, Newell and Rosenbloom Law, psychologists, 1927 - 1992;
    o Pareto, engineer and economist, and Zipf’s Law, linguist, 1902 - 1950;
    Two rare parallel lives:
    o Zipf, George Kingsley, USA, linguist, statistician, Harvard, 1902 - 1950, $P_n \sim 1/n^a$;
    o Fitts, Paul, USA, psychologist, ergonomist, US Air Force, 1912 - 1965, $T \sim a + b \log(1 + D/W)$;
    And the best of the best utopias:
    o Shannon, Claude Elwood, USA, 1916 - 2001, mathematician, engineer, MIT, Bell Labs: Information Theory, $H(X) = -\sum_x p(x) \log p(x)$;
    o Von Neumann, John, “father” of computers as we know them now, Hungarian/American, 1903 - 1957, mathematician, physicist;
    o Turing, Alan, 1912 - 1954, British, mathematician, philosopher, Turing Machines, ENIGMA;
    => Back
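The three formulas listed on this slide (Zipf, Fitts and Shannon) are easy to evaluate numerically. The following is a minimal sketch: the function names are ours, the logarithms are taken in base 2 (an assumption, since the slide does not fix a base), and the sample inputs are illustrative only.

```python
import math


def zipf_probabilities(n_ranks: int, a: float = 1.0) -> list:
    """Zipf's law: the probability of the item of rank n is proportional to 1/n**a."""
    weights = [1.0 / rank ** a for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fitts_time(a: float, b: float, distance: float, width: float) -> float:
    """Fitts' law: movement time T = a + b * log2(1 + D / W)."""
    return a + b * math.log2(1.0 + distance / width)


def shannon_entropy(probabilities) -> float:
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


zipf = zipf_probabilities(5)
print("Zipf probabilities for 5 ranks:", [round(p, 3) for p in zipf])
print("Fitts time for D=3, W=1, a=0, b=1:", fitts_time(0.0, 1.0, 3.0, 1.0))
print("Entropy of a fair coin in bits:", shannon_entropy([0.5, 0.5]))
print("Entropy of the Zipf distribution in bits:", round(shannon_entropy(zipf), 3))
```

Feeding the Zipf distribution into the entropy function links two of the listed laws: a skewed word distribution carries less uncertainty per item than a uniform one of the same size.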
  • 169.
    Wikipedia Semantic Skeleton as per Wikipedia
    One of the best avatars as per Darwin Ontology
    Read some reflections in green
    1 History
    2 Openness
      o 2.1 Restrictions
      o 2.2 Review of changes
      o 2.3 Vandalism
    3 Policies and laws
      o 3.1 Content policies and guidelines
    4 Governance
      o 4.1 Administrators
      o 4.2 Dispute resolution
    5 Community
      o 5.1 Diversity
    6 Language editions
    7 Critical reception
      o 7.1 Accuracy of content
      o 7.2 Quality of writing
      o 7.3 Coverage of topics and systemic bias
      o 7.4 Explicit content
      o 7.5 Privacy
      o 7.6 Sexism
    8 Operation
      o 8.1 Wikimedia Foundation and the Wikimedia chapters
      o 8.2 Software operations and support
      o 8.3 Automated editing
      o 8.4 Wikiprojects, and assessments of articles' importance and quality
      o 8.5 Hardware operations and support
      o 8.6 Internal research and operational development
      o 8.7 Internal news publications
    9 Access to content
      o 9.1 Content licensing
      o 9.2 Methods of access
    10 Impact
      o 10.1 Readership
      o 10.2 Cultural significance
  • 170.
      o 10.3 Sister projects – Wikimedia
      o 10.4 Publishing
      o 10.5 Scientific use
    11 Related projects
    12 See also
    13 References
      o 13.1 Notes
    14 Further reading
      o 14.1 Academic studies
      o 14.2 Books
      o 14.3 Book reviews and other articles
    15 External links
    Wikipedia in brief
    Jimmy Wales and Larry Sanger launched Wikipedia on January 15, 2001. Sanger coined its name, a portmanteau of wiki and encyclopedia. Initially only in English, Wikipedia quickly became multilingual as it developed similar versions in other languages, which differ in content and in editing practices. The English Wikipedia is now one of 291 Wikipedia editions and is the largest with 5,071,371 articles (having reached 5,000,000 articles in November 2015). There is a grand total, including all Wikipedias, of over 38 million articles in over 250 different languages. As of February 2014, it had 18 billion page views and nearly 500 million unique visitors each month. A peer review of 42 science articles found in both Encyclopædia Britannica and Wikipedia was published in Nature in 2005, and found that Wikipedia's level of accuracy approached Encyclopedia Britannica's. Criticisms of Wikipedia include claims that it exhibits systemic bias, presents a mixture of "truths, half truths, and some falsehoods", and that in controversial topics it is subject to manipulation and spin.
    Subjects covered (topics) https://en.wikipedia.org/wiki/Portal:Contents/Lists
    Wikipedia's contents: Lists
    o General reference
    o Culture and the arts
    o Geography and places
    o Health and fitness
    o History and events
    o Mathematics and logic
    o Natural and physical sciences
    o People and self
    o Philosophy and thinking
    o Religion and belief systems
    o Society and social sciences
    o Technology and applied sciences
  • 171.
    Some reflections follow
    o By and large, Wikipedia drives you to only one of ITS articles. For example, for the subject List of Gold Glove Award winners at pitcher, it drives you to this link: https://en.wikipedia.org/wiki/List_of_Gold_Glove_Award_winners_at_pitcher. If you now ask Google about this subject as an open search it returns 610,000 references, and if you ask it as a closed search (between quotation marks) it returns 103!
    o WARNING: Many articles, like those related to directories and glossary indexes, are only cosmetic copies of existing Web articles.
    o See, as an example of something reasonably well written but misleading, the article about Zen (259.000.000 as per Google): https://es.wikipedia.org/wiki/Zen. Unfortunately it drives people to a “marketing”-biased vision of it!
    However, as per Darwin Vision, Wikipedia is one of the best known avatars of Human Knowledge but, by and large, a “classic subjective vision”: a rational synthesis issued by a group of people, many of them “authorities” besides, at a level of quality comparable to that of the mentioned Encyclopaedia Britannica and Nature. Darwin Vision, on the contrary, intends to take into account absolutely ALL, EVERYTHING and EVERYBODY: past and present things, actors, contexts and variables related to the avatar under consideration. By “intends” we mean a demonstrated openness of mind to see the ALL.
    Note: This ANWOT spirit (A New Way of Thinking) intends to go beyond the methodic doubt of Cartesian philosophy. As an example of the ANWOT open mind, see below the basic spirit of curiosity that should guide searching efforts: what’s then behind ANWOT?
    o EN, "a new way of thinking", 4.280.000
    o ES, "una nueva forma de pensar", 337.000
    o IT, "un nuovo modo di pensare", 109.000
    o FR, "une nouvelle façon de penser", 117.000
    o DE, "eine neue Art des Denkens", 11.300
    o PT, "um novo modo de pensar", 82.400
    o CN, "一種新的思維方式", 40.000
    o IL, "דרך חדשה של חשיבה", 6,230
    o JP, "新しい考え方", 1.350.000
    o NL, "een nieuwe manier van denken", 31.200
    o IN, "सोच का एक नया तरीका है", 255
    o RU, "новый способ мышления", 20.000
    o TR, "yeni bir düşünce yolu", 854
    o Cat, "una nova forma de pensar", 13.700
    o Gall, "un novo modo de pensar", 553
    o Eusk, "pentsatzeko modu berri bat", 172
    o Esperanto, "nova pensmaniero", 230
    o Arabic, "طريقة جديدة في التفكير", 9.430
    => Back
  • 172.
    Word Searching Weakness
    By Juan Chamero, from Buenos Aires, as of November 11th 2014
    Physics versus Semantic analogies: We may see the matter of the universe as constituted by either a) building blocks made of molecules, or b) molecules, or c) atoms, or d) hadrons (protons and neutrons), or e) quarks. Of course, if we choose to see the whole world as formed of molecular building blocks, it would be practically impossible to see the world correctly at the level of atoms, worse at the level of hadrons, so imagine what would happen at the level of quarks! Something similar occurs within the Web space: something documented via concepts cannot be seen correctly via words. For instance, the noun “dog” could be adequately inserted via WordNet within a given hierarchy: dog, domestic dog, Canis familiaris => canine, canid => carnivore => placental, placental mammal, eutherian, eutherian mammal => mammal => vertebrate, craniate => chordate => animal, animate being, beast, brute, creature, fauna => ... But this always stays at the level of “words”: in this example the word dog sits within the animal realm but has nothing to do with hierarchies of knowledge (concerning knowledge, WordNet behaves as “flat”, unstructured). You may imagine the word dog as an atom, a building component of many different molecules, and these molecules as semantically structured pieces of knowledge belonging to different branches of knowledge: dog within canine within zoology; dog within “customs dog”; “trained dogs” for many applications; “dog psychology” for veterinary applications and for pet care; dog and red dog meanings in engineering; “lazy dog” and “lazy dog breeding” within human pet preferences; “dog entertainment ideas”……; “cattle dog”, “cattle dog breeding”, “Australian cattle dog”, “herding dog” and thousands more! See the Darwin Semantic Hypercube.
    => Back
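The contrast drawn above, one word-level hierarchy versus many concepts that contain the same word, can be sketched with two small structures. This is an illustration only: the hypernym table is a hand-copied, simplified version of the chain quoted in the text (it does not call WordNet), and the function names are ours.

```python
# Simplified word-level chain, taking the first term of each step in the text.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}

# Multi-word concepts taken from the examples listed in the text.
CONCEPTS = [
    "customs dog",
    "dog psychology",
    "lazy dog",
    "cattle dog",
    "Australian cattle dog",
    "herding dog",
]


def hypernym_chain(word: str) -> list:
    """Walk the single word-level hierarchy upwards from a word."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def concepts_containing(word: str, concepts: list) -> list:
    """List the concepts in which the word appears as a component."""
    return [c for c in concepts if word in c.lower().split()]


print(" => ".join(hypernym_chain("dog")))
print(concepts_containing("dog", CONCEPTS))
```

The first call returns a single path through the animal realm, while the second returns several concepts that belong to different branches of knowledge, which is the weakness of word-level search the slide argues for.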
  • 173.
    The Differences between Data, Information and Knowledge
    As per Infogineering.net. Darwin comments (in red in the original, marked here as [Darwin: …]), by Juan Chamero, from Buenos Aires, as of November 2014
    Knowledge
    Firstly, let’s look at Knowledge. Knowledge is what we know. Think of this as the map of the World we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that. It also contains our beliefs and expectations. “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc. [Darwin: Not bad!] It is from this “map” that we base our decisions, not the real world itself. Our brains constantly update this map from the signals coming through our eyes, ears, nose, mouth and skin. [Darwin: And coming through our viscera also!] You can’t currently store knowledge in anything other than a brain, because a brain connects it all together. Everything is inter-connected in the brain. Computers are not artificial brains. They don’t understand what they are processing, and can’t make independent decisions based upon what you tell them. [Darwin: I would dare to say not yet!] There are two sources that the brain uses to build this knowledge - information and data.
    Data
    Data is/are the facts of the World. For example, take yourself. You may be 5ft tall, have brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not. [Darwin: OK, it is a fact (see the definition of fact below) but the explanation is incorrect!]
  • 174.
    fact (noun; plural noun: facts)
    1. a thing that is known or proved to be true. "the most commonly known fact about hedgehogs is that they have fleas"
       synonyms: reality, actuality, certainty, factuality, certitude; truth, naked truth, verity, gospel. "it is a fact that the water supply is seriously polluted"
       antonyms: lie, fiction
    o information used as evidence or as part of a report or news article. "even the most inventive journalism peters out without facts, and in this case there were no facts"
       synonyms: detail, piece of information, particular, item, specific, element, point, factor, feature, characteristic, respect, ingredient, attribute, circumstance, consideration, aspect, facet; information, itemized information, whole story; informal: info, gen, low-down, score, dope. "every fact in the report was double-checked"
    o used to refer to a particular situation under discussion. noun: the fact that. "despite the fact that I'm so tired, sleep is elusive"
    In many ways, data can be thought of as a description of the World. We can perceive this data with our senses, and then the brain can process this. [Darwin: Data would have the same nature as knowledge, however not enough to be considered knowledge. We prefer to define it as “a piece of knowledge” instead. What really happens is that knowledge, accepted as “things we know”, would be hierarchically structured: low-level knowledge would be data for a superior cognitive level.] Human beings have used data as long as we’ve existed to form knowledge of the world. [Darwin: That’s correct!] Until we started using information, all we could use was data directly. If you wanted to know how tall I was, you would have to come and look at me. Our knowledge was limited by our direct experiences. [Darwin: What has been seen up to here is enough to go through life and make our knowledge increase, but not enough to build an ontology that helps us to “see more and better” the superior levels of it. It reminds us of the sagas about the differences between Newton’s and Einstein’s cosmovisions.]
  • 175.
Information
Information allows us to expand our knowledge beyond the range of our senses. [Not bad!] We can capture data in information, then move it about so that other people can access it at different times.
Here is a simple analogy for you. If I take a picture of you, the photograph is information. But what you look like is data. [That’s incorrect! Information consists of “pieces of knowledge” that improve our appreciation of the real world by diminishing our uncertainty about it, as per Shannon’s Information Theory!]
I can move the photo of you around, send it to other people via e-mail etc. However, I’m not actually moving you around – or what you look like. I’m simply allowing other people who can’t directly see you from where they are to know what you look like. If I lose or destroy the photo, this doesn’t change how you look.
So, in the case of the lost tax records, the CDs were information. The information was lost, but the data wasn’t. Mrs Jones still lives at 14 Whitewater road, and she was still born on 15th August 1971.
The Infogineering Model (below) explains how these interact…
[As with Einstein, Shannon’s discovery was too advanced for its time. Information is not easy to understand: it is something that helps intelligent beings to take decisions in order to survive, to improve, to solve a problem, to avoid obstacles. Ideally it is a “warning signal” (a sort of biiiiip…) accompanied by a message (minimally a “bit”, a zero or a one).]
=> Back
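The comment above, that information diminishes uncertainty and that the minimal message is one bit, can be illustrated with a small sketch. It is only an illustration of the standard Shannon definitions (self-information and entropy); the function names and the two example distributions are our own assumptions, not part of the Infogineering text or of Darwin.

```python
import math


def self_information(p: float) -> float:
    """Bits of information gained when an event of probability p occurs."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def entropy(probabilities: list[float]) -> float:
    """Average uncertainty, in bits, of a discrete probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * self_information(p) for p in probabilities if p > 0)


# Illustrative (assumed) distributions for a two-valued "warning signal".
fair_signal = entropy([0.5, 0.5])    # a zero or a one, equally likely
biased_signal = entropy([0.9, 0.1])  # the outcome is already largely expected

print(f"fair signal:   {fair_signal:.3f} bits")
print(f"biased signal: {biased_signal:.3f} bits")
```

A signal whose two outcomes are equally likely carries exactly one bit; the more predictable the outcome already is, the less uncertainty the message removes.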
  • 176.
The Web for fun
Darwin Semantic Pill unveiling example
Who’s on first: Polysemy versus Monosemy
From Buenos Aires as of April 9th 2015, 18:25
Agent-Human: Juan Chamero, Time elapsed: 20 minutes
 o Who’s on first: example of a famous EN American comic routine (worldwide: vaudeville), http://en.wikipedia.org/wiki/Who%27s_on_First%3F
 o ES Spanish cultural examples, http://www.retoricas.com/2011/07/ejemplos-de-equivoco.html
 o semiotics, http://en.wikipedia.org/wiki/Semiotics
 o polysemy, http://grammar.about.com/od/pq/g/polysemyterm.htm
 o Abbott-Costello routine: http://www.psu.edu/dept/inart10_110/inart10/whos.html
According to some estimates, more than 40% of English words have more than one meaning. The fact that so many words (or lexemes) are polysemous "shows that semantic changes often add meanings to the language without subtracting any" (M. Lynne Murphy, Lexical Meaning, 2010).

Abbott and Costello misunderstandings routine (in Spanish, “rutina de equívocos”)
Abbott: Strange as it may seem, they give ball players nowadays very peculiar names.
Costello: Funny names?
Abbott: Nicknames, nicknames. Now, on the St. Louis team we have Who's on first, What's on second, I Don't Know is on third--
Costello: That's what I want to find out. I want you to tell me the names of the fellows on the St. Louis team.
Abbott: I'm telling you. Who's on first, What's on second, I Don't Know is on third--
  • 177.
Costello: You know the fellows' names?
Abbott: Yes.
Costello: Well, then who's playing first?
Abbott: Yes.
Costello: I mean the fellow's name on first base.
Abbott: Who.
Costello: The fellow playin' first base.
Abbott: Who.
Costello: The guy on first base.
Abbott: Who is on first.
Costello: Well, what are you askin' me for?
Abbott: I'm not asking you--I'm telling you. Who is on first.
Costello: I'm asking you--who's on first?
Abbott: That's the man's name.
Costello: That's who's name?
Abbott: Yes.
~ ~ ~ ~ ~
Costello: When you pay off the first baseman every month, who gets the money?
Abbott: Every dollar of it. And why not, the man's entitled to it.
Costello: Who is?
Abbott: Yes.
Costello: So who gets it?
Abbott: Why shouldn't he? Sometimes his wife comes down and collects it.
Costello: Who's wife?
Abbott: Yes. After all, the man earns it.
Costello: Who does?
Abbott: Absolutely.
Costello: Well, all I'm trying to find out is what's the guy's name on first base?
Abbott: Oh, no, no. What is on second base.
Costello: I'm not asking you who's on second.
Abbott: Who's on first!
~ ~ ~ ~ ~
Costello: St. Louis has a good outfield?
Abbott: Oh, absolutely.
Costello: The left fielder's name?
Abbott: Why.
Costello: I don't know, I just thought I'd ask.
Abbott: Well, I just thought I'd tell you.
Costello: Then tell me who's playing left field?
Abbott: Who's playing first.
Costello: Stay out of the infield! The left fielder's name?
Abbott: Why.
Costello: Because.
Abbott: Oh, he's center field.
Costello: Wait a minute. You got a pitcher on this team?
Abbott: Wouldn't this be a fine team without a pitcher?
Costello: Tell me the pitcher's name.
Abbott: Tomorrow.
~ ~ ~ ~ ~
Costello: Now, when the guy at bat bunts the ball--me being a good catcher--I want to throw the guy out at first base, so I pick up the ball and throw it to who?
Abbott: Now, that's the first thing you've said right.
Costello: I DON'T EVEN KNOW WHAT I'M TALKING ABOUT!
Abbott: Don't get excited. Take it easy.
Costello: I throw the ball to first base, whoever it is grabs the ball, so the guy runs to second. Who picks up the ball and throws it to what. What throws it to I don't know. I don't know throws it back to tomorrow--a triple play.
Abbott: Yeah, it could be.
Costello: Another guy gets up and it's a long ball to center.
Abbott: Because.
Costello: Why? I don't know. And I don't care.
Abbott: What was that?
Costello: I said, I DON'T CARE!
Abbott: Oh, that's our shortstop!
=> Back
  • 178.
Darwin Q&A
By Juan Chamero, from Buenos Aires, as of November 11th 2014

Some Web Experts Crucial Questioning
1. Differentiators re: Darwin vs. Google (or other similar search engines)
2. 3-5 use cases for the Darwin tools/platform (not just for unveiling the web structure – more specific to a business or function, or something a government agency could do with it)
3. Any brief discourse you have on the ability of Darwin to work inside and discover items in the Dark (Hidden) Web.

My answer starts here
Juan Chamero
Note 01: What a hard and deep questioning! Notwithstanding, I will try to answer by talking to myself. First of all, at the risk of being redundant, the essence of Darwin methodology has much of Zen (I’m a Zen master), which in our Western culture means an open mind, maximum awareness of everything hidden or unhidden, taking into consideration the whole past and all possible hypotheses about the future while being fully aware of and living in the present. Darwin was conceived as a mental tool to “see” things as they are just NOW, without time (perhaps a time that could neither be kept nor added). This essence is very important because it works as a Meta Ontology behind all Darwin unveiling tasks. Darwin puts the emphasis on unveiling “global and/or massive trends”, looking for the best truths/beliefs about them instead of inspecting individual behaviors within them. What does this mean in terms of IT and computing? Darwin agents will inspect individuals anonymously, almost without even registering their data, but paying special attention to signs of “suspected” traits.
Note 02: Here I make a first stop. At the start of Darwin applications, technical people often ask us about similarities with other unveiling tools such as “data mining”. Data mining relies on probabilities, as Darwin does, but without “suspecting”. Darwin, on the contrary, is a “causal” methodology with a model of suspected behavior within a context of agents and scripts.
Now you, the Web expert, may question my note above, saying: tell me, Juan, why are you prejudging individual behaviors as suspect? You and I are both right to some extent: at the beginning of a Darwin exploration there are no real suspects, at most only some “seeds of suspected behaviors”. As the long exploration proceeds, thousands of Web pages are explored for each suspected item, even starting at random without those seeds in order to be scientifically framed, and all of a sudden persistent irregularities appear, atypical minor behaviors that
  • 179.
once detected by Darwin agents and/or approved by the human who operates Darwin, become the base of an autonomous, self-learning database of suspected behaviors. As Darwin is a semantic methodology, those irregularities are somehow authenticated via semantic coherence. In our glossary the “creatures” to be unveiled are “concepts”, “in mind ideas” shared by millions of people belonging to a given culture - language pair. As explained in our presentations, papers and e-books, for the pair English - American we estimate circa 20 million different and unique “in mind ideas”. In order to use Darwin ideally for any application we should have at hand a map of all of them, at present a typical chicken-and-egg problem, so Darwin is forced to build provisional, circumstantial thesauruses on the fly.

Questioning 3: Dark (Hidden) Web
First Drill: Let’s suppose the challenge is to unveil a global and massive criminal behavior starting at “ground zero” in terms of information and knowledge. The subject to unveil (via Google search with and without quotation marks) is related to a “suspected” original “in mind idea”: “criminal behavior” as its suspected “modal name” for the pair English - American.
criminal behavior, 15,100,000 Ref
“criminal behavior”, 2,630,000 Ref
That’s not a bad beginning! We have at least one million documents somehow dealing with this suspected in mind creature. The first step following our methodology would be a semantic exploration (guided by experts) “around” this semantic core, trying to build a first approach to a “Criminal Behavior” thesaurus. Next we should start the real Darwin Web scouting task.
Second Drill: Another example, more controversial:
“terrorism”, 95,100,000 Ref
“global terrorism”, 475,000 Ref, not too much taking into account its real magnitude!
“terrorism map”, 5,400 Ref, not too much, only a few!
"terrorism thesaurus", 49! Ref, perfectly in the darkness!
We have come to see how we humans hide what we consider morally and ethically malevolent: via euphemisms! So if we search by
euphemisms for terrorism, 95,900 Ref
terrorism slangs, 90,300,000 Ref
At first sight I, as a human, would tend to guide our agents to explore this euphemistic road!
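The two drills above compare reference counts for an open query, an exact phrase and progressively narrower phrases. A minimal sketch of that comparison follows; it only does arithmetic on the counts quoted in the text (they are not re-queried), and the helper name `share` is our own illustrative choice rather than part of Darwin.

```python
def share(count: int, base: int) -> float:
    """Fraction of the base reference count covered by a narrower query."""
    return count / base


# First drill: open query versus exact phrase, counts as quoted in the text.
open_hits = 15_100_000   # criminal behavior (without quotation marks)
exact_hits = 2_630_000   # "criminal behavior" (with quotation marks)
exact_ratio = share(exact_hits, open_hits)
print(f"exact phrase / open query: {exact_ratio:.1%}")

# Second drill: narrower phrases relative to the single word "terrorism".
terrorism_hits = {
    "terrorism": 95_100_000,
    "global terrorism": 475_000,
    "terrorism map": 5_400,
    "terrorism thesaurus": 49,
}
base = terrorism_hits["terrorism"]
for phrase, count in terrorism_hits.items():
    print(f"{phrase!r}: {count:,} Ref, {share(count, base):.6%} of the base term")
```

Read this way, a sharp drop from one phrase to the next is the kind of "darkness" signal a human operator might use when deciding which road the agents should explore.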
  • 180.
Questioning 2: Use cases
1. As a Global Intelligent Interface to see the Web more and better via ANY CONVENTIONAL SEARCH ENGINE, or a pool of them, via a Web Thesaurus;
2. To make trustable, non-intrusive and massive surveys and polls about any existing “in mind” subject;
3. To maintain a Human Total Memory and Knowledge Database, for instance in the LOC, Library of Congress;
4. To build IR’s, trustable, non-intrusive Intelligent Reports about any subject within the Web and all World Data Reservoirs (within the publicly open ones as well as within the “dark” ones);
5. To build IP, Intelligent Portals, structured under Web 3.0 and Web 4.0 interactive models via “concepts” instead of “words” as at present. Darwin portals may learn fast and easily to talk precisely with any type of user, adding a new dimension to human communications;
6. To create automatically the best possible metadata for large unstructured to semi-structured databases, ultimately making them semantic.

Questioning 1: Darwin vs. CSE’s, Conventional Search Engines
In order to unveil information and intelligence from the Web, Darwin needs to access it through an index. Most CSE’s have the Web indexed by “words”, no matter the language. Some of them, like Google, have all Web pages indexed by, or as if they were indexed by, long chains of words. Some of them are considered exhaustive and updated to the second. That is the raw material Darwin needs to optimize the search. Used this way, Darwin could be considered a CSE Optimizer that guides users semantically to obtain what they need in terms of information and knowledge, if possible in Only One Click. Notwithstanding, Darwin may alternatively be used to index Web documents by concepts from the beginning, at the moment they are uploaded. However, this is another project that implies building something that replaces CSE’s, at the moment a huge, costly and risky task. This reasoning raises the following question: is indexing by words necessary?
Yes, it is: it is a first, basic and necessary index that should be followed by indexing by concepts.
The Darwin difference: What Darwin adds to any CSE like Google is a “semantic road map” to make an efficient search. The “conventional user” of a CSE empowered by this interface will query the Web with a word or chain of words that he/she intuits will lead to a reasonable question. As Darwin hypothetically knows the Web semantically, circa 20 million unique and different ideas in all possible languages, it will show users all the regions/domains of knowledge where those intuited words exist, let’s say in 5 branches of knowledge under 30 main subjects.
  • 181.
Once the user makes the choice that he/she considers most adequate, Darwin sends a precise “semantic road map” to the query box. This way CSE’s, instead of rendering hundreds of thousands of references, will show only a small and very specific set of references closely related to the “suspected” user “in mind idea”.
Note 003: following the evolutionary series [data => information => knowledge => wisdom], propelled by intelligence, we are entering The Knowledge Era after more than 8,000 years of a textual culture. We were conscious that some images carry more information, knowledge and even wisdom than thousands of words, but we never imagined that we could process them meaningfully and sometimes better than text. Darwin adds this ability to all its computing steps and, even more, it is trying to detect and unveil “gestures”.
=> Back
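The interaction described under “The Darwin difference” (the user types an intuited word, Darwin lists the domains where it exists, the user picks one, and a refined query goes to the search box) can be sketched roughly as follows. Everything here is a hypothetical illustration: the toy thesaurus entries, the `Sense` record and the format of the refined query are our assumptions, not Darwin's actual data or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    branch: str   # branch of knowledge (hypothetical label)
    subject: str  # main subject within the branch (hypothetical label)
    concept: str  # "modal name" of the in-mind idea


# Hypothetical toy thesaurus: one polysemous word mapped to its domains.
TOY_THESAURUS = {
    "pitcher": [
        Sense("Sports", "Baseball", "baseball pitcher"),
        Sense("Home", "Tableware", "water pitcher"),
    ],
}


def candidate_senses(word: str) -> list[Sense]:
    """Return every domain where the intuited word exists in the toy thesaurus."""
    return TOY_THESAURUS.get(word.lower().strip(), [])


def semantic_road_map(sense: Sense) -> str:
    """Build a refined query: exact concept phrase plus its context words."""
    return f'"{sense.concept}" {sense.subject} {sense.branch}'


senses = candidate_senses(" Pitcher ")
for number, sense in enumerate(senses, start=1):
    print(number, sense.branch, ">", sense.subject, ">", sense.concept)

# Assume the user picks the first option; this string would go to the query box.
refined_query = semantic_road_map(senses[0])
print(refined_query)
```

The design point is only that the disambiguation happens before the search engine is called, so the engine receives a narrow query instead of a bare polysemous word.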
  • 182.
“El Leñador de Kentucky” (Abraham Lincoln)
Semantic Exploration for fun, applying a Darwin Ontology interface via Google
By Juan Chamero, a casual “in mind image” transmitted during a friendly chat somewhere over a cup of coffee on 1st of May 2014

First trial:
“El Leñador de Kentucky”, 25, all meaningful;
Perhaps pointing to a similar image: “El Honrado Abraham”, 45, all meaningful again;
Let’s try now to find an equivalent word core in English, Lincoln’s native language:
lumber, timber, lumberjack, logger, treefeller, woodcutter, woodchopper, woodsman (No woodman!)
Query patterns:
{the}{a}{} XYZ of Kentucky
Kentucky XYZ
Kentucky’s XYZ
XYZ: woodsman

Second trial:
“The Woodsman of Kentucky”, gives 0!
“A Woodsman of Kentucky”, gives 1!
“Woodsman of Kentucky”, 138,000, many of them meaningful instead! However, most of them point to “Daniel Boone”, another legend!
However, browsing and wandering a little with the open query (without quotation marks) woodsman Lincoln Kentucky, we found the expression backwoodsman, which probably better fits the “in mind image” many of us have of Abraham Lincoln.
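The query patterns used in the trials above ({the}{a}{} XYZ of Kentucky, Kentucky XYZ, Kentucky’s XYZ) can be expanded mechanically. The following is a minimal sketch under our own assumptions: the function name is illustrative, the output is just the list of exact-phrase queries a human would paste into a search engine, and no search is actually performed.

```python
ARTICLES = ["the", "a", ""]  # {the}{a}{} from the pattern; "" means no article


def query_variants(noun: str, place: str = "Kentucky") -> list[str]:
    """Expand one candidate noun (XYZ) into the exact-phrase query patterns."""
    variants = []
    for article in ARTICLES:
        # Skip the empty article so no stray leading space is produced.
        phrase = " ".join(part for part in (article, noun, "of", place) if part)
        variants.append(f'"{phrase}"')
    variants.append(f'"{place} {noun}"')
    variants.append(f'"{place}\'s {noun}"')
    return variants


for candidate in ("woodsman", "backwoodsman"):
    for query in query_variants(candidate):
        print(query)
```

Running the same expansion over every candidate in the word list gives the full grid of phrases to try, leaving the human to judge which results are meaningful.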
  • 183.
Third trial:
“Backwoodsman of Kentucky”, 39, and a little confused;
“Kentucky backwoodsman”, 5,530, and now, seen from this distance, we get a reasonably good sample of the “époque”, Lincoln and Boone.

Fourth trial:
As a byproduct of our semantic exploration we may take a glance at the superb semantic talent of Lincoln. See the news item titled “A Kentucky Backwoodsman who became our President”, published by St. Petersburg Times - Feb 12, 1969. From here you may go on to his famous Gettysburg Address, a 135 second speech considered one of the greatest speeches in the world. We reproduce below its most trustable version (it is a Bliss Copy (Gettysburg speech), presented by John G. Nicolay, his Personal Secretary):

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.
Abraham Lincoln
November 19, 1863

List of suspected concepts
c1: [a new nation], [conceived in Liberty], and dedicated to the proposition that [all men are created equal];
c2: civil war;
c3: testing whether any nation so conceived and so dedicated, can long endure;
c4: We are met on a great battle-field;
c5: [We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live]. [It is altogether fitting and proper that we should do this];
  • 184.
c6: [we can not dedicate] -- [we can not consecrate] -- [we can not hallow];
c7: The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract; Google renders 42,000 references for this exact expression!
c8: The world will little note, nor long remember what we say here, but it can never forget what they did here;
c9: It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced;
c10: it is a reinforced commitment;
c11: we here highly resolve that these dead shall not have died in vain;
c12: this nation, under God, shall have a new birth of freedom;
c13: government of the people, by the people, for the people, shall not perish from the earth;
=> Back
  • 185.
Human Knowledge Disciplines
First brief exploration as a test of availability
By Juan Chamero, from Buenos Aires, as of November 11th 2014
 http://en.wikipedia.org/wiki/List_of_academic_disciplines_and_sub-disciplines#cite_note-1
 http://www.basicknowledge101.com/index.html
 http://www.dmoz.org/
 NATO A-Z Thesaurus, http://www.nato.int/cps/en/natolive/topics.htm
 Techniques for mapping thematically, http://thematicmapping.org/techniques/
 OGC-KML, http://www.opengeospatial.org/standards/kml/
 AoK, ToK, https://ibpublishing.ibo.org/exist/rest/app/tsm.xql?doc=d_0_tok_gui_1304_1_e&part=2&chapter=4: It states eight AoK, namely:
   o mathematics
   o natural sciences
   o human sciences
   o history
   o the arts
   o ethics
   o religious knowledge systems
   o indigenous knowledge systems.
 List of 100 Universal Themes, http://www.mychandlerschools.org/cms/lib6/AZ01001175/Centricity/Domain/963/universal%20themes.pdf
 Another list, http://www.docstoc.com/docs/25450957/List-of-Universal-Themes
 HOT LIST, Wikipedia, ~10,000 encyclopedic topics, http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Missing_encyclopedic_articles/Hot/T2
 Wikipedia content index, http://en.wikipedia.org/wiki/Portal:Contents/Overviews
 Wikipedia actual content, A-Z, http://en.wikipedia.org/wiki/Portal:Contents/A%E2%80%93Z_index
 some ideas starting from zero knowledge: a) from 35,000,000,000 URLs we may choose 200,000,000 in English at random, forming some 200 distinguishable, hypothetically thematically homogeneous clusters averaging 1,000,000 URLs each;
  • 186.
 Career List 1 (Glasgow University):
 Accounting & Finance
 Aeronautical & Manufacturing Engineering
 Agriculture & Forestry
 American Studies
 Anatomy & Physiology
 Anthropology
 Archaeology
 Architecture
 Art & Design
 Biological Sciences
 Building
 Business and Management Studies
 Celtic Studies
 Chemical Engineering
 Chemistry
 Civil Engineering
 Classics & Ancient History
 Communication & Media Studies
 Computer Science
 Dentistry
 Drama, Dance & Cinematics
 East & South Asian Studies
 Economics
 Education
 Electrical & Electronic Engineering
 English
 Film-making
 Food Science
 French
 General Engineering
 Geography & Environmental Science
 Geology
 German
 History
 History of Art, Architecture & Design
 Hospitality, Leisure, Recreation & Tourism
 Iberian Languages
 Italian
 Journalism
 Land & Property Management
 Law
 Librarianship & Information Management
 Linguistics
 Marketing
 Materials Technology
 Mathematics
 Mechanical Engineering
 Medicine
 Middle Eastern & African Studies
 Music
 Nursing
 Other Subjects Allied to Medicine
  • 187.
 Pharmacology & Pharmacy
 Philosophy
 Physics & Astronomy
 Politics
 Psychology
 Russian & East European Languages
 Social Policy
 Social Work
 Sociology
 Sports Science
 Theology & Religious Studies
 Town & Country Planning and Landscape Design
 Veterinary Medicine

List of MINOR JOBS/CAREERS
http://www.alec.co.uk/free-career-assessment/list-of-careers.htm

Administration Jobs:
 Accounting Officers
 Administrative Assistants
 Administrative Support Worker Supervisors and Managers
 Auditing Officers
 Bookkeepers
 Cashiers
 Computer Operators
 Couriers
 Credit Authorisers and Officers
 Customer Service Representatives
 Data Entry Personnel
 Data Processing Officers and Assistants
 Database Administrators
 Debt Collectors
 Dispatchers
 Filing Assistants
 Financial Officers
 Hotel Receptionists
 Human Resources Assistants
 Information Officers
 Interviewers
 Invoicing Officers
 Librarians
 Library Assistants
 Messengers
 Meter Readers
 Office Clerks
 Office Supervisors and Managers
 Order Clerks
 Payroll Clerks
 Postal Room Staff
 Postal Service Workers
 Procurement Officers
 Production and Distribution Officers
 Production and Planning Officers
  • 188.
 Receptionists
 Record Clerks
 Reservation and Transportation Ticket Agents
 Secretaries
 Transporting and Receiving Officers
 Stock Control and Order Fillers
 Travel Agents

Agricultural Jobs list of careers:
 Agricultural Managers
 Agricultural workers
 Animal Husbandry workers
 Conservation workers
 Farm managers
 Farmers
 Fishermen
 Forestry Workers
 Trawler Operators

Finance Jobs:
 Accountant
 Actuaries
 Auditors
 Budget Analysts
 Cashiers
 Debt Counsellors
 Economists
 Insurance Sales Agents
 Insurance Underwriters
 Loan Officers
 Personal Financial Advisors
 Tax Inspectors, Collectors and Revenue Agents

Construction Jobs:
 Block Tile Pavers
 Boilermakers
 Carpenters
 Carpet, Floor, and Tile Fitters
 Ceiling Tile Installers
 Concrete Finishers
 Construction and Building Inspectors
 Construction Equipment Operators
 Construction Managers
 Drywall Installers
 Electricians
 Glaziers
 Hazardous Materials Removal Workers
 Insulation Workers
 Lift Installers and Repairers
 Painters and Decorators
 Pipelayers and Plumbers
 Plasterers Masons
 Roofers
 Sheet Metal Workers
  • 189.
 Site Labourers
 Stonemasons
 Structural Iron and Metal Workers

Creative Jobs:
 Actors
 Announcers
 Artists
 Camera Operators and Editors
 Choreographers
 Craftspeople
 Dancers
 Designers
 Desktop Publishers
 Graphic Designers
 Interior Designers
 Musicians
 Photographers
 Producers
 Singers
 Website Developers and Designers
 Writers and Editors

Education and Teaching Jobs list of careers:
 Computer Trainers
 Education Administrators
 Home Tutors
 Pre-school Teachers
 Special Education Teachers
 Teachers - Community and Adult Education
 Teachers - Primary and Middle
 Teachers - Secondary and Upper Level
 Teaching Assistants
 Training Specialists and Managers
 University and College Lecturers

Healthcare and Health Related Jobs:
 Anaesthetists
 Chiropractors
 Counsellors
 Dental Hygienists
 Dental Laboratory Technicians
 Dentists
 Dieticians
 Health Services Managers
 Home Healthcare Assistants
 Language Pathologists
 Medical Assistants
 Medical records specialist careers
 Medical Scientists
 Medical Services Managers
  • 190.
 Mental Health Workers
 Midwives
 Nurses
 Nursing Assistants
 Nutritionists
 Occupational Health and Safety Specialists and Technicians
 Occupational Therapist Assistants
 Occupational Therapists
 Ophthalmic Laboratory Technicians
 Opticians, Dispensing
 Optometrists
 Paramedics
 Pharmacists
 Pharmacy Assistants
 Pharmacy Technicians
 Physical Therapist Assistants
 Physical Therapists
 Physician Assistants
 Physicians
 Psychiatric Assistants
 Psychologists and Psychiatrists
 Recreational Therapists
 Registered Nurses
 Respiratory Therapists
 Social Service Assistants
 Social Workers
 Surgeons

Medical Sciences Jobs:
 Audiologists
 Biomedical Engineers
 Cardiovascular Technologists and Technicians
 Diagnostic Medical Sonographers
 Emergency Medical Technicians
 Health Information Technicians
 Nuclear Medicine Technologists
 Radiological Technologists and Technicians
 Surgical Technologist

IT and Telecommunications:
 Computer Maintenance
 Computer Programmers and Operators
 Computer Scientists
 Computer Software Engineers
 Information Systems Managers
 Systems Analysts
 Systems Developers
 Telecommunications Equipment Installers and Repairers
 User Support Personnel

Management Jobs list of careers:
 Administrative Services Managers
 Buyers
 Claims Adjusters, Appraisers, Examiners, and Investigators
  • 191.
 Community Association Managers
 Computer Managers
 Cost Estimators
 Engineering Managers
 Financial Analysts
 Financial Managers
 Food Service Managers
 Funeral Directors
 Health Services Managers
 Human Resources Managers and Specialists
 Industrial Production Managers
 Information Systems Managers
 Labour Relations Specialists and Managers
 Management Analysts
 Marketing Managers
 Medical Services Managers
 Natural Science Managers
 Promotions Managers
 Property Managers
 Public Relations Managers
 Purchasing Agents
 Purchasing Managers
 Retail Managers
 Sales Managers
 Senior Executives

Manufacturing Jobs:
 Aircraft and Avionics Equipment Mechanics and Service Technicians
 Assemblers and Line Workers
 Automobile Service Technicians and Mechanics
 Boiler Operators
 Bookbinders
 Clothing Manufacturers
 Diesel Service Technicians and Mechanics
 Engine Mechanics
 Fabricators
 Food Processing Workers
 Furnishing Careers
 Heating, Air Conditioning, and Refrigeration Mechanics and Installers
 Heavy Vehicle Service Technicians and Mechanics
 Industrial Machinery Installation, Repair, and Maintenance Workers
 Inspectors, Testers, Sorters, Samplers
 Jewellers and Precious Stone and Metal Workers
 Line Installers and Repairers
 Machine Operators
 Machine Setters and Operators
 Machinists
 Mobile Equipment Service Technicians and Mechanics
 Painting and Coating Workers
 Photographic Process Workers and Processing Machine Operators
 Power Plant Operators, Distributors, and Dispatchers
 Precision Instrument and Equipment Production
 Pre-press Technicians and Workers
 Printing Machine Operators
 Radio Equipment Manufacture and Installation
  • 192.
 Semiconductor Processors
 Stationary Engineers
 Textile Careers
 Tool and Die Makers
 Water and Liquid Waste Treatment Plant and System Operators
 Welding, Soldering, and Brazing Workers
 Woodworkers

Professional:
 Archivists
 Clergy
 Coaches
 Correctional Treatment Specialists
 Correspondents
 Court Reporters
 Curators
 Directors
 Instructional Co-ordinators
 Interpreters
 Judges, Magistrates, and Other Judicial Workers
 Lawyers
 Legal Assistants
 Library Technicians
 Market Researchers
 News Analysts
 Operations Research Analysts
 Probation Officers
 Reporters
 Social Scientists
 Statisticians
 Translators
 Veterinary Surgeons
 Veterinary Technicians

Repair and Maintenance Jobs list of careers:
 Automobile Body and Related Repairers
 Electrical and Electronics Installers and Repairers
 Electronic Home Entertainment Installers and Repairers
 General Maintenance and Repair Workers
 Home Appliance Repairers
 Office Machine Repair

Sales, Marketing and Related Jobs:
 Advertising Managers
 Estate Agents
 Marketing Managers
 Product Promoters
 Promotions Managers
 Public Relations Specialists
 Retail Salespersons
 Sales Engineers
 Sales Representatives
 Sales Team Managers
 Travel Agents
  • 193.
    Service Related Jobs: Barbers  Beauty Therapists  Building Cleaning Workers  Catering Workers  Chefs, Cooks, and Kitchen Workers  Childcare Workers  Correctional Officers  Dental Assistants  Firemen  Fitness Workers  Flight Attendants  Grounds Maintenance Workers  Investigators  Personal Aides  Personal Appearance Workers  Pest Control Workers  Police and Detectives  Private Detectives  Recreation Workers  Security Guards Technical:  Aerospace Engineers  Agricultural Engineers  Agricultural Scientists  Architects  Astronomers  Atmospheric Scientists  Biological Scientists  Broadcast Engineering Technicians  Cartographers  Chemical Engineers  Chemists and Materials Scientists  Civil Engineers  Clinical Laboratory Technologists and Technicians  Computer Hardware Engineers  Conservation Scientists  Drafters  Electrical and Electronics Engineers  Engineering Technicians  Engineers  Environmental Engineers  Environmental Scientists  Food Scientists  Geological Engineers  Geo-scientists  Health and Safety Engineers  Industrial Engineers  Landscape Architects  Materials Engineers  Mathematicians  Mechanical Engineers
  • 194.
 Mining Engineers
 Mining Safety Engineers
 Museum Technicians
 Nuclear Engineers
 Petroleum Engineers
 Physicists
 Radio Operators
 Science Technicians
 Sound Engineering Technicians
 Surveyors and Surveying Technicians
 Systems Analysts
 Town Planners

Transport list of careers:
 Air Traffic Controllers
 Aircraft Pilots
 Bus Drivers
 Flight Attendants
 Flight Engineers
 Removals Occupations
 Rail Transport Occupations
 Taxi Drivers and Chauffeurs
 Truck Drivers and Delivery Workers
 Water Transport Occupations

Stanford University Syllabus
  • 197.
  • 198.
Word versus Concept
Example of a Darwin Team Semantic Workshop Discussion
Held at Barcelona, Buenos Aires, Dallas, as of February 2015
By Juan Chamero, from Buenos Aires as of 27th April 2015
Subject of discussion: What’s in a word? As per Google: 470,000,000 references from Buenos Aires (IP) as of Feb 2015

Word
Source 1: http://en.wikipedia.org/wiki/Word
In linguistics, a word is the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning). This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own. A word may consist of a single morpheme (for example: oh!, rock, red, quick, run, expect), or several (rocks, redness, quickly, running, unexpected), whereas a morpheme may not be able to stand on its own as a word (in the words just mentioned, these are -s, -ness, -ly, -ing, un-, -ed). A complex word will typically include a root and one or more affixes (rock-s, red-ness, quick-ly, run-ning, un-expect-ed), or more than one root in a compound (black-board, rat-race). Words can be put together to build larger elements of language, such as phrases (a red rock), clauses (I threw a rock), and sentences (He threw a rock too but he missed).
The term word may refer to a spoken word or to a written word, or sometimes to the abstract concept behind either. Spoken words are made up of units of sound called phonemes, and written words of symbols called graphemes, such as the letters of the English alphabet.
Semantic definition
Leonard Bloomfield introduced the concept of "Minimal Free Forms" in 1926. Words are thought of as the smallest meaningful unit of speech that can stand by themselves.[1] This correlates phonemes (units of sound) to lexemes (units of meaning). However, some written words are not minimal free forms as they make no sense by themselves (for example, the and of).[2]
Some semanticists have put forward a theory of so-called semantic primitives or semantic primes, indefinable words representing fundamental concepts that are intuitively meaningful. According to this theory, semantic primes serve as the basis for describing the meaning, without circularity, of other words and their associated conceptual denotations.[3]
  • 199.
    Source 2: http://www.thefreedictionary.com/word
    word (wûrd) n.
    1. A sound or a combination of sounds, or its representation in writing or printing, that symbolizes and communicates a meaning and may consist of a single morpheme or of a combination of morphemes.
    2. Something said; an utterance, remark, or comment: May I say a word about that?
    3. Computer Science A set of bits constituting the smallest unit of addressable memory.
    4. words Discourse or talk; speech: Actions speak louder than words.
    5. words Music The text of a vocal composition; lyrics.
    6. An assurance or promise; sworn intention: She has kept her word.
    7. a. A command or direction; an order: gave the word to retreat. b. A verbal signal; a password or watchword.
    8. a. News: Any word on your promotion? See Synonyms at news. b. Rumor: Word has it they're divorcing.
    9. words Hostile or angry remarks made back and forth.
    10. Used euphemistically in combination with the initial letter of a term that is considered offensive or taboo or that one does not want to utter: "Although economists here will not call it a recession yet, the dreaded 'R' word is beginning to pop up in the media" (Francine S. Kiefer).
    11. Word a. See Logos. b. The Scriptures; the Bible.
    Concept
    Source 1: http://en.wikipedia.org/wiki/Concept
    A concept is an abstraction or generalization from experience or the result of a transformation of existing concepts. The concept reifies all of its actual or potential instances whether these are things in the real world or other ideas. Concepts are treated in many if not most disciplines whether explicitly such as in psychology, philosophy, etc. or implicitly such as in mathematics, physics, etc.
  • 200.
    When the mind makes a generalization such as the concept of tree, it extracts similarities from numerous examples; the simplification enables higher-level thinking. In metaphysics, and especially ontology, a concept is a fundamental category of existence. In contemporary philosophy, there are at least three prevailing ways to understand what a concept is:[1]
    • Concepts as mental representations, where concepts are entities that exist in the brain.
    • Concepts as abilities, where concepts are abilities peculiar to cognitive agents.
    • Concepts as abstract objects, where objects are the constituents of propositions that mediate between thought, language, and referents.
    Note 01: Darwin differentiates the three ways.
    Mental representations
    In a physical theory of mind, a concept is a mental representation, which the brain uses to denote a class of things in the world. This is to say that it is, literally, a symbol or group of symbols together made from the physical material of the brain.[7][8] Concepts are mental representations that allow us to draw appropriate inferences about the type of entities we encounter in our everyday lives.[8] Concepts do not encompass all mental representations, but are merely a subset of them.[7] The use of concepts is necessary to cognitive processes such as categorization, memory, decision, learning, and inference.
    Source 2: http://en.wikipedia.org/wiki/Problem_of_universals
    Note 02: Please read it carefully for our next discussion, because this subject is closely related to a recent Darwin essay about the quantum nature of the present duality (nothing and everything) and the Web as a universe of “avatars”.
    Source 3: Juan Chamero, Darwin Architect
    Possible name structures that may point to concepts, as per Darwin-specific “in mind” ideas:
    o [w]: only a small percentage of those [w]’s are concepts
    o [w w]: only an even smaller percentage of those [w w]’s are concepts
    o [w w w]: only an even smaller percentage of those [w w w]’s are concepts
    o [w w w w]: only an even smaller percentage of those [w w w w]’s are concepts
    o …
    [w w] chain examples: [it is], [how cold], [never mind], …
    [w w] concept examples: [parallel processing], [visual arts], [performing arts], …
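    The [w], [w w], [w w w] structures listed above can be illustrated with a minimal sketch. It only enumerates contiguous word chains and keeps those found in a reference set; the functions `candidate_names` and `filter_concepts` and the tiny `known` set are hypothetical, introduced for illustration, and are not Darwin's actual agents or data.

```python
import re

def candidate_names(text, max_len=4):
    """Return every contiguous word chain [w], [w w], ... up to max_len words."""
    words = re.findall(r"[a-z']+", text.lower())
    chains = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            chains.append(" ".join(words[i:i + n]))
    return chains

def filter_concepts(chains, known_concepts):
    """Keep only the chains that appear in a reference set of concept names."""
    return [c for c in chains if c in known_concepts]

# Illustrative input and reference set (assumptions, not Darwin data).
sample = "Parallel processing is never mind common in visual arts"
known = {"parallel processing", "visual arts"}

chains = candidate_names(sample)
concepts = filter_concepts(chains, known)
print(len(chains), "chains, of which concepts:", concepts)
```

    Even in this nine-word sentence the number of chains is an order of magnitude larger than the number of concepts, which is the point the list makes: most chains are not concepts.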
  • 201.
    Concepts [ ] are hierarchically organized, tending to structure as “Logical Trees”. Most [w w w … w]’s are nonsensical chains, for example: [how how fine are you], [as_it_is now], [that is to say namely], [up up up up for ever], [me too], and many may also point to agreed coded messages or portions of coded messages.
    Word Universe
    The number of words in the English language is: 1,025,109.8. This is the estimate by the Global Language Monitor on January 1, 2014. The English Language passed the million-word threshold on June 10, 2009 at 10:22 a.m. (GMT). The Millionth Word was the controversial 'Web 2.0'.
    So you may imagine the amount of false and pseudo concepts [w w w …]. On the contrary, an “in mind” idea is a piece of knowledge, something unique and specific, that deserves to be documented and explained meaningfully via common languages.
    Source 4: http://www.hutong-school.com/how-many-chinese-characters-are-there
    The Chinese Characters and their Numbers
    China has always been “larger than life” when it comes to numbers and quantities. After all, it has the largest population out of any other country. It is no wonder then, that its language has become just as extensive! But exactly how big is it? The Chart of Generally Utilized Characters of Modern Chinese defines the existence of 7000 characters! If you think this number is high, you'll be shocked to hear that according to the Great Compendium of Chinese Characters or “Hanyu Da Zidian” (汉语大字典; Hànyǔ dà zìdiǎn), the number of existing characters is actually 54,678! But if you’re the kind of person that loves a challenge then there's the Dictionary of Chinese Variant Form (中华字海; Zhōnghuá zì hǎi). This work, also called the “Yìtǐzì zìdiǎn” (异体字字典), contains definitions for 106,230 Chinese characters! But luckily, there’s no need to be scared. Another document called the Chart of Common Characters of Modern Chinese only includes 3500 characters -- that's half the amount included in the first chart. To make things easier, you probably won’t even need 1,000 of them, since they are considered less commonly used characters. If you learn Chinese and take the official Chinese language test called the Hànyǔ Shuǐpíng Kǎoshì 汉语水平考试 (also known as the HSK), you will only need to show knowledge of 2,600 characters to pass the exam at the highest level. And if this is not enough, check out these interesting facts: with 2,500 characters you can read 97.97 % of everyday written language and with 3,500 characters you can read up to 99.48 %, which means pretty much everything. It’s even more comforting to know that with only 900 characters, you can actually read 90% of a newspaper!
    Note 03: Similar considerations may be applied to any “Language – Culture” pair. In fact a glossary of the 1,000 most common terms is good enough.
  • 202.
    Towards a Mathematics Semantic Seed Buildup
    Dr Eduardo Ortiz and Co-Workers
    First draft: Juan Chamero review on June 22nd 2009, from Caece University, Buenos Aires, Argentina
    Last review on October 20th 2011
    Note: Dr. Eduardo Ortiz is an Emeritus Professor of Mathematics and History of Mathematics at Imperial College London.
    The Eduardo Ortiz and Co-workers’ “semantic seed” on mathematics (see below) is an opening of 52 subjects. All of them have been semantically checked, both by humans and by Darwin agents, with the following results:
    • All names are “modals” with respect to Google, namely they are the statistically best suited “keywords” to point to the “concepts” they represent. It means that Dr Ortiz and Co-workers selected the right terms in the right sequence, gender, spelling, writing and number (singular or plural). Agents select the best suited among hundreds of potentially similar keywords. Ortiz and Co-workers’ names matched 100% of the modals retrieved via agents. By the way, this discipline proved to be extremely sensitive to small changes, namely: “operator theory”: 493.000 versus “operators theory”: 11.700;
    • The top references (100 per query) called by those modal “keywords” proved to be strongly authoritative. Our idea is to provide the set of Authorities’ URLs retrieved from the whole Top references raw data as the initial “Authoritative seed” to guide the scouting of Darwin agents. In our experience, authorities stack together at the top as a relatively durable and solid block;
    • The seed seems to be complete, good enough to generate the whole Mathematics Logical Tree without “semantic holes”, that is, without disciplines, sub-disciplines or subjects missing or poorly covered: what we call a reasonably good “semantic umbrella”. The seed touches some other related disciplines in a proportion that is continually computed by agents.
    However, some additional work has to be performed in order to use this seed, which looks semantically “flat”.
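    The idea of a “modal” keyword can be sketched as picking, among spelling variants, the one with the highest reference count. The snippet below is only an illustration using the two counts quoted above; the function name `modal_keyword` is hypothetical and the real agents compare hundreds of variants.

```python
def modal_keyword(variant_counts):
    """Return the variant with the highest reference count (the 'modal' name)."""
    return max(variant_counts, key=variant_counts.get)

# Counts quoted in the text (written there with dots as thousands separators).
variants = {"operator theory": 493_000, "operators theory": 11_700}

best = modal_keyword(variants)
ratio = variants["operator theory"] / variants["operators theory"]
print(best, "is about", round(ratio), "times more frequent than the plural variant")
```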
    Let us picture it. If “mathematics” is the “root” of the MLT, Mathematical Logical Tree, we may imagine up to 8 levels of opening going down from root to “leaves” (some disciplines broader but more ambiguous than mathematics, such as ART, have up to 13 levels). On the contrary, some semantically well-known disciplines like “Computing” are represented by LTs, Logical Trees, of no more than six levels. With an average opening of 5 for each “node” of an imaginary six-level LT we would have [1 – 5 – 25 – 125 – 625 – 3125] as the nodes-by-level sequence, totaling 3906 node-subjects. Please take this number only as a play with figures to improve our mental image of semantic trees. From our experience and “first impression” we set the limits and boundaries of our Darwin agents’ exploration program for Mathematics: we expect to unveil from 2.000 to 6.000 subjects distributed from root to leaves through 5 to 8 levels.
    Looking at the Ortiz seed we notice the same suspected anomaly he and his Co-workers pointed out (originally written in colloquial Spanish):
    “Note that there are two large groups (100-200) and (1000-5000), three very large ones (Statistics, 6.400); (Numerical Analysis 4.400) and two enormous ones: Computer Science (13000) and Education (12.000), which do not square with the 17000 given for the whole of mathematics. The most abstract and important parts of mathematics do not necessarily have many citations…”
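    The node count quoted above for a six-level tree with an average opening of 5 can be checked with a few lines. This is a sketch of the arithmetic only, under the text's simplifying assumption that every node opens into the same number of children; the function names are illustrative.

```python
def nodes_per_level(branching, levels):
    """Number of nodes at each level of a uniform tree, root first."""
    return [branching ** k for k in range(levels)]

def total_nodes(branching, levels):
    """Total nodes of a uniform tree with the given number of levels."""
    return sum(nodes_per_level(branching, levels))

levels = nodes_per_level(5, 6)
total = total_nodes(5, 6)
print(levels, total)  # [1, 5, 25, 125, 625, 3125] 3906
```

    The same total follows from the geometric series (5^6 - 1) / (5 - 1) = 3906.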
  • 203.
    As a review and consistency check we performed the same search using the same original seed terms. Complementary to the result depicted above, we confirm the suspected behavior. Just by reading the Top references as a human acquainted with engineering, physics and mathematics terminology, we detect the following list of terms that probably deserve to be in the upper level of the seed, above the initially unique level (some of them are already present):
    Mathematics: 144.000.000
    Counting: 61.000.000
    Geometry: 43.100.000
    Algebra: 31.400.000
    Arithmetic: 18.900.000
    Calculus: 17.100.000
    Topology: 12.800.000
    Mathematical models (within applied mathematics): 10.100.000
    Applied mathematics: 9.750.000
    Combinatorial: 4.930.000
    Combinatorics: 2.090.000
    Discrete mathematics: 2.080.000
    Mathematical Logic: 1.510.000
    Physics mathematics: 1.220.000
    Theorem proving: 1.190.000
    Cryptanalysis: 1.150.000
    Pure mathematics: 954.000
    Mathematical Biology: 928.000
    Mathematics of Computing: 494.000
    Mathematical Language: 367.000
    Financial Mathematics (within applied mathematics): 311.000
    Mathematical Philosophy: 158.000
    Mathematical algorithms: 144.000
    Physical mathematics: 18.000
    Virtual nodes
    When mapping complex and ambiguous contents such as those of ART we realized the need for “virtual nodes” that behave as virtual mothers of a bunch of subjects that share significant meaning, enough to be considered “derived” or “sons”. We say virtual because many times they do not have semantic existence yet. However, most times these virtual nodes exist but are somehow ignored by the specialists because of their implicit and trivial maternity. One example is “geometry”, mother of (and not at the same level as) “Convex and discrete geometry” and “Differential geometry”; perhaps we may likewise group all apparently “derived” topologies under a virtual topology mother node. A similar grouping could perhaps be obtained by joining openings 37, 38, 39, 40, 41 and 42 in a “physical mathematics” type mother node. And the still important opening (48) “Operations Research” could be opened into “Mathematical programming”, “Games theory”, “Economics” and “Social and behavioral sciences”.
    Math seed as per Eduardo Ortiz and Co-workers
    1. “General Mathematics”: 5.310.000; History and biography => “History of Mathematics”: 713.000; “Biographies of Mathematicians”: 17.200;
    2. “K-theory”: 260.000 => “K-theory” + mathematics: 177.000;
  • 204.
    3. “Group theory and generalizations”: 21.700 => “Group Theory”: 2.160.000; are significant derivations perhaps hidden within the GT concept?;
    4. “Topological groups”, “Lie groups” => “Topological Groups”: 201.000; “Lie Groups” within Topological Groups: 65.200, high enough! => perhaps it justifies a semantic node derivation;
    5. “Real functions” => “Real functions”: 263.000; please check it, it looks like a suffix;
    6. “Measure and integration”: 128.000; a nice example of a “two basic words” keyword, namely one defined with two “heavy” common words => measure: 183.000.000 AND integration: 148.000.000;
    7. “Functions of a complex variable”: 195.000; perhaps a better (4 words) keyword example!;
    8. “Potential theory”: 447.000 => shared with Physics; “Potential theory” + mathematics: 267.000;
    9. “Several complex variables and analytic spaces”: 18.300; an interesting example that shows the existence of meaningful long-chain keywords (6 words): 18.300 references, most of them meaningful!
    10. “Special functions”: 1.330.000! => shared with Physics and other disciplines; “Special functions” + mathematics: 432.000;
    11. “Ordinary differential equations”: 1.430.000!
    12. “Partial differential equations”: 2.620.000!
    13. “Dynamical systems and ergodic theory”: 24.000 => even though shared with Physics, it fundamentally belongs to the mathematics realm;
    14. “Difference and functional equations”: 9.130 => rather small, please check! Is it perhaps a derivation?;
    15. “Sequences, series, summability”: 3.360 => the same observation as above (14), please check it!;
    16. “Approximations and expansions”: 15.600 => curiously, Google and some other Search Engines present anomalies: please check, “Approximations and expansions” + mathematics renders 16.900, a little higher (as of June 22nd 2009);
    17. “Fourier analysis”: 983.000 => shared with Physics; “Fourier analysis” + mathematics: 438.000;
    18. “Abstract harmonic analysis”: 46.600
    19. “Integral transforms”: 273.000
    20. “Operational calculus”: 163.000 => even though essentially a mathematics subject, it is shared with physics and other disciplines like “applications”: “Operational calculus” + mathematics: 131.000, meanwhile “Operational calculus” + physics: 53.000;
    21. “Integral equations”: 1.200.000
  • 205.
    22. “Functional analysis”: 4.320.000! => when a parental node with a significant number of references appears in a “semantic seed” (as for example this one and openings 10, 11, 12 and 21), it would be convenient for a human to go a little deeper, suggesting obligatory derivations in order to guide Darwin agents better;
    23. “Operator theory”: 673.000 => by the way, isn’t this a derivation from the above (22)? Please check it;
    24. “Calculus of variations and optimal control optimization”: 6.900 => please check the feasibility of a potentially better keyword omitting “optimization”, which renders 48.100;
    25. “Geometry”: 42.900.000 => isn’t it too big? Please check our upper level virtual nodes thesis to see if it fits up there;
    26. “Convex and discrete geometry”: 19.300
    27. “Differential geometry”: 1.290.000
    28. “General topology”: 234.000 => with slight contacts with many other disciplines; however the semantic embedding mathematics => general topology will clear any possible ambiguity;
    29. “Algebraic topology”: 817.000
    30. “Manifolds and cell complexes”: 105.000
    31. “Global analysis”: 1.720.000 => this keyword needs to be embedded within mathematics, rendering 248.000, in order to eliminate ambiguity;
    32. “Analysis on manifolds”: 274.000 => please check if this keyword is a common suffix of others;
    33. “Probability theory and stochastic processes”: 97.300 => please check if either “probability” or “probability theory” could become an upper virtual node as in (25) => “probability”: 66.700.000, “probability theory”: 1.740.000, “stochastic processes”: 2.140.000;
    34. “Statistics”: 637.000.000 => if properly defined under mathematics it should be semantically embedded and in the upper level (see our comments about “seed upper level” and “virtual nodes”);
    35. “Numerical analysis”: 3.830.000
    36. “Computer science”: 77.000.000 => it seems too big to derive directly from math. Please check the following terms: Computer science mathematics: 1.570.000, but most authoritative references deal with deceptive pseudo chains such as “computer science”, “mathematics”, … physics, …; “Mathematics of Computer science” renders 18.700, which seems to focus properly on mathematics;
    37. “Mechanics of particles and systems”: 16.800 => somehow shared with Physics?: “Mechanics of particles and systems” + mathematics: 11.800;
    38. “Mechanics of solids”: 226.000 => somehow shared with Engineering/Physics?; “Mechanics of solids” + mathematics: 110.000!;
    39. “Mechanics of deformable solids”: 30.000 => somehow shared with Engineering/Physics?: “Mechanics of deformable solids” + mathematics: 12.400;
  • 206.
    40. “Fluid mechanics”: 4.930.000 => somehow shared with Engineering/Physics?: “Fluid mechanics” + mathematics: 1.640.000;
    41. “Optics, electromagnetic theory”: 23.200 => somehow shared with Physics?: “Optics, electromagnetic theory” + mathematics: 9.140;
    42. “Classical thermodynamics, heat transfer”: 1.160 (?) => probably too poor; please check with equivalent keywords; shared with Engineering/Physics; for example “thermodynamics, heat transfer” (the comma could be omitted): 121.000; “thermodynamics, heat transfer” + mathematics: 28.700;
    43. “Quantum theory”: 4.470.000 => shared with Physics; “Quantum theory” + mathematics: 1.150.000;
    44. “Statistical mechanics, structure of matter”: 3.830 => shared with Physics; “Statistical mechanics, structure of matter” + mathematics: 3.820; even though backed up by very low figures it looks like a well focused keyword. Please observe that references with and without mathematics render pretty much the same, as the keyword essentially belongs to the mathematics realm!;
    45. “Relativity and gravitational theory”: 3.340 => shared with Physics; “Relativity and gravitational theory” + mathematics: 862;
    46. “Astronomy and astrophysics”: 2.510.000 => shared with Astronomy and Astrophysics (within Physics); "Astronomy and astrophysics" + mathematics: 1.960.000; it looks very akin to the mathematics realm;
    47. “Geophysics”: 9.680.000 => a typical subject belonging to the Physics realm; however its “math side” looks strong: “Geophysics” + mathematics renders 3.150.000!; perhaps it is necessary to coin a new keyword such as “geophysics mathematics”, which even now renders 15.600;
    48. Operations research: 5.480.000 => far from its “golden age”, this term looks strong enough and encompasses (it should be checked) some other derived terms (as per the “Ortiz seed”): 48.1: Mathematical programming: 856.000; 48.2: Game theory: 3.880.000; 48.3: “Economy” + mathematics: 125.000; 48.4: “social and behavioral sciences” + mathematics: 202.000. As a suggestion we add “social and behavioral sciences” as a new node below;
    49. Social and behavioral sciences: 541.000 => shared with Sociology, Cybernetics and Systems theory; “Social and behavioral sciences” + mathematics: 202.000;
    50. Biology and other natural sciences: 18.300 => it looks like a new keyword, perhaps created in the recent past to differentiate a new branch of science like “molecular biology” from biology, physics and algorithmics; perhaps this is a transition subject; it looks like a term coined by mathematicians because “Biology and other natural sciences” + mathematics renders 12.500!;
    51. “System theory”, control: 1.140.000 => for “systems theory” + control as per the “Ortiz seed”; “system theory” alone: 1.960.000 and “system theory” + control + mathematics: 344.000;
  • 207.
    52. “Information and communication”, circuits: 552.000 => this figure stands for “Information and Communication” + circuits; shared with Systems/Communications; “Information and Communication” + circuits + mathematics: 97.300;
    53. “Mathematics education”: 1.770.000
    Original Ortiz seed
    1. Mathematics, 16.900 K
    2. General, 15.400
    3. History and biography, 221
    4. K-theory, 291
    5. Group theory and generalizations, 213
    6. Topological groups, Lie groups, 201
    7. Real functions, 1.500
    8. Measure and integration, 270
    9. Functions of a complex variable, 2.110
    10. Potential theory, 421
    11. Several complex variables and analytic spaces, 208
    12. Special functions, 1.810
    13. Ordinary differential equations, 1070
    14. Partial differential equations, 149
    15. Dynamical systems and ergodic theory, 0
    16. Difference and functional equations, 2.240
    17. Sequences, series, summability, 174
    18. Approximations and expansions, 221
    19. Fourier analysis, 1.940
    20. Abstract harmonic analysis, 188
    21. Integral transforms, operational calculus, 195
    22. Integral equations, 2.200
    23. Functional analysis, 1.520
    24. Operator theory, 541
    25. Calculus of variations and optimal control; optimization, 240
    26. Geometry, 1.550
    27. Convex and discrete geometry, 657
    28. Differential geometry, 2.250
    29. General topology, 700
    30. Algebraic topology, 2.320
    31. Manifolds and cell complexes, 85
    32. Global analysis, analysis on manifolds, 94
    33. Probability theory and stochastic processes, ¿?
    34. Statistics, 6.370
    35. Numerical analysis, 4.340
    36. Computer science, 13.000
    37. Mechanics of particles and systems, 491
    38. Mechanics of solids, 2.110
    39. Mechanics of deformable solids, 29
    40. Fluid mechanics, 1.920
    41. Optics, electromagnetic theory, 128
    42. Classical thermodynamics, heat transfer, 268
    43. Quantum theory, 2.210
  • 208.
    44. Statistical mechanics, structure of matter, 1.1570
    45. Relativity and gravitational theory, 1.710
    46. Astronomy and astrophysics, 216
    47. Geophysics, 1.760
    48. Operations research, mathematical programming, 1.700
    49. Game theory, economics, social and behavioral sciences, 153
    50. Biology and other natural sciences, 1.950
    51. Systems theory; control, 4.700
    52. Information and communication, circuits, 1.1860
    53. Mathematics education, 11.800
    Distribution of the figures above:
    100-200: 17
    200-500: 4
    500-1000: 5
    1000-5000: 13
    Around 5000: Statistics and Numerical analysis
    More than 10.000: Computer science and Mathematics education
    Total: 17.000
    “Note that there are two large groups (100-200) and (1000-5000), three very large ones (Statistics, 6.400); (Numerical Analysis 4.400) and two enormous ones: Computer Science (13000) and Education (12.000), which do not square with the 17000 given for the whole of mathematics. The most abstract and important parts of mathematics do not necessarily have many citations…”
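    Several entries of the reviewed seed above quote two counts, the keyword alone and the keyword “+ mathematics”. One way to read those pairs is as the share of references that survive the embedding within mathematics. The sketch below computes that share for five pairs copied from the reviewed seed; the function `math_share` and the idea of ranking by it are illustrative assumptions, not a documented Darwin metric.

```python
def math_share(total, with_math):
    """Fraction of a keyword's references that remain when '+ mathematics' is added."""
    return with_math / total

# (keyword alone, keyword + mathematics), figures quoted in the reviewed seed.
seed_counts = {
    "Potential theory": (447_000, 267_000),
    "Special functions": (1_330_000, 432_000),
    "Fourier analysis": (983_000, 438_000),
    "Quantum theory": (4_470_000, 1_150_000),
    "Statistical mechanics, structure of matter": (3_830, 3_820),
}

shares = {name: round(math_share(t, m), 3) for name, (t, m) in seed_counts.items()}
ranked = sorted(shares, key=shares.get, reverse=True)

for name in ranked:
    print(f"{shares[name]:.3f}  {name}")
```

    A share close to 1, as for “Statistical mechanics, structure of matter”, matches the remark that the keyword renders pretty much the same with and without mathematics; a low share, as for “Quantum theory”, matches a keyword largely shared with Physics.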
  • 209.
    • This is a brief presentation of DARWIN, Distributed Agents to Retrieve the Web Intelligence, a cutting-edge technology to see the Web as "semantic", perfectly ordered. On the Web of today live more than 30,000 million Web pages, "creatures" of knowledge, a crucial asset of humanity growing exponentially in size and complexity.
    The Web of Today: Unstructured; Only indexed by words
    • Hence the need to "map" it precisely and efficiently, like a Card Catalogue in a library. Based on proprietary agents and Artificial Intelligence algorithms, Darwin creates "semantic glasses" to see the Web like a Universal Encyclopedia perfectly stored and indexed. As such, any piece of knowledge (information and intelligence) becomes retrievable, without obfuscation, generally in only one search, displaying meaningful results: not spam, fake, unreadable jargon, or irrelevant content.
    Semantic Glasses: To see more and better; In only one click
    • Once the Web is mapped, website owners and users, governors and governed, teachers and learners, sellers and buyers may infer their respective behaviors and build Intelligence Reports about any subject. Darwin makes the Web paradigm effective: one thing leads to another; everything is connected.
    Human Knowledge Maps; People’s behavior; Intelligence Reports
  • 210.
    Prologue
    Darwin, a cutting-edge technology
    • Slides 3 to 8: A tool “to see more and better the Web”;
    • Slides 9 to 15: A little of History;
    • Slides 16 to 19: How we humans document - WWD’s;
    • Slides 20 to 25: Entering into some detail - Level 1;
    • Slides 26 to 31: Knowledge as Trees from ancient times;
    • Slides 32 to 40: Entering into some detail - Level 2;
    • Slides 41 to 70: Darwin Vignettes:
    • Slide 41: Darwin Vignettes index;
    • Slides 42 to 47: Entering into some detail - Level 3;
    • Slides 48 to 57: Darwin Antecedents;
    • Slides 58 to 69: Darwin Conjectures 0 to 9;
    • Slide 70: Epilogue, Darwin a cutting-edge technology;
    • Slides 71 to 75: Epilogue details I to V;
    By Juan Chamero, Darwin Architect, August 12, 2013
    Warning: If you are an Internet expert you may go directly to the Epilogue to see the scope and reach of this technology.
  • 211.
    Darwin Human Knowledge Map
    Preliminary research performed by Intag, creator of the Darwin Methodology, determines that it is perfectly possible to cover Internet users’ knowledge needs in only one click. We are talking of significant documents, Authorities and hubs, subordinated to subject specificity, with minimum redundancy, and fully covering all Branches of Knowledge.
    Web Semantic, Pages 293-294, a Juan Chamero e-book Series: Mind To Digital
  • 212.
    Darwin Human Knowledge Map
    This HKM will have a Basic Virtual Library of nearly 12,000,000 i-URL’s pointing to an equal number of corresponding Website documents (Authorities), referenced with a Web Thesaurus of nearly 15,000,000 concepts per language. This map is like a virtual Encyclopedia of nearly 40 million pages, cross-referenced by a Conceptual Universe of more than 15 million concepts and more than 20 million hyperlinks, and semantically structured via a Logical Tree of nearly 500,000 subjects (nodes).
    Web Semantic, Pages 293-294, a Juan Chamero e-book Series: Mind To Digital
  • 213.
    What’s in a Web Map
    We may imagine the Web as a huge Semantic Ocean where two types of knowledge creatures are continuously interacting: the black ones (K Realm), classified in up to 200 knowledge “species”, versus the green ones (K’ Realm), “we”, the people as users. The only creatures that “live” in the Ocean are the Web pages; we humans are “out”, yet interacting as long as we are connected to the Internet and active.
  • 214.
    Darwin for Dummies
    We humans have a rather large capacity to hold “in mind” ideas in our brains, however hidden. This diversity is estimated at 10 to 15 million ideas per language/culture. Our ideas are recognized by others as long as they are “meaningfully explained”. However, it is highly probable that our explanations of the same idea differ substantially. Over time, statistically, we agree to assign unique “modal names” to our ideas. On the Web it is assumed that we are talking about the same idea when we point to the same “name”. In our culture, at least since Plato, those agreed names are recognized as “concepts”. In the figure, the exultant Archimedes says “Eureka”, discovering his law and naming it.
  • 215.
    Something “To See more and better”
    Galileo Galilei (1564 – 1642), pioneer of the scientific revolution and for many the father of modern science. In his time Galileo was formally accused of heresy because of his “sayings” about heliocentrism. He built and improved the telescope, a tool to “see more and better” everything that surrounds us, including the controversial and inscrutable sky (heavens) of those times. In fact his telescope magnified our sight power 8 to 9 times.
  • 216.
    “What then to see more and better”
    • Web Creatures
    • Web Pages
    • Websites
    • At last, documents
  • 217.
    A little of History
    You may skip this section. However, we think that “a little of history” is always healthy as long as it helps to open our mind, and our spirit, to accept some changes in our vision. Darwin deals with an Ontology to help us “to see more and better” the Web, and for that we have to understand how we humans document our ideas. We do that at any moment of our life, almost automatically, as if “downloading” our “in mind” images through the art of writing. Darwin states that the core of our downloads are “concepts”, in fact cognitive objects akin to Chinese and Japanese ideograms. Darwin also talks about related entities like “symbols”, “words”, “expressions”, “keywords”, “quotations”, “acronyms”, “memes”, “cepts”, and “concepts”, all of them with substantive differences. This section intends to build a sort of emergency bridge towards fully understanding our Darwin Ontology.
  • 218.
    A little of History - I
    As far back as 15,000 years ago cavemen left lasting messages in the form of petroglyphs and petrographs depicting what was important to them. They were things to communicate and perhaps early “writing experiences”. In Australia, petroglyphs dated as old as 40,000 years have been discovered.
  • 219.
    A little of History - II
    The Tartaria Tablets, discovered by archaeologist Nicolae Vlassa, are dated to 5.900 BC and are for many the earliest form of “writing” in the world. They preceded Proto-Sumerian pictography. Up to now nobody has deciphered the meaning of the carved symbols.
  • 220.
    A little of History - III
    A long journey from hieroglyphics to handwriting, starting from a picture or object representing a word. Around 650 BC Demotic writing appeared; it became a popular form, developed from the Hieratic, and retained the features of the hieroglyphic system, including word and phonetic signs. Primitive writing was ideographic as opposed to phonetic. Some cultures evolved rapidly towards phonetic writing, in which conventional signs or letters represent a sound. However, ideographic writing still exists in the Chinese and Japanese ideograms of today.
  • 221.
    A little of History - IV
    This is a look at ancient History, Language and Architecture by Dr. Haluk Berkmen, a PhD in Physics, as a contribution to unveiling relations between science and esotericism, ancient wisdom and the forgotten past. Chinese and Sumerian scripts developed independently as the first writing systems. The first synthetic writing system was the early cave paintings. Over time, analytic writing systems appeared, where a word or part of a word (a syllable) is represented by a well-defined sign.
  • 222.
    A little of History - V
    Early in our childhood we learn to write cursive, namely handwriting, “the hard way”. Cursive derives from the Latin cursivus, which means running. Most current languages have their cursive mode, even ideographic ones like Chinese and Japanese, where cursiveness may apply to the connectedness of strokes within the same character or ideogram.
  • 223.
    A little of History – VI
    Grammar and those heavy things
    Writing rules, like handwriting, are learned the hard way. This book by Charles Gulotta tries to alleviate that burden with humorous and friendly cartoons, for junior high and high school students. Our learning should tend towards writing properly and meaningfully, ideally as if applying a “formula” to express our “in mind” ideas. Have you ever played the WFF ’N Proof game to enhance children’s abilities in logic and problem solving? Take into account that WFF, which stands for Well Formed Formula, could be a way to learn how to write WWD’s, Well Written Documents. You may also try to learn a little grammar.
  • 224.
    Towards the idea of WWD’s – I Good Writing – Its logic and subtleness. For the exact term “writing rules”, as of today Google returns 337,000 references. Most of them are opinions, with almost nothing about theories or scientific approaches. From the 10 Rules for Writing of the NYT, New York Times, we recommend rules 1, 5 and 6: Listen to the Voice Inside your Head, Study Sentences, and Write With Non-Zombie Nouns and Verbs. In Darwin Ontology this triad is equivalent to: start from “in mind” ideas, write using specific and meaningful concepts resembling famous “sayings”, and avoid pomposity and abstraction.
  • 225.
    Towards the idea of WWD’s – II What’s behind large thematic samples: Semantic Cores and Semantic Fingerprints. Let’s suppose that there exists a large enough sample of “essays” dealing with the same idea (diabetes), written by different people in different ways over a span of time short enough for the idea to remain consistent. Let’s also suppose that for each “writer” the idea belongs to a semantic structure, namely a “tree map” that as such has “upper level” ideas, “lower level” or “derived” ideas and “collateral” ideas. To make things more real, and consequently more complex, we have to take into account that writers have different writing talents, knowledge and cultural levels. As Darwin Ontology conjectures that writers tend to write “specifically” and “properly” from a semantic point of view, our first step of analysis should be to unveil these properties.
  • 226.
    Towards the idea of WWD’s – III What’s behind large thematic samples: Semantic Cores and Semantic Fingerprints. Remember that any document can be split into two kinds of semantic particles: “Common Words and Expressions” and “Concepts”. In a first analysis we may extract a “Concepts Core”, let’s say from 30 to 60 concepts that statistically could be considered representative of the “in mind” idea our large enough sample intends to define. These concepts constitute the metadata skeleton of what Darwin defines as the “semantic fingerprint” of the idea. On average, “good writers” write probabilistically following a protocol (something like a WWD, Well Written Document, formula) guided by a master principle: specificity. To be specific is to write using the concepts core, admitting (by ex post analysis and checking) only a small percentage of semantic noise and/or contamination; that is, minimizing UP, DOWN and COLLATERAL concepts.
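    The idea of a Concepts Core and of specificity can be sketched in a few lines. This is only an illustration under stated assumptions, not Darwin’s actual algorithm: concepts are assumed to be already detected in each essay, the core is taken as the concepts present in the most essays, and specificity is measured as the share of an essay’s concepts that belong to the core. The function names and the toy sample are hypothetical.

```python
from collections import Counter

def concept_core(docs, size=3):
    """Return the `size` concepts that appear in the most documents."""
    doc_freq = Counter()
    for concepts in docs:
        doc_freq.update(set(concepts))  # count each concept once per document
    # Highest document frequency first; alphabetical order breaks ties
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:size]]

def specificity(concepts, core):
    """Share of a document's concept occurrences that belong to the core."""
    if not concepts:
        return 0.0
    core_set = set(core)
    return sum(1 for c in concepts if c in core_set) / len(concepts)

# Toy sample (hypothetical): three short "essays" reduced to their concepts
sample = [
    ["insulin", "glucose", "pancreas", "diet"],
    ["insulin", "glucose", "exercise"],
    ["glucose", "insulin", "pancreas", "medicine"],
]
core = concept_core(sample, size=3)
scores = [specificity(doc, core) for doc in sample]
```

    In a real sample the core size would be in the 30 to 60 range mentioned above, and a low score would flag semantic noise or contamination from neighboring ideas.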
  • 227.
    Towards the idea of WWD’s – IV Is this message a WWD? Yes, it is. Some expressions are not only concepts but “sayings” and “memes”, like “people are hungry for change”, which returns 1,300,000 references in Google.
  • 228.
    Let’s take a little pause before going a little deeper. This has been an introduction to a way of seeing the Web, more and better. Probably many of you agree with what we have stated up to here, but perhaps feel a little confused because of a strong and rooted belief: what you “see” now is the best available. We, on the contrary, state that we all may see more and better, perhaps a new Web dimension. We dared to suggest that in every Web document there exists a somehow and sometimes hidden purpose, knowledge and intelligence, no matter whether it is present explicitly or implicitly, focused or dispersed along the document layout. The new dimension will allow us to retrieve from the Web not only information and Knowledge but Intelligence Reports for any subject. Now we are going to try to present this new vision, which will need your full attention. Thanks!
  • 229.
    Our First Thinking • We humans document, “at large”, statistically, following a rather logical mathematical “formula” we may name WWD, Well Written Document. • As we will soon see, we do it with only two types of semantic particles: “Common Words and Expressions” and “Concepts”. The first ones make up the literary filling while the second ones carry the meaning.
  • 230.
    Semantic Yin Yang: Common Words versus Concepts. As we have seen, Darwin Ontology imagines the Web as a dual world: creatures that live within the Web, the K side, interacting with us, humans, on the K’ side outside the Web core. This duality goes deep within the architecture of the creatures (Web pages); their logical structure resembles a Yin – Yang Paradigm where the Yin part is represented by the “literary filling” and the Yang part by its conceptual counterpart. The figure depicts the Yin part of a New Zealand technical page about diabetes as unveiled by a Darwin agent.
  • 231.
    Semantic Yin Yang: Common Words versus Concepts. The figure depicts the Yang counterpart of the Web page as unveiled by a Darwin agent: all concepts are “seen” as embedded within an implicit hierarchical track, for example [medicine…..[metabolic pathologies[diabetes]]….]. In Darwin Methodology these Yang counterparts, which belong to the Web pages’ metadata, also constitute the core of their “Semantic Fingerprints”, in fact the essence of their meaning.
  • 232.
    Detecting Concepts. In the Einstein bibliography, from Wikipedia, we, as humans, marked in green what are concepts as per our criterion. For each type of document, language and theme, human criteria are transferred to Darwin agents. The logic and precision of this transfer should be checked periodically.
  • 233.
    How we Document WWD’s. WWD’s deal with specific subjects, for instance “El e-Commerce” in Spanish, within an agreed hierarchy: the root idea => subject => topic => sub topic => …=> concepts, followed by the text corpus. The whole as a unit corresponds to a given “in mind” idea. Metadata within the text corpus has, apart from the necessary concepts, common expressions like “sale for moving” and occasionally important slogans known today as “memes”, like “comprar y vender de todo” (buy and sell everything).
  • 234.
    Trees. You may skip this section if you are well acquainted with “graphs” and “logic”. However, do consider devoting a little of your time to this series. We humans continue to be in part esoteric and in part rational. We are all attracted by the beauty and harmony of trees and tree forms, and we all love to catalog things along logical trees. However, we do not apply our tree wisdom, coined over centuries, either to order the Web or even to order our Knowledge Curricula.
  • 235.
    Trees: The Tree of Life. The Tree of Life (Biblical): as per the ancient Kabbalah, each circle represents one of the ten Sefirot, or emanations, by which the Divine manifests. From ancient times we humans have tended to represent life, nature and the entity and primal substance we have agreed to address as knowledge as “trees”: those creatures that, from their roots tightly “enrooted” within the earth, point to the sky diverging in branches, nodes, leaves and from time to time flowers and fruits. Life seems to be a superior instance of the Divine, above knowledge, which looks logical: at large, eternity is superior to wisdom. However this axiology looks anthropic, apt for beings living on the surface of a planet.
  • 236.
    Trees: The Tree of Life. Phylogenetic Tree of Life: also known as the Evolutionary Tree. We humans are eukaryotes. Eukaryotes are defined by having membrane-bound cellular structures. We fall within the Animal kingdom within the Eukaryote Domain. See The Evolution of Eukaryotes and our Human Story. Most trees are depicted as going upwards from “roots” to “leaves”. On the contrary, Computational Logical Trees are represented as going downwards, with their roots up.
  • 237.
    Trees: A Taxonomic Tree. Taxonomy, the science of classification. We humans are obsessed with categorizing things. Perhaps we need it in order to make sense of our world. The figure depicts a typical Logical Tree or Semantic Skeleton of the kind we use as Computational Trees, however inverted. We may appreciate the root, branches, nodes, and the uniqueness of the paths that go from any node to the root and vice versa. However, in imperfect and “in evolution” trees this uniqueness fails and closed circuits appear.
  • 238.
    Trees: The Beauty of Trees. Trees of Life: The Kabbalah Tree of Life, the Yggdrasil of Scandinavian Mythology, the Biblical Tree of Life, our Christmas Tree, all types of Taxonomy Trees, and the typical Oak Tree are examples of beauty and equilibrium and always a form of symmetry.
  • 239.
    Trees: A Minimal Abstract Fractal Tree. The figure depicts a fractal built with a very simple recursive algorithm: “one opens in two”. Fractals teach us how nature may build “apparently” complex systems based on simple laws. The reverse is also true: behind apparently complex systems, simple form generation laws are probably hidden. This is key for our Darwin Methodology: behind the apparent chaos of the Web lies the rather simple, though hidden, skeleton of its semantic structure.
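    The “one opens in two” rule can be written as a short recursion. The sketch below only generates the line segments of such a tree; the branching angle and the shrink ratio are assumed values chosen for illustration, since the slide does not give them.

```python
import math

def fractal_tree(x, y, angle, length, depth):
    """Return the line segments of a binary fractal tree ("one opens in two")."""
    if depth == 0:
        return []
    # Draw the current branch from (x, y) along `angle`
    x2 = x + length * math.cos(angle)
    y2 = y + length * math.sin(angle)
    segments = [((x, y), (x2, y2))]
    spread = math.pi / 6  # assumed branching angle (30 degrees)
    shrink = 0.7          # assumed length ratio between a branch and its children
    # Each branch opens in exactly two smaller branches
    for delta in (-spread, spread):
        segments += fractal_tree(x2, y2, angle + delta, length * shrink, depth - 1)
    return segments

# A trunk pointing upwards with five levels of branching
segments = fractal_tree(0.0, 0.0, math.pi / 2, 1.0, depth=5)
```

    Five levels already give 1 + 2 + 4 + 8 + 16 = 31 segments from a rule that fits in one line of text, which is the point the slide makes about simple laws behind apparently complex forms.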
  • 240.
    Back to the beginning: Some examples of Knowledge Mapping. Now it is time to go back to our Darwin Ontology schema in order to “see” it better. We, humans, the people, are interacting with the Web from outside (green region), and through it with the Human Knowledge hosted, apparently in chaos, in the Web Ocean (black region). However, the intelligent semantic skeleton of Human Knowledge is “up there” waiting to be unveiled!
  • 241.
    Example I: The Diabetes sub Tree. The Diabetes sub Tree skeleton is depicted as hanging from Metabolic Pathologies, a medical specialty derived from Medicine. It’s an inverted tree coded downwards and from left to right. The Diabetes sub Tree is coded as 122; in turn it opens in 1221 and 1222, and finally 1222 opens in 12221, 12222, and 12223. You may also notice the Diabetes neighborhood: UP, DOWN and COLLATERAL.
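    With this kind of coding the UP, DOWN and COLLATERAL neighborhood of a node can be read directly from the codes. The following is a minimal sketch, assuming one digit per level (so at most nine children per node); the sibling codes 121 and 123 are hypothetical, added only so that Diabetes has collateral nodes in the example.

```python
def neighborhood(code, codes):
    """Return the UP, DOWN and COLLATERAL neighbors of `code` within `codes`."""
    parent = code[:-1]  # dropping the last digit gives the parent's code
    up = parent if parent in codes else None
    # Children: one digit longer and sharing the node's code as prefix
    down = sorted(c for c in codes
                  if len(c) == len(code) + 1 and c.startswith(code))
    # Siblings: same length, same parent, different node
    collateral = sorted(c for c in codes
                        if c != code and len(c) == len(code) and c[:-1] == parent)
    return {"up": up, "down": down, "collateral": collateral}

# 1 = Medicine, 12 = Metabolic Pathologies, 122 = Diabetes (as in the slide);
# 121 and 123 are hypothetical siblings of Diabetes.
codes = {"1", "12", "121", "122", "123", "1221", "1222",
         "12221", "12222", "12223"}

diabetes = neighborhood("122", codes)
subtopic = neighborhood("1222", codes)
```

    For trees with more than nine children per node a separator is needed between levels, as in the dotted tracks used for the Art Thesaurus.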
  • 242.
    Example II: A World Art Tree Skeleton. In 2007 a World Art Map was developed as a Human Knowledge Map demo, to be the core of the Semantic Search Engine of the European Theseus Project, now abandoned. The whole Art information “as it is” in the Web of that time (nearly 80 million Web pages) was reviewed and semantically synthesized by a family of Darwin Agents. This synthesis took the form of an Art Thesaurus with a Semantic Skeleton of 7,570 themes and nearly 300,000 concepts along 13 levels. The figure depicts the Rigoletto node as a concept connected to the root via the 11-level track [0.1.2.2.2.2.14.1.6.10.5].
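    A dotted track such as the one for Rigoletto is easy to handle programmatically. The sketch below parses a track and lists the tracks of all its ancestors up to the root; the helper names are hypothetical and the only assumption is that levels are separated by dots inside square brackets, as shown above.

```python
def parse_track(track):
    """Turn "[0.1.2]" into the list of per-level positions [0, 1, 2]."""
    return [int(part) for part in track.strip("[]").split(".")]

def ancestors(track):
    """Return the tracks of every ancestor, from the root down to the parent."""
    parts = parse_track(track)
    return [".".join(str(p) for p in parts[:i]) for i in range(1, len(parts))]

rigoletto = "[0.1.2.2.2.2.14.1.6.10.5]"
levels = parse_track(rigoletto)
path_to_root = ancestors(rigoletto)
```

    Unlike the single-digit coding of the Diabetes example, the dotted form allows positions such as 14 and 10 at a level.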
  • 243.
    Example III: Detail of The World Art First Level. The figure shows the upper level of the World Art Map in the form of an Art Tree Index. Its five main sub Trees may be explored going “vertically” along 12 more levels. As this map is the core of a semantic search engine, through it users may obtain trustworthy references in only one click, plus a selected set of Authorities specifically dealing with the “node” subject, for example “street dance”. All these maps have e-learning capabilities, either autonomous or controlled by humans. From time to time their arboreal structure is checked, and Darwin agents and algorithms continuously suggest node deletions, new node additions, concept obsolescence, new concepts, and even structural changes.
  • 244.
    Example IV: How a Darwin Search works
  • 245.
    HKM (K-side) An ideal overview detail of its Upper Level See next
  • 246.
    HKM (K-side) An ideal overview detail of its Upper Level. This is a fancy cover of an HKM, Human Knowledge Map, showing for example its upper level with 12 “vortexes” where users may dig down searching for WHAT THEY NEED in ONLY ONE CLICK, assisted by an intelligent wizard. This map depicts the semantic structure of the K region, namely the “Established Knowledge” of our civilization at a given time. You may imagine it as the cover of a New World Library where, as of today, +30,000 million Web pages will be properly indexed thematically no matter where they are hosted in the huge Web Ocean.
  • 247.
    HKM (K’ -side) An ideal overview detail of its Upper Level See next
  • 248.
    HKM (K’ -side) An ideal overview detail of its Upper Level. This is a fancy cover of an HKM, Human Knowledge Map (People), of the K’ side of Darwin Ontology, the People side: what people know as a collective and as individuals. It will differ substantially from the Established Knowledge of the K side, being more ambiguous, diffuse, imaginative, ample and, why not, a little chaotic! We depict here in a fancy way its upper level, arbitrarily split into 8 vortexes. This information is crucial to produce Web Intelligence and to Detect and Infer Trustworthy People Behavior Trends!
  • 249.
    42- Search Engines - How to see the Web as Semantically Structured – I
    43- Search Engines - How to see the Web as Semantically Structured – II
    44- HKM - Its Physical Structure I
    45- HKM - Its Physical Structure II
    46- Concepts and “in mind” ideas I
    47- Concepts and “in mind” ideas II
    48- HKM – Antecedents I
    49- HKM – Antecedents II
    50- HKM – Antecedents III
    51- HKM – Antecedents IV
    52- HKM – Antecedents V
    53- HKM – Antecedents VI
    54- HKM – Antecedents VII
    55- HKM – Antecedents VIII
    56- HKM – Antecedents IX
    57- HKM – Antecedents X
    58- Darwin Ontology - Introduction
    59- Darwin Ontology - Conjecture 0
    60- Darwin Ontology - Conjecture 1
    61- Darwin Ontology - Conjecture 2
    62- Darwin Ontology - Conjecture 3
    63- Darwin Ontology - Conjecture 4
    64- Darwin Ontology - Conjecture 5 – Part I
    65- Darwin Ontology - Conjecture 5 – Part II
    66- Darwin Ontology - Conjecture 6
    67- Darwin Ontology - Conjecture 7
    68- Darwin Ontology - Conjecture 8
    69- Darwin Ontology - Conjecture 9
    70-75 Epilogue Darwin Vignettes
  • 250.
    Search Engines: How to see the Web as Semantically Structured - I. Google, like most conventional search engines, is semantically unstructured, “flat”. It means that it does not index Web pages by “theme” but by words. It also means that, from the point of view of semantics, the current universe of +30,000 million pages is indexed sharing the same “zero-ground” level. Trying to see the Web as semantically ordered, we may imagine it as a Super Library Building with as many floors as there are “semantic levels” (from 10 to 15). In the figure we have imagined the Web structured as a Darwin Hypercube. In the beginning, ex ante of applying our Darwin Methodology, all Web pages are at zero ground. As Darwin agents and algorithms proceed, each Web page goes to its level and correct virtual place (room, row, rack, shelf,…), hypothetically building up the Hypercube.
  • 251.
    Search Engines: How to see the Web as Semantically Structured - II. Building the Darwin Hypercube: its first step is to unveil the Human Knowledge Map skeleton; its second step would be the Semantic Profile buildup for each branch, discipline by discipline. A probable third step, perhaps only an initial one, is the metadata buildup for the whole Web. For each “subject” Darwin processes “textons”, huge aggregations of nearly 100,000 Web pages at a time, from which special algorithms unveil its “Subject Fingerprint”, a special metadata depicted at right: basically a string of concepts (k’s) weighted with their mean density, plus a header. As a fourth step, each subject fingerprint is matched within the subject neighborhood, testing the Darwin Ontology Conjectures.
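    A Subject Fingerprint of this shape can be sketched as follows. This is an illustration only: the slide does not define “mean density”, so here it is assumed to be the average, over the pages of the texton, of a concept’s occurrences divided by the page length in tokens. The function name, the header layout and the two-page toy texton are hypothetical.

```python
def subject_fingerprint(subject, pages, vocabulary):
    """Build a header plus concepts weighted by their mean density over `pages`."""
    totals = {concept: 0.0 for concept in vocabulary}
    for tokens in pages:
        if not tokens:
            continue  # an empty page contributes zero density
        for concept in vocabulary:
            totals[concept] += tokens.count(concept) / len(tokens)
    n_pages = len(pages)
    weights = {c: (t / n_pages if n_pages else 0.0) for c, t in totals.items()}
    # Heaviest concepts first; alphabetical order breaks ties
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"header": {"subject": subject, "pages": n_pages}, "concepts": ranked}

# Toy texton (hypothetical): two tokenized pages instead of ~100,000
pages = [
    ["insulin", "and", "glucose", "levels"],
    ["glucose", "is", "a", "sugar"],
]
fingerprint = subject_fingerprint("diabetes", pages, ["insulin", "glucose"])
```

    Matching a fingerprint against those of neighboring subjects, the fourth step above, would then amount to comparing these weighted concept lists.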
  • 252.
    HKM: Its Physical Structure I. What does an HKM look like “physically”? As its content is semantically structured along logical trees, or at least along somewhat arboreal graphs, we may imagine a two-way reversible mapping between this content and parts of a one-dimensional string, perhaps something similar to genes in a genome. An HKM of 500,000 “Subjects” belonging to 200 Knowledge Branches or “Disciplines” of Human Knowledge would resemble a small forest of 200 “trees” having from 1,000 to 10,000 subjects each, 2,500 on average. Each of these trees could be mapped along one-dimensional memory, for instance by navigating from roots to leaves, top down and from left to right.
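    The two-way mapping between a tree and a one-dimensional string can be illustrated with a pre-order walk (root first, then children from left to right) and its inverse. This is a sketch under simple assumptions: a tree is a (name, children) pair and the string is a list of (depth, name) records. The “Cardiology” branch is hypothetical, added only to give the toy tree a second branch.

```python
def flatten(tree, depth=0):
    """Pre-order walk: the node first, then its children from left to right."""
    name, children = tree
    out = [(depth, name)]
    for child in children:
        out.extend(flatten(child, depth + 1))
    return out

def rebuild(flat):
    """Inverse of flatten: recover the nested tree from (depth, name) records."""
    root = None
    stack = []  # stack[d] is the children list of the latest node at depth d
    for depth, name in flat:
        node = (name, [])
        if depth == 0:
            root = node
        else:
            stack[depth - 1].append(node)  # attach to the latest possible parent
        del stack[depth:]                  # forget deeper branches already closed
        stack.append(node[1])
    return root

tree = ("Medicine", [
    ("Metabolic Pathologies", [("Diabetes", [])]),
    ("Cardiology", []),  # hypothetical second branch
])
flat = flatten(tree)
restored = rebuild(flat)
```

    Because the depth is stored with each record, the flat sequence carries the whole structure and the original tree is recovered exactly.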
  • 253.
    HKM: Its Physical Structure II. In each of these 500,000 subject nodes we should host from 20 to 50 “concepts”, 30 on average, totaling a “Controlled Vocabulary” of 15 million concepts per language. In the figure at right we show a schema of an HKM skeleton where subject S56, opening in 9 sub subjects, is highlighted. We may also see the neighborhood spectrum, which depicts how concepts focus within specific subjects and how they “diffuse” over their neighborhood.
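    The sizing figures quoted for the HKM can be checked with simple arithmetic. The snippet below uses only the numbers given in these two slides; the variable names are ours.

```python
# Figures quoted in the slides
disciplines = 200                  # Knowledge Branches ("trees")
subjects_per_tree_avg = 2_500      # average subjects per tree
concepts_per_subject_avg = 30      # average concepts hosted per subject node
concepts_per_subject_range = (20, 50)

subjects = disciplines * subjects_per_tree_avg
vocabulary_avg = subjects * concepts_per_subject_avg
vocabulary_range = tuple(subjects * c for c in concepts_per_subject_range)

print(f"{subjects:,} subjects")
print(f"{vocabulary_avg:,} concepts on average")
print(f"{vocabulary_range[0]:,} to {vocabulary_range[1]:,} concepts overall")
```

    The averages reproduce the 500,000 subjects and the 15 million concepts per language stated above; the 20 to 50 range per subject brackets that total between 10 and 25 million.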
  • 254.
    Concepts and “in mind” ideas I. “In mind” ideas: we all have billions of “in mind” ideas, hidden and protected forever unless we decide to share them with “others” via gestures and written and oral communication. The Semantic Web deals with ideas and “concepts”, ideas that, since Plato, can be shared and whose meanings can be agreed upon via “explanations” and illustrated by examples. Someone at right got an idea: how Conceptual Maps may help us improve learning. Let’s suppose that at a given moment many people have the same idea. How could they realize they are “talking” about the same idea? Of course, even within a given language and among native speakers it is not easy to agree about that, is it?
  • 255.
    Concepts and “in mind” ideas II. Along our evolution we learned to identify ideas through “names”, and for believers God is the Almighty, the Pantocrator, the Verb, who creates through naming. The Web is the appropriate scenery to become conscious of this manifestation of power: at any moment we may be statistically sure that any idea is present with a name that competes with other similar names of lesser presence. In our Darwin Methodology we talk of “modal names”. For example, Pope Francis recently launched the six-word expression “a poor church for the poor”, pointing to a similar idea that probably more than a billion people have in their minds. This name is modal: you may check that no other competes with it to represent the idea, either qualitatively or quantitatively.
  • 256.
    HKM – Antecedents I. The Figurative System of Human Knowledge, 1751 – 1772, also known as The Tree of Diderot and d'Alembert, is perhaps the first antecedent of a Human Knowledge Map. We may also see "Epistemological angst: From encyclopedism to advertising", a Google book from Robert Darnton, University of California, Berkeley (2001). It’s a master work of the Enlightenment where knowledge opens in three main branches: Memory - History, Reason – Philosophy and Imagination - Poetry. These three main branches open in turn in 145 disciplines.
  • 257.
    HKM – Antecedents II. At right we may appreciate a schema that depicts the Consensual Map of Science project, which synthesized 20 existing maps of science in three basic forms: hierarchical, centric and non-centric. The chosen (arbitrary) order of sciences was: mathematics, physics, physical chemistry, engineering, chemistry, earth sciences, biology, biochemistry, infectious diseases, medicine, health services, brain research, psychology, humanities, social sciences, and computer science.
  • 258.
    HKM – Antecedents III. Propaedia, the Britannica overview of Human Knowledge, is a semantic “index tree” of 130 Major Disciplines. It opens in 10 main branches, 41 divisions and 167 sections. It’s the adieu of the Britannica, whose last print edition dates from 2010 while it continues online. It’s a heterodox outline of Human Knowledge, perhaps a masterpiece of Authoritativeness in the classic style, defining the knowledge some Authorities see instead of being a consensual work trying to agree about it and define it as it is. See Syntopicon. The Britannica Global Edition 2010 contained 30 volumes and 18,251 pages, with 8,500 photographs, maps, flags, and illustrations in smaller "compact" volumes. It contained over 40,000 articles written by scholars from across the world, including Nobel Prize winners.
  • 259.
    HKM – Antecedents IV. Great Books of the Western World is a series of books originally published (1952) by Britannica, a package of 54 volumes. This work is based on and influenced by the Syntopicon (1952), Center for the Study of the Great Ideas, an index of the 102 Great Ideas of the Western World compiled by the American philosopher Mortimer Adler. They are listed below from A to Z: Angel, Animal, Aristocracy, Art, Astronomy, Beauty, Being, Cause, Chance, Change, Citizen, Constitution, Courage, Custom and Convention, Definition, Democracy, Desire, Dialectic, Duty, Education, Element, Emotion, Eternity, Evolution, Experience, Family, Fate, Form, God, Good and Evil, Government, Habit, Happiness, History, Honor, Hypothesis, Idea, Immortality, Induction, Infinity, Judgment, Justice, Knowledge, Labor, Language, Law, Liberty, Life and Death, Logic, Love, Man, Mathematics, Matter, Mechanics, Medicine, Memory and Imagination, Metaphysics, Mind, Monarchy, Nature, Necessity and Contingency, Oligarchy, One and Many, Opinion, Opposition, Philosophy, Physics, Pleasure and Pain, Poetry, Principle, Progress, Prophecy, Prudence, Punishment, Quality, Quantity, Reasoning, Relation, Religion, Revolution, Rhetoric, Same and Other, Science, Sense, Sign and Symbol, Sin, Slavery, Soul, Space, State, Temperance, Theology, Time, Truth, Tyranny and Despotism, Universal and Particular, Virtue and Vice, War and Peace, Wealth, Will, Wisdom, and World.
  • 260.
    HKM – Antecedents V. The Library of Congress Online Catalog, briefly LOC, contains approximately 14 million records representing books, serials, computer files, manuscripts, cartographic materials, music, sound recordings, and visual materials. The Catalog also displays searching aids for users, such as cross-references and scope notes. The catalog records reside in a single integrated database; they are not separated according to type of material, language of material, date of cataloging, or processing/circulation status. This reservoir is related to others like The American Memory of the LOC. Its Web growth concerning traffic, use, sponsorships and initiatives looks somewhat frozen since 2005. It also has an agreement with UNESCO to build a World Digital Library.
  • 261.
    HKM – Antecedents VI. The WWW Virtual Library (VL) is perhaps the oldest Web Catalogue, founded by Tim Berners-Lee, the creator of HTML and of the Web itself, in 1991. It is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they have expertise. It looks somewhat frozen since 2008.
  • 262.
    HKM – Antecedents VII. DMOZ, initially Directory from Mozilla, belongs to the ODP, Open Directory Project, initiative: a human edited directory built and maintained by a vast and global community of nearly 100,000 volunteers. Its search engine is thematic, registering more than 5 million Websites in 1 million categories. One of its drawbacks is that volunteers rule the directory evolution without adjusting themselves to an agreed ontology; that is, “subjects”, “subject names”, and even “authorities” come from rather arbitrary suggestions from people.
  • 263.
    HKM – Antecedents VIII. Wolfram Alpha is a Knowledge Base built with curated and structured data extracted from the Web about some branches of Human Knowledge. This database can be queried by a search engine that directly tries to answer queries, instead of following the conventional search engine procedure of providing a list of suggested links. For many it provides a new way of searching: an answer engine. For example, asked for “Euclidean algorithm”, Wolfram Alpha tells us all the math, logic and computational information we “presumably” need. On the contrary, Google returns 3,340,000 references, telling us implicitly something like: we did our work, now everything is up to you! Wolfram Alpha focuses on science and systematizations of knowledge.
  • 264.
    HKM – Antecedents IX. GeoNames: it contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features. Beyond names of places in various languages, the data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. This base is free of charge and its Web Services offer direct and reverse geocoding. The figure at right depicts GeoNames Ambassadors. You may combine this service with others like Google Maps.
  • 265.
    HKM – Antecedents X. Dispersed Big Data: via Social Networks, “big data” reservoirs are now accessible in an extremely atomized and dispersed way, belonging in our jargon to the K’ side (people’s side). Nevertheless, from this raw information precise analyses and outcomes can be obtained from a statistical point of view. However, this outcome is important but not crucial to build the K’ Thesaurus, and unnecessary to improve the K Thesaurus.
  • 266.
    Darwin Ontology: Introduction. Darwin Ontology is the core of Darwin Methodology, a set of AI, Artificial Intelligence, procedures to “see the Web more and better”. The Web as of today is “semantically flat”: millions of content pieces of different colors and tonalities (themes) are uniformly dispersed here and there, as conventional search engines like Google only index by words, not thematically. Up: take a look at a search track trying to guess something valuable about “green”. Your queries are rather multicolored, but you learn fast, focusing on green as time goes on. Down: with the whole Web content semantically structured by “color” (logic, math, physics, law, nature, philosophy, religion, entertainment……) searches go direct, in only one click or guess.
  • 267.
    Darwin Ontology: Conjecture 0. Conjecture 0: the triad Logical Tree, Thesaurus, and Cognitive Objects unequivocally identifies any type of Knowledge. We humans at large, statistically, document “tree wise”, generally in reverse mode from root to leaves, specifically from general and global to particular, “talking”, “teaching”, “convincing”, “registering”, level by level. This content hierarchy is statistically unveiled by Darwin agents and algorithms. Knowledge trees have a “skeleton” of nodes (Logical Tree), a “Controlled Vocabulary” (Thesaurus) as a list of node names, and the Semantic Fingerprints within each node (Cognitive Objects).
  • 268.
    Darwin Ontology: Conjecture 1. Conjecture 1: Website Administrators and Owners “speak” and “think” rationally in terms of their objectives and in terms of their matchmaking policies. We say rationally because they behave as if thinking along a tree and speak like automata answering users’ queries in a governors versus governed mode. As a matter of fact, things in the Darwin K Realm (black region) are “fossils”: laws, regulations, codes, prescriptions are the consensual truth at a given moment, inherited from the past. The mission of K is to cancel, moderate or, at a negotiation extreme, make changes to better adapt the established order to claims coming from K’ (green region). In K things are frozen at a given time, for instance “as of …”. That’s not bad! It’s the eternal game of evolution. In K’ life is continuously brewing: the new ideas, suggestions and opinions all come from K’. Warning: for information and intelligence to flow freely, both sides should be semantically structured (mapped). With K structured and K’ unstructured it is impossible to infer People’s Behavior Patterns.
  • 269.
    Darwin Ontology: Conjecture 2. Conjecture 2: Users “speak” and “think” rather chaotically in terms of their passions, their desires and their necessities at large. Users need “solutions”, prescriptions that supposedly are stored and publicly offered in the K Realm, in most cases openly and free. To get them, the only tools they have at hand are symbols, buttons, links, and at large strings of meaningful words of the K side. So they are obliged to know the K jargon precisely to succeed. A smart observer, looking from a virtual “e-membrane” at the owners’ side, may arrive at the conclusion that any offer has a well defined purpose and that its “hunting” strategy could be precisely inferred. Warning: on the contrary, without K – K’ mapping it would be practically impossible for any observer located in the same place to infer any type of order at the users’ side.
  • 270.
    Darwin Ontology: Conjecture 3. Conjecture 3: Users’ interactions along sessions are strings of semantic particles of two types: users’ keywords and navigation instances. The session strings are the representation of the users’ strategies to satisfy their needs. Users’ session tracks of the form [iikikkikiikkiii…..], where k stands for keyword and i for instance, even ignoring their related outcomes, provide us a primal source to evaluate our offer and to infer users’ behaviors without interfering with them! These tracks could be considered primitive expressions of the users’ searching jargon.
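    A session track of this form can be summarized without looking at its outcomes. The sketch below is only an illustration: it counts keywords and navigation instances and groups them into consecutive runs; the function name is hypothetical and any character other than k or i (brackets, dots) is simply ignored.

```python
from itertools import groupby

def summarize_track(track):
    """Count keywords (k) and navigation instances (i) and group them in runs."""
    symbols = [s for s in track if s in ("k", "i")]  # skip brackets and dots
    # Consecutive identical symbols form one run, e.g. "ii" -> ("i", 2)
    runs = [(sym, len(list(group))) for sym, group in groupby(symbols)]
    return {
        "keywords": symbols.count("k"),
        "instances": symbols.count("i"),
        "runs": runs,
    }

summary = summarize_track("[iikikkikiikkiii]")
```

    Long runs of i between keywords, for instance, would suggest browsing rather than querying, which is the kind of behavior the conjecture proposes to infer from the track alone.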
  • 271.
    Darwin Ontology: Conjecture 4. Conjecture 4: Cognitive Objects, documents, are expressed as strings of two semantic particles or molecules: Common Words and Expressions belonging to a given Jargon, and concepts belonging to a “Controlled Vocabulary”. Semantic Yin – Yang: see slides 14 through 18 of our PPT series. As in the Yin – Yang monad, we humans document via two types of particles: one to communicate the essential content of our “in mind” ideas (Concepts) and the other as literary filling to make our documents more comprehensible and friendly (Common Words and Expressions). One thing that astonished me in the past when dealing with newspaper marketing people was that they talk of the “cognitive” content as “blank”, only an attractor for a physical medium that carries ads and businesses. On the contrary, for regular users like you and me the newspaper is a medium that reduces our ignorance and our uncertainty, and ads are just filling.
  • 272.
    Darwin Ontology: Conjecture 5 – Part I. Conjecture 5: It is possible to enable a Full Duplex type of communication between Websites and their users through an e-membrane, enabling the free flow of content and its associated intelligence between them. In Darwin architecture interactions are performed through an e-membrane that resembles a living semipermeable membrane, where all the messages going back and forth through it are processed, trying to extract as much information as possible from them.
  • 273.
    Darwin Ontology: Conjecture 5 – Part II. When the message is a potential concept coming from the users’ side, the membrane performs all possible statistics on both sides (green and black): it accounts for the use of the K Thesaurus concept at the owner side, makes a request to offer all documents available, and accounts for the user history. Besides these statistics, keywords are analyzed and matched against a K’ Thesaurus. Each keyword within a session is potentially considered as part of a “speech” or part of a sort of interception “truth game”. The core of its “smart” built-in strategy is to keep it from spying on users. No cookies, no brute “bait and catch” hunting devices. Total absence of owners’ side messages such as: Come here again!, You are Welcome, and even the mild May I help you?
  • 274.
    Darwin Ontology: Conjecture 6. Conjecture 6: Intrusions in communications cause serious troubles that go deeper and farther than a local perturbation. The slightest intrusion may not only invalidate the session but also prevent users from communicating freely. Intrusions distort statistics and the users’ strategies as well. More than a conjecture, it could be considered a fact, something that has gained consensus. However, the effects and counter effects of the abuse of surveys and polls should be investigated and measured. Conventional surveys and polls are based on intrusions, sometimes generating visible harassment. These intrusions condition the answers. How much? It has to be measured in order to eradicate it as much as possible, so as to have a credible methodology to know the K’ side.
  • 275.
    Darwin Ontology: Conjecture 7. Conjecture 7: Human Knowledge is bounded. We talk about a computational ontology at a given moment in a given language. It is rather huge but “numerable” within our current “state of the art” of computing, namely: from 10 to 15 million “in mind” ideas expressed as concepts, and from 350,000 to 500,000 subjects or themes that deserve study in the form of essays, theses and books, distributed hierarchically along a “forest” of 170 to 200 Human Knowledge Disciplines. To this basic semantic asset Geographical Names, Art Collections, Acronyms, and ephemeris and historical data should be added. In sum: huge but bounded, and in fact efficiently computable.
  • 276.
    Darwin Ontology Conjecture 8 Conjecture 8: Given a Logical Tree (skeleton), we may automatically generate its related Thesaurus. For this conjecture to hold we need a cluster of at least 100,000 “Authorities” per Logical Tree, for instance Medicine. To accomplish this heavy task Darwin Methodology proceeds in a series of steps: from the skeleton we extract a flat Thesaurus seed; with this seed we extract our first knowledge base; from this knowledge base we extract a whole flat thesaurus, no longer a seed; with this flat thesaurus we build a better knowledge base; finally, with this knowledge base and guided by the Logical Tree, we structure the whole thesaurus by computing all possible tracks from the root to the specific nodes.
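The last step above, computing all tracks from the root to each node, can be illustrated with a minimal Python sketch. It is not Darwin's implementation: the toy skeleton, the name `TOY_TREE` and the function `all_tracks` are assumptions made only for illustration, and the real Logical Tree would be far larger.

```python
from typing import Dict, List

# Hypothetical toy skeleton (not Darwin data): each node maps to its children.
TOY_TREE: Dict[str, List[str]] = {
    "arts": ["visual arts", "performing arts"],
    "visual arts": ["drawing", "painting"],
    "drawing": ["masters of drawing"],
}


def all_tracks(tree: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Return the track (path from the root) of every node, depth first."""
    tracks: List[List[str]] = []
    stack: List[List[str]] = [[root]]
    while stack:
        path = stack.pop()
        tracks.append(path)
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(tree.get(path[-1], [])):
            stack.append(path + [child])
    return tracks
```

Calling `all_tracks(TOY_TREE, "arts")` yields one track per node, including the track that ends in "masters of drawing".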
  • 277.
    Darwin Ontology Conjecture 9 Conjecture 9: Given a Historical Reservoir, we may generate its related Thesaurus and a collection of its main Subjects and Themes. See our slides in the series How To See The Web as Semantically Structured. In fact, unveiling a Thesaurus from the Web is a complex procedure, a series of steps that resembles an industrial refinery process, obtaining at each step lighter and subtler products (as we go from root to leaves). The key is good seeds. Take into account that seeds are imagined by humans based on agents’ suggestions. Once a seed is preselected, Darwin starts growing it up to medium size. Wrong seeds rapidly lead to anomalous results, while good ones evolve coherently, maintaining the arboreal form. Good seeds are grown to full size with their coherence under continuous monitoring. Finally, full-size solutions are submitted to a Human Board of Experts for approval.
  • 278.
    Epilogue Darwin, a cutting-edge technology • The Web as_is of today is semantically unstructured. Darwin Ontology makes it possible to map the whole Web, unveiling the thematic essence of each of its more than 30,000 million pages. • This map behaves like “Semantic Glasses” that enable users to see the Web as “semantically structured”, without interfering with it. • The map enables users to FIND WHAT THEY ARE LOOKING FOR IN ONLY ONE CLICK, even from mobile units. • With the Web semantically structured, the Inference of People Behavior Trends becomes possible. • Darwin Ontology enables humans to detect and retrieve all the pieces of information and intelligence dispersed in the Web about ANY THEME and to synthesize from them precise and meaningful Intelligence Reports. By Juan Chamero, Darwin Architect, August 12, 2013
  • 279.
    Epilogue I The Web of Today – The Web of Tomorrow Darwin Ontology is the core of Darwin Methodology, a set of AI (Artificial Intelligence) procedures to “see the Web more and better”. The Web of today is “semantically flat”, unstructured: trillions of pieces of information (different colors and tonalities by theme) are uniformly dispersed here and there because conventional search engines like Google only index by words, not thematically. The Web of Tomorrow is depicted below as “semantically structured” along “Trees” of different colors and tonalities, one for each Branch of Human Knowledge (~200).
  • 280.
    Epilogue II Semantic Glasses UP: The Web of Today looks unstructured, disordered, as_is. Darwin agents, acting as “Super Librarians”, inspect all pages from a semantic point of view, inferring their themes and building the corresponding metadata onto an HKM, Human Knowledge Map. Throughout this process Darwin agents make no intrusion: everything except the map remains untouched. DOWN: the user may “see” through “Semantic Glasses” The Web of Today as if it were The Web of Tomorrow. To do that, the “Semantic Glasses” need the map and the help of a Super Librarian working through an e-membrane.
  • 281.
    Epilogue III Find What You Are Looking For In Only One Click UP: Take a look at the search track above, which tries to find something valuable about “green”. Your queries are rather multicolored, but you learn fast, focusing on green as time goes on. However, what you obtain is a very dispersed and atomized green. DOWN: With the whole Web content semantically structured by “color”, namely logic, math, physics, law, nature, philosophy, religion, entertainment…, searches go direct, in only one click or guess. Everything you obtain is green and valuable!
  • 282.
    Epilogue IV Inference of People Behavior Trends Since ancient times many people have made a living making and selling inferences about everything, mainly about the future. For the first time we humans have at hand a model of how we are and how we think: The Web. As an example, the figure depicts a work about “10 Trends for 2013” selected at random by a Darwin Agent in August 2013 by querying Google for “social behavior trends”. Darwin Ontology states that in order to make consistent inferences the Web should be semantic. Darwin solves this conundrum by building “semantic glasses” that enable us to see the Web as perfectly structured. However, Darwin also states that to make consistent and credible inferences we also need to see the people’s side semantically structured. That means mapping the whole of Web users’ demand. Darwin may build both maps.
  • 283.
    Epilogue V Intelligence Reports Darwin Ontology enables humans to detect and retrieve all the pieces of information and intelligence dispersed in the Web about ANY THEME and to synthesize them into precise and meaningful Intelligence Reports. This “anything” could be an “avatar”, or sum of the existing information and knowledge about a person, an organization, an entity, a country, a region, a situation, an area of activity, etc. The figure depicts a Darwin agent building a facet of an avatar, for instance its biography. The Web is browsed cluster by cluster, inspecting “Authorities”. The agent “comes” from cluster 10, scouts cluster 11 exhaustively, saving words, expressions and potential concepts (k’s), and continues to cluster 12.
  • 284.
    Darwin Presentation to the XXXXX Institute By Juan Chamero, from Barcelona, as of January 2015 Street Art Utopia, by David Walker -Juan Tuazon
  • 285.
    The Art Map • The Art Map, as a piece of the HKM, Human Knowledge Map, was built for the EU, European Union, as a demo of the “semantic” talent of Darwin Methodology to retrieve from the Web_as_is all the information and intelligence dispersed here and there, from the past to the present. • The knowledge creatures you are going to see, semantically structured by Darwin, were “up there” in the Web Ocean, uploaded openly and at will by millions of artists and laymen. • That map was uploaded to the presentation laptop, acting as virtual “semantic glasses” of the Google browser.
  • 286.
    1. A view of the three upper levels Take a look at its seven main clusters. Within each cluster the main art subjects are depicted and hyperlinked as in a geo map. The Art Tree is deployed from root to leaves in up to 13 levels.
  • 287.
    2. Mouse over the node “Drawing” The tree can be navigated either by track or by neighborhood (white links). A mouse-over on “Drawing” displays its corresponding sub tree.
  • 288.
    3. A detail of the upper level “Drawing” sub tree This is a sub tree of 80 derived nodes. For each node we may access a gallery of images (5 in this demo). Optionally and tentatively, we may inspect nodes and their content without clicking.
  • 289.
    4. Knowing a little more via Google Images through Darwin Semantic Glasses We may inspect specifically related images via intelligently targeted suggested queries, for example “Leonardo vitruvian man”.
  • 290.
    5. Knowing a little more via Google Images through Darwin Semantic Glasses Darwin makes another suggestion: Leonardo da Vinci Drawing Machines.
  • 291.
    6. We are invited to search “pen and ink” works within “artist tools” Intelligently guided, at will or at random, we are invited to inspect popular nodes.
  • 292.
    7. Knowing a little more by querying the concept “masters of drawing” via Google Web Darwin may focus semantically deep: see the NON-TRIVIAL specific query track from root to leaves: “arts” “visual arts” drawing “masters of drawing”.
  • 293.
    8. Going a step deeper by querying the concept “Leonardo da Vinci” via Google Web See how Darwin generates an optimal query to find the most authoritative documents: some links of the semantic track are suppressed, some are used as open search and some as closed search.
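How a semantic track might become a query string can be sketched in Python. The slides do not define “open” and “closed” search; this sketch assumes that a closed link is sent as a quoted phrase, an open link as bare words, and a suppressed link is omitted. The function name `build_query` and the `modes` parameter are hypothetical.

```python
from typing import Dict, List, Optional

VALID_MODES = {"closed", "open", "suppressed"}


def build_query(track: List[str], modes: Optional[Dict[str, str]] = None) -> str:
    """Join the links of a track into one query string.

    Assumed semantics (not stated in the slides): "closed" links are
    quoted, "open" links are left bare, "suppressed" links are dropped.
    Links without an explicit mode default to "closed".
    """
    modes = modes or {}
    parts: List[str] = []
    for term in track:
        mode = modes.get(term, "closed")
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode {mode!r} for link {term!r}")
        if mode == "suppressed":
            continue
        parts.append(f'"{term}"' if mode == "closed" else term)
    return " ".join(parts)


track = ["arts", "visual arts", "drawing", "masters of drawing"]
query = build_query(track, {"drawing": "open"})
```

Under these assumptions, `query` reproduces the track shown in slide 7, with straight quotes in place of the typographic ones.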
  • 294.
    9. As above, querying by Michelangelo We are still “inspecting” the map before deciding to “click”. As Darwin maps evolve by themselves, users may suggest other off-tree options.
  • 295.
    10. Let’s go now to the “Drawing” neighborhood As explained, we may navigate either in sub tree mode or by neighborhood. We invite you to click on the white neighborhood link.
  • 296.
    11. Let’s see what a neighborhood is The figure depicts the neighborhood of the Drawing node: its ancestry (Painting: Classic); its seven “sons”; its many “peers”, or brothers; etc.
  • 297.
    11 bis. Semantic Neighborhood and Tree Logic By “semantic neighborhood” we mean pertinence or membership to a “semantic family”, in this example “the drawing family”. So drawing is the second son of “classic arts” and brother of “painting”. It has many “brothers”, such as sculpture and architecture, and several “sons” or subordinate subjects such as “history”, “artist tools”, “support media”, …. Trees and sub trees may also offer access to arbitrarily agreed forms of extended family, including collateral subjects at the level of “uncles” and “aunts” and even closely related, friendly or akin subjects. Next we are going to explore how data is structured in a sort of database. One of the problems we face when dealing with “Big Data” applications (and this is one!) is how to offer friendly and efficient interfaces for navigation that at the same time provide both overall views and the finest detail, as in geo maps. As we will see, an HKM, Human Knowledge Map, in a given language must map more than 15 million “ideas” along more than 600,000 subjects (themes or topics), finally structured as a “knowledge forest” of about 200 disciplines. Take into account that The Art, only a small piece of it though “complete”, has 7,571 subjects and about 500,000 “ideas”.
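The family vocabulary above (ancestry, sons, peers) maps directly onto a tree stored as child-to-parent links. The Python sketch below is only an illustration: the `PARENT` table is a toy fragment assembled from the names mentioned in this slide, the link from “classic arts” up to “arts” is an assumption, and `neighborhood` is a hypothetical helper, not part of Darwin.

```python
from typing import Dict, List

# Toy fragment (illustrative only): child -> parent links.
# The "classic arts" -> "arts" link is an assumption.
PARENT: Dict[str, str] = {
    "classic arts": "arts",
    "painting": "classic arts",
    "drawing": "classic arts",
    "sculpture": "classic arts",
    "architecture": "classic arts",
    "history": "drawing",
    "artist tools": "drawing",
    "support media": "drawing",
}


def neighborhood(parent: Dict[str, str], node: str) -> Dict[str, List[str]]:
    """Collect the ancestry, sons and peers (brothers) of a node."""
    ancestry: List[str] = []
    current = node
    while current in parent:  # climb until the root is reached
        current = parent[current]
        ancestry.append(current)
    sons = [n for n, p in parent.items() if p == node]
    father = parent.get(node)
    peers = [n for n, p in parent.items() if p == father and n != node]
    return {"ancestry": ancestry, "sons": sons, "peers": peers}


drawing_family = neighborhood(PARENT, "drawing")
```

Extended families (“uncles” and “aunts”) could be obtained the same way by asking for the peers of the father node.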
  • 298.
    12. Let’s look a little deeper inside Darwin The Art Map content could be saved and deployed, resembling a DNA vector, along a two-dimensional matrix, in this demo of 23 columns by 329 rows. The whole HKM, Human Knowledge Map, would be saved and deployed in 23 columns by approximately 30,000 rows, ~690,000 subjects: not too much in terms of “Big Data”! Each “semantic cell”, which has a specific and unique name, hosts the “semantic fingerprint” of the “subject” pointed to by that name, a brief description of it, and a set of “authoritative sources” from which Darwin agents retrieved the description (i-URL’s).
  • 299.
    13. Passing the mouse over “painting” The deep level of browsing: imagine yourself browsing The Art tree by track from root to leaves, going from right to left, and in parallel creating the 7,571 cells from the upper left corner of the matrix, going right and down row by row, unfolding the tree into a rectangular matrix. We may now browse the whole map cell by cell and, even within each cell, reach a semantic universe of ~500,000 concepts!
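The row-by-row unfolding can be sketched in a few lines of Python. Only the 23-column width and the left-to-right, top-to-bottom filling order come from the slides; the function names and the 0-based indexing are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

COLUMNS = 23  # matrix width used in the demo


def cell_of(index: int, columns: int = COLUMNS) -> Tuple[int, int]:
    """0-based (row, column) of the index-th node when cells are filled
    left to right, then top to bottom."""
    return divmod(index, columns)


def rows_needed(node_count: int, columns: int = COLUMNS) -> int:
    """Number of rows required to host node_count cells."""
    return math.ceil(node_count / columns)


def unfold(nodes: Sequence[str], columns: int = COLUMNS) -> List[List[str]]:
    """Lay an already ordered list of node names out as matrix rows."""
    return [list(nodes[i:i + columns]) for i in range(0, len(nodes), columns)]
```

With this layout, `rows_needed(690_000)` gives the 30,000 rows quoted for the whole HKM.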
  • 300.
    14. Let’s inspect the “interior” of a given cell: “lyric soprano” We are interested in knowing as much as possible about “lyric soprano”. Darwin tells us that The Art map has a node named “lyric soprano”. The demo is set up to search by nodes first and then by concepts.
  • 301.
    15. Let’s go to the “lyric soprano” cell and its neighborhood
  • 302.
    16. Mouse over Lyric Soprano again … A mouse-over on the Lyric Soprano cell activates the same search features as in slide 2 and the following ones: the sub tree of the node and its neighborhood, complemented by a gallery of images.
  • 303.
    17. Node content: i-URL’s and Semantic Fingerprints We are in the deepest semantic mode, within the node! A whole “Semantic Fingerprint”, a brief semantic description of the node subject, and more …
  • 304.
    A new Dimension of Searching • We are entering a new dimension of searching: this “feature” is not only a powerful tool to make search more direct and precise but also a tool to find whatever we need in only one click. It is equivalent to being in a huge Web Library managed by expert and friendly librarians. • We have said that each node stores something like a sample of its subject. Let’s suppose that for the subject “masters of drawing” there exist 20,000 Web pages dealing with it with a high level of authoritativeness. Darwin Agents, under Darwin Methodology and guided by Darwin Ontology, may unveil from these raw clusters of content a weighted set of dominant concepts (modal concepts) that are considered the semantic synthesis of the cluster subject: masters of drawing in this case. This set of concepts is the core of the above-mentioned “semantic fingerprint”. • You may easily guess that adding one of these specific and unique concepts to your query will focus it precisely on the semantic key you are looking for! We are close to the “find a needle in a haystack” utopia. • You will now be invited to see how this feature works. Darwin agents will also generate for each subject its corresponding description, expressed as part of its metadata (i-URL). Concepts can be of several types: generic, objective, functional, etc.
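One simple way to picture a “weighted set of dominant concepts” is to weight each concept by the share of pages in the cluster that contain it. The slides do not specify Darwin's weighting, so the Python sketch below is an assumption throughout: the function names, the weighting rule and the four toy pages are all illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def semantic_fingerprint(pages: Iterable[Iterable[str]],
                         top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k concepts with weight = share of pages containing them.

    Assumed weighting (document frequency); ties are broken alphabetically.
    """
    page_list = [set(concepts) for concepts in pages]
    counts: Counter = Counter()
    for concepts in page_list:
        counts.update(concepts)  # each page counts a concept at most once
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(concept, n / len(page_list)) for concept, n in ranked[:top_k]]


def refine_query(query: str, concept: str) -> str:
    """Append one fingerprint concept to a query as a quoted phrase."""
    return f'{query} "{concept}"'


# Hypothetical concepts extracted from four pages of one cluster.
toy_pages = [
    ["vitruvian man", "silverpoint", "sfumato"],
    ["vitruvian man", "silverpoint"],
    ["vitruvian man", "chiaroscuro"],
    ["silverpoint", "vitruvian man"],
]
fingerprint = semantic_fingerprint(toy_pages, top_k=2)
```

Passing one of the resulting concepts to `refine_query` mimics the step of adding a specific concept to the user's query.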
  • 305.
    18. List of concepts stored in the “Lyric Soprano” node You are invited to click, perhaps your first click in this demo: a mouse-over will give you only a semantic overview. To be specific and go right to the point you must click: on average, no more than once!
  • 306.
    18 bis. List of concepts stored in the “Lyric Soprano” node, scrolling down … Each of these sets of concepts, objective (famous lyric sopranos), generic (fach, full lyric, light lyric, range, …) and some other categories, adds specificity and excellence to your search.
  • 307.
    19. Examples of specific search in only one click
  • 308.
    Epilogue The LOGO is behind the Web! At last: It is all in a name! • Human ideas are known by their unique “agreed” names for each pair “Language – Culture”. • The Web enabled this unique agreed naming. • Darwin unveils this unique agreed naming.