Big Data = Bigger Metadata
O’Reilly Strata Conference
February 29 2012
Pivot/Skate, etc…
   Founded 2003
    Poor man’s GIS
    Panamap

   Refounded 2006
    Neighborhood boundaries
    Mass transit data


   Refocused 2009
    SaaS for mapping + on-demand data
Achtung!

     NoSQL is no panacea
           Big Data isn’t about data
           Big Data isn’t new
           Big Data doesn’t present a Boolean quandary
           With power comes responsibility
            AWS bills
            Lady Gaga tweets
            Innumeracy (correlation v causation)
Big v Important

  Big                         Important
        Heterogeneous            Well-defined schema
        Raw                      High value (not free)
        Distributed              Test-driven
        Streaming/real time      Relational
        Search for meaning       Historical
        Time-sensitive           Enterprise-focused
        Philosophical
Data Exhaust


     Analytics                  Probes




                 Social Media            Gov 2.0
Platforms




 Commoditization of compute and storage
A Brief History of Metadata




       Callimachus            Library of Alexandria, Egypt
A Brief History of Metadata

                              “Pinakes” (lists)
                                  Title
                                  Category
                                  Author
                                  Author birthplace
                                  Father
                                  Word count




       Callimachus
A Brief History of Metadata
A Brief History of Metadata
A Brief History of Metadata




Card catalog room,
Library of Congress c. 1920
A Brief History of Metadata

 Dewey Decimal System goes electronic in 1967
Out with the Old, in with the New




Archiving card catalogs
after digitization
Why Can’t We Be Together?


      Metadata              Data
Exponential Growth in Data


         Unprecedented rate of data creation, 1995-today
Data




       Pinakes (300 BC)      Catalog (1595 AD)      Taxonomy (1876)      Database (1970)
Oh, How I’ve Missed You


The reunification of metadata
and the artifact
Together At Last
GIS Data is Unevolved




               +        =
Enter the Data Curator


Part social scientist, part librarian,
part statistician, part RDBMS wiz
DIKW Model
    Data
        Fact, Signal, Symbol
    Information
        Structural v Functional
        Symbolic v Subjective
    Knowledge
        Processed
        Procedural
        Propositional
Popularity (Google Trends)
Words to Live By




                   dx/dt
Thank you!
ian@urbanmapping.com
@urbanmapping




                        R.I.P.
                       Schema

Big Data = Bigger Metadata

Editor's Notes

  1. Some background on Urban Mapping. It wasn’t a straightforward path, but it’s very relevant. We started close to 10 years ago with a printed map that reveals different layers of thematic imagery (streets, subways, neighborhoods) depending on the viewing angle. We all know what happened to print, so I shifted the business to a new medium: in 2006 or so we collected much of the same data, but now using a spatial database as opposed to regular old vector files in Adobe Illustrator. The writing was on the wall for licensing content to local web publishers, so we shifted again. This time we moved upstream: we continue to develop our own data, but have greatly expanded that effort to include commercial data, and we deliver it through our own mapping service. We do this for customers in various market segments, like Tableau Software, where we perform a few geo-services such as hosting the base map and overlaying data.
  2. I can be a bit of a curmudgeon and I hope a cautionary point of view has a place. Let’s talk about what Big Data is not. I’ll talk later about what it is. First thing to note is that Big Data isn’t really about data at all. But I am. It’s about tools and processes to manage and exploit info-nuggets. There’s nothing revolutionary about saying this, but I wanted to make it explicit. Second, Big Data isn’t especially new: Wall St and Walmart have been processing data and deriving value from it for decades, but they don’t talk about it. Why? Because they make money doing so and don’t need to alert the competition. Anybody hear of Teradata? Whenever companies want to talk about what they are doing, it’s usually a red flag for me, meaning the technology, the industry or something else hasn’t sufficiently evolved. But I’m also not saying Big Data is a rehash of enterprise software. More on that later… Finally, Big Data has democratized access to powerful tools at little cost. This doesn’t necessarily mean everybody knows how to use these tools. There can be some blowback, such as high credit card bills, analysis without direction or objective, and a lack of knowledge about basic statistics.
  3. There’s been exponential growth in data and it comes from any number of places. Some are shown here: mobile devices as probes, with vast capabilities to record all kinds of environmental variables; open government; social media; and a desire for analytics, which has been rebranded as business intelligence.
  4. Processing and storage costs have dropped like rocks. Enterprise software has offered big solutions to banking and other industries for decades, but with today’s incredibly low barriers to entry, virtually anybody can participate.
  5. Callimachus (pronounced Kal-i-um-akus) was a noted poet at the Library of Alexandria in the 3rd century BC.
  6. He created the Pinakes (pronounced pin-a-keez), or Lists, a way of organizing works in the library. He embarked on the effort to organize 120k scrolls by title, author, birthplace, father, education, summary of contents and other info. This was the first effort to systematically create a bibliographic system, and a direct link to metadata two millennia later.
  7. In 1595, Johan van der Does published the Nomenclator, the first instance of a printed catalog of library holdings. It represented a significant advancement over Callimachus’s lists, but it took close to two millennia to get here.
  8. The modern cataloging system: the Dewey Decimal System, created in 1876. Its father was Melvil Dewey. The Dewey Decimal System attempted to organize all knowledge into ten main classes. Each class is further subdivided into ten divisions, and each division into ten sections, giving ten main classes, 100 divisions and 1000 sections. It allows for infinite hierarchy and is both numerical and faceted (linking content from different areas). Other systems followed: Universal Decimal Classification, Library of Congress, etc.
  9. This photo is from the Card Division at the Library of Congress in the 1920s. The amount of physical metadata is astounding: millions of library cards.
  10. The next major advancement was in the late 1960s. Early attempts at electronic indexing focused on a taxonomy of keywords and related information. This was efficient for reporting on what the system contained, but it also kept the long-running divorce between artifact and metadata. The Online Computer Library Center (OCLC) was created as a nonprofit to further access to library resources across institutions and decrease costs. The OCLC acquired the Dewey Decimal System and, as any standards body does, sought to perpetuate its existence over the decades. Then the internet happened.
  11. That meant out with the old, in with the new. This photo shows library cards going into storage. I’m not sure why they’d even be archived after the transition to databases was made, but that’s for another time.
  12. So this is the situation. Beginning in the late 60s, electronically stored metadata began to grow. The library cards (at left) went away, but the bifurcation was complete: total separation of the thing from the description of the thing. And it sort of made sense. IT was in its infancy, so storage and processing costs were high. Publishers also exerted a great deal of control over how they permitted libraries to index works and make them available.
  13. To put the last 2000 years in perspective: Callimachus created the first crude schema, leaving a place for metadata to be stored. The Nomenclator gave us the first bibliographic catalog, printed and bound, produced annually. The Dewey Decimal System was born in 1876 and was the basis of an extensive metadata system for published works. Then… the internet happened. In the top right you see the corner of a cloud. That’s my way of representing what happens next. The volume of data produced grows exponentially, overtaking 2000-plus years of history in no time.
  14. So how about the bifurcation/divorce I mentioned? The web brought the artifact and metadata together again.
  15. Google Books. Sure, we have the Dewey Decimal type stuff along with ISBN, retail price, etc., but we also threw in the whole damn book: full-text search. Amazon does it too.
  16. In my industry, the state of metadata is horrendous. We’re stuck in the green screen days. Proprietary data formats and slow-moving vendors don’t help. While I’m the first person to admit GIS needs to get off its ass and change, radically, there’s also something the real-time streaming web can learn from us.
  17. We hear about the rise of the curator: part social scientist, part librarian, part RDBMS wiz and part statistician. This is increasingly important across all industries. When dealing with a torrent of data, domain experts will be required to help make sense of it.
  18. The Knowledge Hierarchy, as it is sometimes known, has been used to represent relationships between the stuff that turns into something meaningful. You could look at this as going from a letter to a sentence to a paragraph, or from an ingredient to a recipe to a meal, or something else. The details don’t matter here, but I think about the fundamental building block of data. One geocoded tweet has little or no value on its own. Contrast that with per capita income for this ZIP code. By amassing enough geocoded tweets, it’s clear we can get to something meaningful, but I don’t know how many tweets that is. I do know that per capita income can directly inform my marketing plans for selling a new shampoo.
  19. With that, here’s some more wet blanket for everybody. Using Google Trends, I looked at a number of terms that might indicate the old-fashioned RDBMS, SQL way of life, and most seem to follow the blue line, which represents the term ‘metadata.’ ‘Big Data,’ coincidentally, first appears a few months before the first Strata conference in 2011. ‘Curation’ has a longer life but doesn’t show the surge of Big Data, and everybody’s favorite, ‘data scientist,’ doesn’t register as much more than a rounding error. I’m not using Google Trends to fully substantiate my argument, but I do hope you take a dose of skepticism before fully embracing ‘this.’
  20. In closing, I’d like to leave you with an emergent cliché. It’s also my measure of how geeky an audience I have: one person’s metadata is another person’s data.
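
The arithmetic in note 8 (ten main classes, each split into ten divisions, each split into ten sections) can be sketched in a few lines of Python. This is an illustration only, not part of the talk: the function name `dewey_levels` and the dictionary keys are made up for the example, and it covers only the three-digit part of a Dewey number, not the decimal extensions that follow it.

```python
def dewey_levels(number: int) -> dict:
    """Split a three-digit Dewey number (0-999) into its three hierarchy levels.

    Hypothetical helper for illustration; the names are not from any library.
    """
    if not 0 <= number <= 999:
        raise ValueError("expected a whole number from 0 to 999")
    return {
        "main_class": number // 100 * 100,  # one of 10 main classes: 000, 100, ..., 900
        "division": number // 10 * 10,      # one of 100 divisions: 000, 010, ..., 990
        "section": number,                  # one of 1000 sections: 000 through 999
    }


levels = dewey_levels(516)
print(levels)  # {'main_class': 500, 'division': 510, 'section': 516}
```

Each level is just the number with its trailing digits zeroed out, which is why the scheme yields exactly 10, 100 and 1000 slots at the three levels.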