NYC DataWeb
A Platform for Integrating Public Data into NYC.gov

Joel Natividad
TCG
Thursday, June 9, 2011
SemTech 2011
About Me

•   TCG Software

    •   Software Services arm of “The Chatterjee Group”

    •   Several Portfolio companies in Lifesciences, Telecom,
        Aviation, Energy, Real Estate, & Info Technology

•   Headquartered in NYC

•   Delivery Centers in Bangalore, Kolkata & Mumbai

•   Look after Knowledge Engineering Practice of TCG
Background
Main Goals
•   stimulate development of apps
    that improve access to info
    and govt transparency; and


•   encourage innovation & the
    creation of new IP with
    commercial potential
CROWDSOURCING

 • Wisdom of the Crowd
 • Self-selecting, motivated developers
 • Bang for the Buck
 • Ignites Entrepreneurship
CROWDSOURCING

•   Challenge:
    Improve Recommendation Algorithm
    by 10%

•   Dataset:
    •   100 million ratings (training set)
    •   Half a million Users
    •   18 thousand movies

•   Prize:
    One million US Dollars

STATISTICS

•   just 6 days into contest,
    Cinematch bested by 1%

•   20,000 Teams, 150 countries

•   Entrants:
    •   Bell Labs
    •   Opera Solutions
    •   Well-renowned universities
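
The 10% target can be made concrete with a little arithmetic. The sketch below assumes the challenge on this slide is the Netflix Prize, which scored entries by root-mean-square error (RMSE) on held-out ratings; the slide itself does not name the metric. The ratings and the function names are made up for illustration and are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up ratings on a 1-5 scale (assumption: illustration only).
actual = [4, 3, 5, 2, 1]
baseline = [3, 3, 4, 3, 2]    # hypothetical baseline predictions
candidate = [4, 3, 4, 2, 2]   # hypothetical challenger predictions

base = rmse(baseline, actual)
cand = rmse(candidate, actual)
gain = improvement_pct(base, cand)

print(f"baseline RMSE  = {base:.4f}")
print(f"candidate RMSE = {cand:.4f}")
print(f"improvement    = {gain:.1f}%")
print("meets 10% target:", gain >= 10.0)
```

On this reading, "bested by 1%" six days in means a challenger had already cut the baseline RMSE by one percent of its value.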
• Washington DC CTO - Vivek Kundra
•   First Federal CIO - Vivek Kundra

•   Open Government Initiative

    •   Recovery.gov

    •   Data.gov

    •   USAspending.gov

    •   IT Dashboard

    •   Performance.gov

    •   Fedspace

    •   Citizen Services Dashboard
•   Open Government Initiative: on “Life Support”

    •   Budget slashed from $34 million to $8 million
Open Data in NYC




Council Member Gale Brewer
$500 million!!!
Why $500 million?!?!
“Integrated”
Inter-Agency System
Data Integration Alphabet Soup

JMS, SOA, XSLT, MOM, EAI, ORB, EJB, SOAP, MDA, XML, RPC, BPM, POJO, BPEL
and Principles

•   Cost Effective (NOT $500 million)

•   Easy to Use (Developers/Publishers/Citizens)

•   based on Open Standards

•   Low Adoption Curve

•   Help Accelerate Open Data Innovation

•   Useable Data Now!
The Next Web of Open Linked Data
         February 2009
Useable Data Now

•   “Beautiful” Website

•   Useable by Developers/Publishers/Citizens

•   based on Open Standards

•   Low Adoption Curve

•   Help Accelerate Open Data Innovation

•   Useable Data Now!
What NYCBigApps Developers were Doing

[Diagram: Siloed Data, ETL Processes, Text, Download & Decipher]

•   Spent an inordinate amount of time interpreting data

•   Massaged Data was then staged locally

•   Developers kept reinventing the wheel

•   Limited Data mashups

•   Applications disconnected from NYCDatamine
There must be a
  Better Way
How it Started

•   Oct 12, 2010 - NYCBigApps 2.0 announced

•   Nov 9, 2010 - NYCBigApps 2.0 kickoff meeting

•   late Nov 2010 - spoke with Revelytix/Spry about
    collaborating

•   early Dec 2010 - started work on NYCDataWeb

•   Jan 26, 2011 ~4:30p - submitted entry
What We Did

[Diagram: Siloed Data, Definitions, Domain Ontology, Mapping Ontology, Metadata Ontology, Query & Results; component boxes labeled Cache, Optimizer, Re-Writer, Planner, Indexes, Rules]
“Beautiful” Website
       Three dashboards were built
• NYC Agile Analytics (Spry)
• NYCreation (SMW+)
  - visualized SPARQL query results
• NYCmantics (SMW+)
  - NYC datamine explorer
What’s Next?
Semantic Gap
Developers
?!?
3.0
JumpStart Semantics
The Computer for the rest of us.
Semantics for the rest of us.
Semantics for the REST of us.
Phase 2
Aug 2011 (Powered by NYCDataWeb)

•   Hide Complexity
    (Simplicity = Adoption)

•   Incorporate the whole
    NYC datamine

•   Make it easier for Publishers

•   Make it easier for Developers

•   Make it easier for Citizens

•   Open-source collaboration with
    vendors & other institutions

•   Incorporate the best of
    Socrata and data.gov

•   Improved Visualizations

•   Position NYCDataWeb as the
    accelerated data mashup platform
Phase 3
            Nov 2011 (NYCBigApps 2011)


•   DataWeb Deployment Framework SMW bundle

•   More Data Sources (Federator - Spinner)

•   Linked Open Data

•   Make it easier STILL for Publishers, Developers
    and Citizens

•   Enable Widespread adoption of NYCDataWeb
    (NYCDataWeb bootcamp)
The Broader Vision

[Diagram: NYC Information Web, with RDF feeds among Agency Data, Web Pages, Sensors, Other Triplestores, Partners, Domain Ontology, Ontology, and Query & Results]
Phase 4
                Post NYC BigApps 2011




•   Multiple solutions powered by NYCDataWeb

•   <Your city/community/company here> DataWeb

•   Help foster a viable ecosystem of Linked Data

•   ... keep standing on the shoulders of giants
Semantic
Web
Hans Rosling shows the best stats
       you've ever seen
           February 2006
PUBLIC
We need your help & feedback




  A Platform for Integrating Public Data into NYC.gov

                 Find out more at
  http://knoodl.com/ui/groups/NYC_Homepage
CREDITS
•   Lego Faceparty picture by RichardAM (http://www.richard-am.net/)
•   Lego Inauguration Pictures from various Flickr Users (sluggobear, Atwater, Dan
    Hontz)
•   Lego Luke looses his Hand by Flickr user wwwayazdotcom
•   Tim Berners-Lee highlight from TED
    (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)
•   Hans Rosling highlight from TED
    (http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html)
•   FlowerPowerpont2.pptx provided by Anna Rosling Rönnlund of gapminder
•   “Star Wars Gangsta Rap” highlight, SizzlechestXXX
    (http://www.youtube.com/watch?v=Ij4w7ChpuaM)
•   Various screenshots provided by Revelytix, Spry Inc. and TCG Software
    Services

NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
