Department of Commerce App Challenge: Big Data Dashboards
International Open Government Data Conference: Virtual Conference

Best Practices From Around the World in Putting Data to Work

Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Congrats to Brand Niemann on having this presentation trending at the top of SlideShare's LinkedIn page today!
Department of Commerce App Challenge: Big Data Dashboards: Presentation Transcript

  • Department of Commerce App Challenge: Big Data Dashboards
    International Open Government Data Conference: Virtual Conference
    Best Practices From Around the World in Putting Data to Work
    Dr. Brand Niemann, Director and Senior Enterprise Architect – Data Scientist, Semantic Community
    http://semanticommunity.info/
    AOL Government Blogger: http://gov.aol.com/bloggers/brand-niemann/
    April 27, 2012. Updated April 30, 2012. Updated July 7, 2012.
    http://semanticommunity.info/AOL_Government/2012_International_Open_Government_Data_Conference
    http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge 1
  • International Open Government Data Conference: Virtual Conference
    • Questions to ask each presenter to supply afterwards for a directory: are you doing these things?
        – The way to document the public benefits of Open Data is to be able to answer the points below.
    • OPEN DATA
        – O: Not previously Open to the public (much of the "Open data" has already been available and is just being re-advertised)
        – P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose, e.g. Congressional redistricting)
        – E: Educates citizens and politicians to take action (results that provide a valid basis for action)
        – N: Made Newsworthy by journalists (results are communicated objectively and effectively)
        – D: Data, the plural of datum: something given or admitted, especially as a basis for reasoning or inference
        – A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with
        – T: Transparent (see where the data came from, how it was analyzed, where the results came from, etc.)
        – A: Answers questions posed by the above 2
  • Open Data Example
    • O: Not previously Open to the public (much of the "Open data" has already been available and is just being re-advertised)
        – EPA Envirofacts Warehouse APIs (previously only slow large queries and bulk downloads)
    • P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose, e.g. Congressional redistricting)
        – EPA Envirofacts data is Congressionally mandated for protection of human health and welfare
    • E: Educates citizens and politicians to take action (results that provide a valid basis for action)
        – EPA Envirofacts Web Site (over 2500 Web pages)
    • N: Made Newsworthy by journalists (results are communicated objectively and effectively)
        – My AOL Government Story is one of many such efforts
    • D: Data, the plural of datum: something given or admitted, especially as a basis for reasoning or inference
        – EPA has data standards and quality assurance methods for these data
    • A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with
        – Yes
    • T: Transparent (see where the data came from, how it was analyzed, where the results came from, etc.)
        – Yes, metadata is provided and combined with the new data APIs
    • A: Answers questions posed by the above
        – See my AOL Government Story with summary results as one of many such efforts 3
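The OPEN DATA questions are meant to be collected into a directory, so one way to picture an entry is as a small record with one answer per question. The sketch below is only an illustration: the class name, the short keys ("open", "purpose", and so on), and the `unanswered` helper are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field

# The eight questions from the slides, in acronym order.
# The short keys are hypothetical labels chosen for this sketch.
OPEN_DATA_QUESTIONS = [
    ("O", "open", "Not previously open to the public"),
    ("P", "purpose", "Serves a purpose"),
    ("E", "educates", "Educates citizens and politicians to take action"),
    ("N", "newsworthy", "Made newsworthy by journalists"),
    ("D", "data", "Given or admitted as a basis for reasoning or inference"),
    ("A", "actual", "Actual numbers people can understand and work with"),
    ("T", "transparent", "Transparent about sources, analysis, and results"),
    ("A", "answers", "Answers questions posed by the above"),
]


@dataclass
class DirectoryEntry:
    """One dataset's answers to the OPEN DATA questions."""
    dataset: str
    answers: dict = field(default_factory=dict)  # key -> evidence text

    def unanswered(self):
        """Return the keys of questions that still lack an answer."""
        return [key for _, key, _ in OPEN_DATA_QUESTIONS
                if not self.answers.get(key)]


def acronym():
    """Spell out the acronym formed by the question letters."""
    return "".join(letter for letter, _, _ in OPEN_DATA_QUESTIONS)


# Answers paraphrased from the Envirofacts example slide (partial on purpose).
entry = DirectoryEntry(
    dataset="EPA Envirofacts Warehouse",
    answers={
        "open": "Envirofacts Warehouse APIs",
        "purpose": "Congressionally mandated data",
        "educates": "Envirofacts Web Site",
    },
)
```

Calling `entry.unanswered()` lists the questions a presenter would still need to address before the entry is complete.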
  • Beautiful Spreadsheet Data for EPA Envirofacts Warehouse Metadata and API Dashboard
    • Built for my former EPA CIO, Malcolm Jackson (a mobile app for the iPad)
    • Something I have wanted to do since my early days in the EPA Data Standards Branch (2000-2002)
    • Built a beautiful spreadsheet for public use and a Spotfire application
    • The format is both linked metadata and linked data
    • Search all the metadata and get API data (but for only 9 of 13 systems and for only 5000 rows at a time)
    • Find key fields for data integration and build many apps
    • Metadata results:
        – Models: 15
        – Tables: 227
        – Rows: 2518
        – Types: 40
        – Columns (Data Elements): 1662 4
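Because the slide notes that API data comes back only 5000 rows at a time, a client has to request a large table in consecutive windows. The following is a minimal sketch of that paging arithmetic. The URL template is a placeholder invented for illustration, not the actual Envirofacts address scheme, and zero-based inclusive row numbering is an assumption.

```python
def row_windows(total_rows, page_size=5000):
    """Yield inclusive, zero-based (first, last) row pairs covering total_rows."""
    if total_rows < 0 or page_size <= 0:
        raise ValueError("total_rows must be >= 0 and page_size must be > 0")
    for first in range(0, total_rows, page_size):
        # The last window may be shorter than page_size.
        last = min(first + page_size, total_rows) - 1
        yield first, last


def page_urls(url_template, total_rows, page_size=5000):
    """Fill a URL template once per row window."""
    return [url_template.format(first=first, last=last)
            for first, last in row_windows(total_rows, page_size)]


# Placeholder template (hypothetical host and path, for illustration only).
TEMPLATE = "https://example.invalid/TABLE_NAME/rows/{first}:{last}"
```

For a table of 12,000 rows this produces three requests, the last one covering the remaining 2,000 rows.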
  • Beautiful Spreadsheet Data for EPA Envirofacts Warehouse Metadata and API Dashboard Web Player 5
  • Data Science Analytics for 2012 IOGDC
    “More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012
    • IOGDC Conference Knowledge Bases
    • IOGDS Catalog Data Sets
    • IOGDS Data Analytics with BI Tools
        – Exploiting Linked Data with Business Intelligence Tools
    • Acknowledgement: Kingsley Idehen, CEO, OpenLink Software 6
  • Data Science Analytics for 2012 IOGDC 2012 IOGDC Knowledge Bases Web Player 7
  • Data Science Analytics for 2012 IOGDC IOGDS Catalog Data Sets Web Player 8
  • Data Science Analytics for 2012 IOGDC IOGDS Data Analytics with BI Tools Web Player 9
  • An Information Platform
    • An Information Platform is the critical infrastructure component for building a Learning Organization. The most critical human component for accelerating the learning process and making use of the Information Platform is taking the shape of a new role: the Data Scientist.
        – Jeff Hammerbacher, in Chapter 5: Information Platforms and the Rise of the Data Scientist, in his book “Beautiful Data” (July 2009) (see Linked Data reference below)
    http://semanticommunity.info/AOL_Government/Beautiful_Data#Information_Platforms_As_Dataspaces 10
  • Jeff Hammerbacher
    • The number two data scientist in the world, according to Tim O’Reilly, is Jeff Hammerbacher, who built the data science team at Facebook and is now at Cloudera, driving the success of Hadoop as a standard tool for processing large, unstructured data sets with a network of commodity computers. Jeff also teaches “Introduction to Data Science” at UC Berkeley, and in his opening lecture organizes his reasons for doing so into three parts:
        – 1. Personal: Jeff's training and job experiences
        – 2. Putting Data to Work: Theme of the 2012 International Open Government Data Conference
        – 3. The Emergence of Data Science: Dominant theme of future conferences according to Robert Ames, Senior VP for Technology at In-Q-Tel, at the FCW Executive Briefing on Big Data and the Government Enterprise, June 21, 2012
    http://www.forbes.com/pictures/lmm45emkh/tim-oreilly-is-the-founder-of-oreily-media/#gallerycontent 11
  • My Mission Statement
    • 1. Personal:
        – Senior Data Scientist at the US EPA:
            • Completed Data Science Academic Training and Many EPA Data Products
        – Detail to Data.gov:
            • Built Data.gov in An Information Platform
    • 2. Putting Data To Work:
        – Data Journalist for Federal Computer Week and AOL Government:
            • Published Many Data Science Products and Built Own Data Journalism Handbook
        – Data as a First Class Citizen: Data Science and Journalism for Analytic Standards and Audit of Open Data Sites:
            • Working with CKAN, DoD, IC, NCOIC, NIST, OASIS, OMG, OSTP, W3C, etc.
    • 3. The Emergence of Data Science:
        – Built a Data Science Team for the Government Community:
            • “Killer Semantic Web Application” (Semantic MedLine on the new Cray Graph Computer) for the Federal Big Data Senior Steering Group
        – Challenges and Contests Using the Best High Quality Data Sets:
            • Heritage Provider Network Health Prize, Health Data Initiative Forums, TedMed, Department of Commerce App Challenge, etc. 12
  • Data Scientist
    • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.
    • The title data scientist is sometimes disparaged because it lacks specificity and can be perceived as an aggrandized synonym for data analyst. Regardless, the position is gaining acceptance with large enterprises who are interested in deriving meaning from big data, the voluminous amount of structured, unstructured and semi-structured data that a large enterprise produces.
    • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others.
    Source: http://searchbusinessanalytics.techtarget.com/definition/Data-scientist 13
  • Dr. Brand Niemann
    • Former Senior Enterprise Architect and Data Scientist, US Environmental Protection Agency (1980-2010).
    • Current Husband, Father, and Grandfather Enjoying the Golden Years! 14
  • Semantic Community
    • Our Mantra is: Data Science Precedes the Use of SOA, Cloud, and Semantic Technologies! We use data science to help marketing and business development efforts.
    • Our Mission is like Google's: Organize the world’s information and make it universally accessible and useful.
    • Our Method is like Be Informed 4: Architectural Diagrams and Questions and Answers are not enough; you need Dynamic Case Management!
    • Our Sound Byte: It is not just where you put your data (cloud), but how you put it there!
    • Our Work: Semantically enhancing your data and writing data science stories about it. 15
  • Introduction
    • I heard about this several months ago, but put it off until yesterday. I finished it today because I am a very good Data Scientist!
    • Well, I almost finished it. I need the Patent data in a format that I can more readily work with, and I am in communication with the USPTO about that.
    • I create Knowledge Bases about my Data Science work so others can follow what I do and even reproduce it themselves. My apps also work on mobile devices like iPads.
    • My goal was, and still is, to create a set of multiple interactive dashboards of DoC data like the ones they have for Foreign Trade. 16
  • Data Science Knowledge Base
    http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge 17
  • Data Science Spreadsheet
    http://semanticommunity.info/@api/deki/files/17946/=DoCApp.xlsx 18
  • Spotfire Dashboards
    • U.S. Census Bureau Geographic Names Information System
    • U.S. International Trade in Goods and Services
    • Data.Gov Data Catalog for US Department of Commerce
    • U.S. Bureau of Economic Analysis
    • U.S. Patent & Trademark Office 19
  • U.S. Census Bureau Geographic Names Information System Web Player 20
  • U.S. International Trade in Goods and Services Web Player 21
  • Data.Gov Data Catalog for US Department of Commerce Web Player 22
  • U.S. Bureau of Economic Analysis Web Player 23
  • U.S. Patent & Trademark Office
    • Methodology:
        – Overview: Apply Gall's Law and start with the end in mind (Mashups and Decision Support), then work out the details in a simple and small content example for my next AOL Government Story! Give everything a well-defined URL for a semantically enhanced index in a Dashboard (see next slide).
            • 1. Follow Gall's Law, which says: "A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: a complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a simple system." - John Gall, systems theorist
            • 2. Copy to MindTouch and add structure to the Web Pages
                – See http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation
            • 3. Look at one ZIP file under each section and subsection to see what it contains and how to use it in MindTouch (in process)
                – See http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products 24
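The idea of giving every page a well-defined URL and collecting those URLs into an index can be sketched in a few lines. This is an illustration only: it assumes page titles map to URL segments by replacing spaces with underscores, which matches the two URLs shown in the methodology but is not a statement of how MindTouch actually encodes paths. The function names and the nested-dictionary layout are hypothetical.

```python
BASE = ("http://semanticommunity.info/AOL_Government/"
        "Department_of_Commerce_App_Challenge")


def page_url(base, *titles):
    """Join page titles under base, assuming spaces become underscores."""
    segments = [title.strip().replace(" ", "_") for title in titles]
    return "/".join([base.rstrip("/")] + segments)


def build_index(base, tree, path=()):
    """Flatten a nested {title: children} dict into (breadcrumb, url) rows."""
    rows = []
    for title, children in tree.items():
        here = path + (title,)
        rows.append((" > ".join(here), page_url(base, *here)))
        # Recurse so that every subsection also gets its own URL.
        rows.extend(build_index(base, children, here))
    return rows


# Page hierarchy taken from the two URLs cited on the slide.
PAGES = {"DOC USPTO Apps for Innovation": {"Electronic Data Products": {}}}
INDEX = build_index(BASE, PAGES)
```

Each row of `INDEX` pairs a readable breadcrumb with its URL, which is the kind of table a dashboard could load as an index.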
  • U.S. Patent & Trademark Office Web Player 25
  • MindTouch DoC USPTO Apps for Innovation
    http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation 26
  • MindTouch Electronic Data Products
    http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products 27
  • Work Plan in Process
    • Mash-Ups:
        – Combine USPTO applicant/inventor information with other USPTO datasets (e.g., with USPTO assignments (ownership) data):
            • Google or USPTO Daily and USPTO Retro
        – Combine USPTO patent grants and patent application publications with other DOC data (e.g., Census or Economic data)
    • Innovative Ideas:
        – Homogenize the patent grant bibliographic text data (i.e., make it all the same format).
        – Same for the patent application publication bibliographic data.
        – Capture patent grant bibliographic text data from 1790 to 1975 using the image data.
        – Build a text searchable database (updated weekly) that includes both of the datasets discussed in the Webinar. Search queries can be saved. Result sets can be saved/extracted/tailored.
        – Build a text searchable database (updated weekly) that includes subsets of both of the datasets discussed in the Webinar (e.g., Green Technology related).
        – Same ideas as above, but use full-text (75 MB/104 MB per week) or full-text with embedded images (1.4 GB/1.5 GB per week): http://www.google.com/googlebooks/uspto-patents.html
    Source: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation#Innovative_Ideas 28
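Two items in the work plan, homogenizing bibliographic records and combining them with assignment (ownership) data, can be sketched together. Everything specific here is an assumption made for illustration: the field names (`patent_number`, `publication_number`, `doc_number`, `assignee`) are hypothetical and do not reflect the actual USPTO file layouts, and the sample records are made up.

```python
def homogenize(record):
    """Map a grant or application record onto one shared set of keys."""
    if "patent_number" in record:
        kind, number = "grant", record["patent_number"]
    elif "publication_number" in record:
        kind, number = "application", record["publication_number"]
    else:
        raise ValueError("record has no document number")
    return {
        "doc_number": str(number).strip().upper(),
        "kind": kind,
        # Collapse runs of whitespace so titles compare consistently.
        "title": " ".join(record.get("title", "").split()),
        "inventors": sorted(n.strip() for n in record.get("inventors", [])),
    }


def mash_up(documents, assignments):
    """Attach assignees to each homogenized document by document number."""
    owners = {}
    for row in assignments:
        key = str(row["doc_number"]).strip().upper()
        owners.setdefault(key, []).append(row["assignee"])
    combined = []
    for doc in map(homogenize, documents):
        combined.append(dict(doc, assignees=owners.get(doc["doc_number"], [])))
    return combined


# Made-up sample rows, not real patent data.
documents = [
    {"patent_number": " us1234567 ", "title": "Solar  cell",
     "inventors": ["B. Smith", "A. Jones"]},
    {"publication_number": "US20120000001", "title": "Wind turbine"},
]
assignments = [{"doc_number": "US1234567", "assignee": "Example Corp"}]
result = mash_up(documents, assignments)
```

The same join key could in principle link the homogenized records to other DOC data such as Census or Economic tables, provided a shared field exists.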
  • More Questions For Todd Park About Big Data
    http://gov.aol.com/2012/04/25/more-questions-for-todd-park-about-big-data/ 29
  • Conclusions and Recommendations
    • A Data Science approach to the App Challenge provided examples for improvements in data dissemination and visualization.
    • Most of the data sets count as “big data” for an app developer community used to building simple mobile apps on smaller data sets.
    • Patent data dissemination offers the greatest challenge for improvement and the greatest opportunity for creative piloting using a Data Science approach.
    For details see: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge#Submission 30
  • Postscript
    • Presentation to the Federal Big Data Senior Steering Group, September 27, 2012:
        – A Data Science team comprised of NLM (Tom Rindflesch), Noblis (Victor Pollara), Cray (Steve Reinhardt), and Semantic Community (Brand Niemann) is working to make what Dr. George Strawn refers to as “the killer semantic web application for government”, Semantic Medline, better known and more functional for medical research by putting the Semantic Medline RDF database into the new Cray Graph Computer and demonstrating its usefulness.
        – The background for this project is at:
            • http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline 31
  • BusinessUSA.gov Their APIs Can be Data Interfaces
    http://gov.aol.com/2012/07/02/why-apis-arent-enough-to-make-businessusa-gov-useful/
    http://semanticommunity.info/AOL_Government/BusinessUSA.gov_Their_APIs_Can_be_Data_Interfaces 32
  • Imagination at Work! Unleash Your Creativity with Our Census API
    http://semanticommunity.info/AOL_Government/Data_Services_for_Developers 33
  • Digital Agenda For Europe: Data As First-Class Citizen
    http://gov.aol.com/2012/06/29/digital-agenda-for-europe-data-as-first-class-citizen/
    http://semanticommunity.info/AOL_Government/Digital_Agenda_for_Europe 34
  • Data Science Spring 2012 Exercise 1: 2012 Presidential Campaign Finance Data
    http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard 35
  • Data Science Spring 2012 Exercise 3: Evaluate Models of R Package Recommendations
    http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard_2 36
  • Big Data and The Government Enterprise
    • “More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012
    • “Big Data in memory is necessary to avoid loss of information from filtering and aggregation and a data scientist knows the data science and the technology to do that.” Brand Niemann @ Big Data and the Government Enterprise
    http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise 37
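The point about losing information through aggregation can be shown with a toy case: two very different sets of rows collapse to the same summary, so the summary alone cannot tell them apart. The sketch below uses made-up numbers and a hypothetical `aggregate` helper; it illustrates the general idea and is not the author's analysis.

```python
def aggregate(rows):
    """Collapse (group, value) rows to one mean value per group."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {group: sum(values) / len(values)
            for group, values in groups.items()}


# Made-up rows: the same group mean, very different detail.
steady = [("A", 50), ("A", 50), ("A", 50)]
spiky = [("A", 0), ("A", 0), ("A", 150)]

same_summary = aggregate(steady) == aggregate(spiky)
largest_steady = max(value for _, value in steady)
largest_spiky = max(value for _, value in spiky)
```

Both inputs aggregate to a mean of 50 for group A, yet only the row-level data shows the spike to 150, which is the kind of detail that is lost once rows are filtered or averaged away.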
  • Big Data and The Government Enterprise
    http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise 38