Open Innovation - Winter 2014 (Socrata, Inc.)
As innovators around the world push the open data movement forward, Socrata features their stories, successes, advice, and ideas in our quarterly magazine, “Open Innovation.”
The Winter 2014 issue of Open Innovation is out. This special year-in-review edition contains stories about some of the biggest open data achievements in 2013, as well as expert insights into how open data can grow and where it may go in 2014.
Whitepaper - The Need for Self-Service Data Tools, Not Scientists (Josh Howard)
The federal government is one of the organizations most in need of data scientists, but hiring freezes, slashed training budgets and a lack of qualified candidates have all hampered the ability to recruit these types of professionals. Faced with such obstacles, agencies have been developing creative solutions to fill the hiring gap. Learn how to overcome these challenges with big data analytic tools.
This white paper: Analyzes the big data revolution and the potential it offers organizations. Explores the critical talent needs and emerging talent gaps related to big data. Offers examples of organizations that are meeting this challenge head on. Recommends four steps HR and talent management professionals can take to bridge the talent gap.
Achieve Federal Open Data Policy Compliance - Slides (Socrata)
The November 1, 2013 deadline for compliance with Executive Order 13642 and OMB Memorandum M-13-13 is fast approaching.
Get your questions answered and accelerate your implementation efforts. Attend a free webinar entitled: How to Achieve Open Data Policy Compliance with Socrata.
http://www.socrata.com/webinars/how-to-comply-with-the-federal-open-data-policy/
Data Ingestion Engine Theory is my main investment theory. Since data is the fuel that powers artificial/augmented intelligence, companies that can produce enough value to get paid to ingest and analyze valuable data sets are going to be valuable.
AI is transforming the financial media industry – impacting everything from content creation to consumption trends. Clancy Childs, general manager of Dow Jones’ knowledge enablement unit, will share insights into how Dow Jones is reimagining what the news looks like.
Learn how Dow Jones’ knowledge graph platform – powered by Stardog – enables the company to unify structured and unstructured data from a vast range of news sources and deliver cutting-edge insights for customers and partners globally.
Big data is a term for data volumes so large or complex that traditional data processing software and techniques are insufficient to deal with them. Big data is also often noisy, heterogeneous, irrelevant, and untrustworthy. With the speed of information growth exceeding Moore's Law at the beginning of this new century, the excess of data is causing real trouble: data with these attributes cannot be managed and processed by current traditional software systems. This paper discusses some of the big data challenges and problems faced by organizations; these challenges relate to heterogeneity, scale, timeliness, privacy, and human collaboration. A survey method was used as the theoretical solution framework: a questionnaire report covering the challenges and problems organizations face. Once an organization's problems and challenges are known, a solution is offered to address its big data challenges.
In the analogue era, information was scarce and came from questionnaires and sampling. Since the dawn of the digital age in 2012, far more data than ever before is stored, and it is mainly collected passively, i.e., while people go about doing what they normally do, such as running their businesses, using their cell phones, and conducting internet searches.
Analysts, policy makers and business people value business tendency surveys (BTS) and consumer opinion surveys (COS) specifically because the survey results are available before the corresponding (official) quantitative data. However, Big Data has begun to make inroads on areas traditionally covered by BTS and COS. It has a competitive edge over BTS and COS, as it is available in real time, is based on all observations and does not rely on the active participation of respondents. Furthermore, Big Data has few direct production costs, because it is merely a by-product of business processes. In contrast, putting together and maintaining a sample of active respondents and collecting information through questionnaires, as in the case of BTS and COS, require the upkeep of a costly infrastructure and the employment of people with scarce, specialised skills.
However, BTS and COS also have a competitive edge over Big Data in certain aspects. These aspects can broadly be put into two groups: 1) BTS and COS offer information that Big Data cannot supply, and 2) BTS and COS do not suffer from some of the shortcomings of Big Data. The biggest competitive advantage of BTS and COS is that they measure phenomena that Big Data does not cover. Big Data records only actual outcomes, while BTS and COS also cover unquantifiable expectations and assessments. Although Big Data is often claimed to cover the whole population universe (and not only a selection), this does not necessarily prevent bias. For example, Twitter feeds could be biased, because certain demographic or less activist groups are under-represented. In contrast, the research design and random sampling of BTS and COS limit their selection bias.
To remain relevant and survive, producers of BTS and COS will have to adapt and publicise their unique competitive advantage vis-à-vis Big Data in the future. The biggest shift will probably require that producers of BTS and COS make users more aware of the value of the unique forward-looking information of BTS and COS (i.e., their recording of expectations about the future).
As 2017 begins, we are seeing big data and data science communities engage with new tools that specifically cater to data scientists and data engineers who aren’t necessarily experts in these techniques. Given rapid technological advances, the question for companies now is how to integrate new data science capabilities into their operations and strategies—and position themselves in a world where analytics can upend entire industries. Leading companies are using their data science capabilities not only to improve their core operations but also to launch entirely new business models.
During the fourth session of the four-part series Master Minds on Data Science, Eric van Tol gave a presentation on business cases and revenue models.
Smart Data Slides: Leverage the IoT to Build a Smart Data Ecosystem (DATAVERSITY)
No (successful) business is an island. For decades, business schools have taught strategies for improving competitiveness by evaluating strengths, weaknesses, opportunities and threats (SWOT), and considering market forces represented by competitors, consumers, and suppliers. Today, enterprises of all sizes are expected to manage their transactions and customer engagement “touch points” using applications that capture and measure everything from materials to customer satisfaction. As we automate and monitor every aspect of manufacturing and distribution (including the production and delivery of intellectual property for service-oriented businesses), there is a significant and growing role for smart data and sensor/IoT data.
Participants in this webinar will learn to define, capture, and analyze new IoT-based data to improve supply-chain performance.
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making (Cognizant)
It’s a new era in business, in which growth will be driven by finding meaning and insights in data. Recent research demonstrates what separates winners from losers and how to rise to the top as a "meaning maker."
A presentation delivered by Joel Gurin at "The Economic Impact of Open Data" hosted by the Center for Data Innovation in Washington, DC.
More info and video: http://www.datainnovation.org/2014/04/the-economic-impact-of-open-data
Alternative data is everywhere. We must start using it as a competitive edge over competitors who look only to their traditional data sources.
Social media is impacting all parts of organizations – and market intelligence is no exception, with new ways to listen, mine data from new sources, create “always on” communities, and understand behavior and visualize trends. “Social” technologies are changing the way people learn, make decisions and judge brands. Market intelligence professionals can leverage these new realities or risk irrelevance. This presentation covers:
• What world-class companies are learning through social technologies
• How to create a “listen-engage-measure-share” research model
• How social media can increase the value of market intelligence functions (and MI career paths) within organizations
• New best practices for using social technology to enable “wisdom of the crowd” internally and externally
ODI Node Vienna: Best-Practice Examples of Open Innovation through Open Data (Martin Kaltenböck)
Talk given at the Data Pioneers Workshop on October 10, 2016 at the BMVIT on the topic of Open Innovation and Open Data (Open Innovation through Open Data), by Elmar Kiesling (TU Wien) and Martin Kaltenböck (SWC) for the ODI (Open Data Institute) Node Vienna.
Is Dirty Data Clogging Your Marketing Engine? Do what high-performance companies do: implement a data management program with InsideView. The sooner you do, the lower the cost.
MeasureMatch: The Transformational On-Demand Future of Tech & Data Talent (MeasureMatch)
Purposefully hybrid, fixed + variable talent workforces are, by default, more diverse in character and contribution. Ultimately, this model is the future of exceptional enterprise health and productivity.
McGraw-Hill Professional Business Insider Work Smarter Webinar Series presents Leading with Data: Boost Your ROI with Open and Big Data.
Joel Gurin and Prasanna Tambe discuss two hot new topics: open data and big data. You will learn how to use them to gain a competitive edge in creating and developing a business and building an effective workforce.
For the webinar recording visit: http://bit.ly/mhpworksmarter
About the ODI: slides + notes for potential investors (theODI)
v2015-09-17
An overview of the ODI's vision, team, progress and ambition in slide and notes format, for use by any potential grant or project investors, or by those interested in the ODI and its plans.
This document lists the reasons why our past alumni chose NYC Data Science Academy over other programs.
The Machine Learning Bootcamp is our flagship program and is well received by our community.
This project was completed by Scott Dobbins and Rachel Kogan, who enrolled in the NYC Data Science Academy's 12-Week Data Science Bootcamp. Learn more about the program: http://nycdatascience.com/data-science-bootcamp/
Given that both Wikipedia and comments sections of most websites are freely open to anyone to edit at any time, how has Wikipedia managed to remain such a useful resource while most comments sections are ridden with vandalism, ads, and other counterproductive user behavior?
We believe the answer is two-fold: 1) Wikipedia has an army of bots that quickly identify and revert vandalism so that the worst edits are usually never seen by people and the site generally maintains itself in a well-kempt state, and 2) Wikipedia has a strong community of administrators and other contributors who routinely clean the site’s flagged contents.
Vandalism is relatively easy to flag, though a few clever edits manage to stay on the site for a long time. What about site content problems that are more subjective, like bias? Wikipedia users do routinely manually flag pages with point-of-view (POV) issues, though with millions of pages and no machine-based approaches, the site can only manage to confidently maintain neutrality on the more well-trafficked pages.
Here we propose a way to address some of the more intractable content issues for Wikipedia and similar sites using Natural Language Processing (NLP) and machine learning. The sheer quantity of data managed by Wikipedia and similar sites requires distributed computing, so we show here how Apache Spark can scale common algorithms to run on massive data sets.
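As a toy illustration of that map-then-filter shape, page scoring can be written as a map over (title, text) records; in Spark the same functions would run over an RDD's partitions via `map` and `filter`. The corpus and the "loaded word" list below are invented for illustration, standing in for a trained NLP model.

```python
# Toy corpus standing in for Wikipedia pages; a real pipeline would load
# millions of pages into a Spark RDD and apply a trained NLP model.
pages = [
    ("Article_A", "the clearly best and greatest city ever"),
    ("Article_B", "the city was founded in 1854"),
]

# Illustrative point-of-view (POV) cue words, not a real lexicon.
LOADED_WORDS = {"best", "greatest", "ever", "clearly"}

def score(record):
    """Map step: fraction of tokens that are loaded/POV words."""
    title, text = record
    tokens = text.split()
    hits = sum(1 for t in tokens if t in LOADED_WORDS)
    return (title, hits / len(tokens))

# Locally: map + filter. In Spark: pages_rdd.map(score).filter(...).collect()
scored = list(map(score, pages))
flagged = [title for title, s in scored if s > 0.2]
```

The same `score` function could be passed unchanged to an RDD, which is what makes the map/reduce style attractive for scaling.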
A Hybrid Recommender with Yelp Challenge Data (Vivian S. Zhang)
Developed by Chao Shi, Sam O'Mullane, Sean Kickham, Reza Rad and Andrew Rubino
Watch the project presentation: https://youtu.be/gkKGnnBenyk
This project was completed by students from NYC Data Science Academy's 12-Week Bootcamp. Learn more about the bootcamp: http://nycdatascience.com/data-science-bootcamp/
People make decisions on where to eat based on friends’ recommendations. Since they know you, their suggestions matter more than those of strangers.
For the capstone project, we built a hybrid Yelp recommendation system that can provide individualized recommendations based on your friends' reviews on the social network. We built the machine learning models using Spark, and set up a Flask-Kafka-RDS-Databricks pipeline that allows a continuous stream of user requests.
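The friend-weighted idea at the core of such a recommender can be sketched in a few lines of plain Python. The users, ratings, and fallback rule below are illustrative only; the actual system used Spark models and the streaming pipeline described above.

```python
# Predict a user's rating for a business as the average of their friends'
# ratings of it, falling back to the global mean when no friend has
# reviewed it. A toy stand-in for the full hybrid model.
reviews = {              # (user, business) -> stars
    ("ann", "cafe"): 5.0,
    ("bob", "cafe"): 3.0,
    ("ann", "bar"): 2.0,
}
friends = {"carol": {"ann", "bob"}}

def predict(user, business):
    ratings = [stars for (u, b), stars in reviews.items()
               if b == business and u in friends.get(user, set())]
    if ratings:
        return sum(ratings) / len(ratings)
    all_stars = list(reviews.values())      # fallback: global mean rating
    return sum(all_stars) / len(all_stars)
```

A hybrid system blends a score like this with content- and model-based signals; the point here is only the "friends first, global fallback" structure.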
During the presentation, we will talk about the development framework and technical implementation of the pipeline.
Read on their project posts and code:
https://blog.nycdatascience.com/student-works/capstone/yelp-recommender-part-1/
https://blog.nycdatascience.com/student-works/yelp-recommender-part-2/
Kaggle Top 1% Solution: Predicting Housing Prices in Moscow (Vivian S. Zhang)
This project was completed by students who graduated from the NYC Data Science Academy 12-week Data Science Bootcamp. Learn more about the bootcamp: http://nycdatascience.com/data-science-bootcamp/
Watch the project presentation: https://youtu.be/W530d2ZdbJE
Ranked #15 out of 3,274 teams on Kaggle. Team members: Brandy Freitas, Chase Edge, and Grant Webb.
Given 4 years of housing price data in a foreign market, predicting the following year’s prices should be pretty straightforward, right? But what if in that last year of data, the country’s stock market, the value of its currency and the price of its number 1 export, all dropped by nearly 50%. And on top of all that, the country was slapped with economic sanctions by the EU and the US. This was Moscow in 2014 and as you can see, it was anything but straightforward.
We were able to overcome these challenges and, in two weeks of working together, achieve a top 1% ranking on Kaggle. Our success is a product of our in-depth data cleaning, feature engineering, and approach to modeling. With a focus on interpretability and simplicity, we began modeling with linear regression and decision trees, which gave us a better understanding of the data. We then utilized more complicated models such as random forests and XGBoost, which ultimately produced our top submission.
Data Science is concerned with the analysis of large amounts of data. When the volume of data is really large, it requires the use of cooperating, distributed machines. The most popular method of doing this is Hadoop, a collection of programs to perform computations on connected machines in a cluster. Hadoop began life as an open-source implementation of MapReduce, an idea first developed and implemented by Google for its own clusters. Though Hadoop's MapReduce is Java-based, and quite complex, this talk focuses on the "streaming" facility, which allows Python programmers to use MapReduce in a clean and simple way. We will present the core ideas of MapReduce and show you how to implement a MapReduce computation using Python streaming. The presentation will also include an overview of the various components of the Hadoop "ecosystem."
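The streaming model can be illustrated with the classic word count: Hadoop pipes input lines to the mapper's stdin and feeds the sorted mapper output to the reducer. The same logic runs locally in a few lines of Python; this is a sketch of the pattern, not a full streaming job, which would be launched with the hadoop-streaming jar and separate mapper/reducer scripts.

```python
from itertools import groupby

def mapper(lines):
    """Mapper: emit one (word, 1) pair per token (stdout in a real job)."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reducer: sum counts per word; Hadoop delivers pairs sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# sorted() simulates the shuffle phase Hadoop performs between the stages.
lines = ["big data big ideas", "data beats ideas"]
counts = dict(reducer(sorted(mapper(lines))))
```

Because mapper and reducer only read and emit key/value pairs, the identical functions can be wrapped as stdin/stdout scripts and handed to the streaming facility unchanged.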
NYC Data Science Academy is excited to welcome Sam Kamin, who will be presenting an Introduction to Hadoop for Python Programmers as well as a discussion of MapReduce with streaming Python.
Sam Kamin was a professor in the University of Illinois Computer Science Department. His research was in programming languages, high-performance computing, and educational technology. He taught a wide variety of courses, and served as the Director of Undergraduate Programs. He retired as Emeritus Associate Professor, and worked at Google until taking his current position as VP of Data Engineering in NYC Data Science Academy.
--------------------------------------
Our fall 12-Week Data Science Bootcamp starts on Sept 21st, 2015. Apply now to get a spot!
If you are hiring data scientists, call us at (1) 888-752-7585 or reach us at info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will join the session remotely via Google Hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist at Supstat Inc. and a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com today.
Prerequisites (if any): R / calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction to XGBoost
Real-World Applications
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XGBoost Demo
Reference:
https://github.com/dmlc/xgboost
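For readers new to the library, the core idea behind XGBoost, gradient boosting, can be sketched in pure Python: repeatedly fit a weak learner (here, a single-split regression stump) to the residuals of the current prediction and add a damped copy of it to the model. The real library adds regularized trees, second-order gradients, and much more; this toy 1-D example only shows the boosting loop itself.

```python
def fit_stump(x, residuals):
    """Weak learner: the single-split stump minimizing squared error."""
    best = None
    for thr in x:
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if xi <= thr else rmean)) ** 2
                  for xi, r in zip(x, residuals))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

def boost(x, y, rounds=20, lr=0.5):
    """Gradient boosting for squared loss: residual = target - prediction."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 3.0, 3.0]
model = boost(x, y)
```

Each round shrinks the residuals by the learning rate, which is why many damped weak learners combine into a strong model.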
NYC Open Data 2015: Advanced scikit-learn, Expanded (Vivian S. Zhang)
Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners.
This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning.
Apart from metrics for model evaluation, we will cover how to evaluate model complexity and how to tune parameters with grid search and randomized parameter search, including their trade-offs. We will also cover out-of-core text feature processing via feature hashing.
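Feature hashing, the last topic above, replaces an in-memory vocabulary with a hash function that maps tokens to a fixed number of columns, which is what makes out-of-core text processing possible. A minimal sketch of the idea behind scikit-learn's HashingVectorizer follows; the real implementation uses a different hash, signed counts, and sparse output.

```python
import hashlib

N_FEATURES = 16  # real settings use 2**18 or more columns

def hash_features(text, n_features=N_FEATURES):
    """Map a document to a fixed-length count vector via the hashing trick."""
    vec = [0] * n_features
    for token in text.lower().split():
        # md5 gives a stable hash across runs (Python's hash() is salted).
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % n_features] += 1
    return vec

v = hash_features("big data big models")
```

No dictionary is stored, so documents can be streamed through this in batches regardless of vocabulary size; the price is occasional collisions between tokens.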
---------------------------------------------------------
Andreas is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he worked as a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and maintained it for several years.
Material will be posted here:
https://github.com/amueller/pydata-nyc-advanced-sklearn
Blog:
peekaboo-vision.blogspot.com
Twitter:
https://twitter.com/t3kcit
Twitter: @NycDataSci
Learn with our NYC Data Science program (weekend courses for working professionals and a 12-week full-time program for those advancing their careers into data science).
Our next 12-Week Data Science Bootcamp starts in June. (Deadline to apply is May 1st; all decisions will be made by May 15th.)
====================================
Max Kuhn, Director of Nonclinical Statistics at Pfizer and author of Applied Predictive Modeling, will join us and share his experience with data mining in R.
Max is a nonclinical statistician who has been applying predictive models in the diagnostic and pharmaceutical industries for over 15 years. He is the author and maintainer of a number of predictive modeling packages, including caret, C50, Cubist, and AppliedPredictiveModeling. He blogs about the practice of modeling on his website at http://appliedpredictivemodeling.com/blog
---------------------------------------------------------
You can RSVP for his Feb 18th course at NYC Data Science Academy.
Syllabus
Predictive Modeling using R
Description
This class will get attendees up to speed in predictive modeling using the R programming language. The goal of the course is to understand the general predictive modeling process and how it can be implemented in R. A selection of important models (e.g. tree-based models, support vector machines) will be described in an intuitive manner to illustrate the process of training and evaluating models.
Prerequisites:
Attendees should have a working knowledge of basic R data structures (e.g., data frames, factors) and language fundamentals such as functions and subsetting data. Understanding of the content of Appendix B, sections B1 through B8, of Applied Predictive Modeling (free PDF from the publisher [1]) should suffice.
Outline:
- An introduction to predictive modeling
- R and predictive modeling: the good and bad
- Illustrative example
- Measuring performance
- Data splitting and resampling
- Data pre-processing
- Classification trees
- Boosted trees
- Support vector machines
If time allows, the following topics will also be covered
- Parallel processing
- Comparing models
- Feature selection
- Common pitfalls
Materials:
Attendees will be provided with a copy of Applied Predictive Modeling[2] as well as course notes, code and raw data. Participants will be able to reproduce the examples described in the workshop.
Attendees should have a computer with a relatively recent version of R installed.
About the Instructor:
More about Max's work:
[1] http://rd.springer.com/content/pdf/bbm%3A978-1-4614-6849-3%2F1.pdf
[2] http://appliedpredictivemodeling.com
Winning Data Science Competitions, presented by Owen Zhang (Vivian S. Zhang)
Meetup event hosted by NYC Open Data Meetup, NYC Data Science Academy. Speaker: Owen Zhang. Event info: http://www.meetup.com/NYC-Open-Data/events/219370251/
Using Machine Learning to Aid Journalism at the New York Times (Vivian S. Zhang)
This talk was presented to NYC Open Data Meetup Group on Nov 11, 2014.
Speaker:
Daeil Kim is currently a data scientist at the Times and is finishing up his Ph.D at Brown University on work related to developing scalable inference algorithms for Bayesian Nonparametric models. His work at the Times spans a variety of problems related to the company's business interests, audience development, as well as developing tools to aid journalism.
Topic:
This talk will focus mostly on how machine learning can help with problems that crop up in journalism. We'll begin by talking about using popular supervised learning algorithms such as regularized logistic regression to help a journalist uncover insights for a story about the recall of Takata airbags in cars. Afterwards, we'll look at using topic modeling to deal with large document dumps generated by FOIA (Freedom of Information Act) requests, and Refinery, a simple web-based tool that eases the implementation of such tasks. Finally, if there is time, we will go over how topic models have been extended to power an efficient recommendation engine for text-based content.
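As a rough illustration of the first technique mentioned, here is a toy L2-regularized logistic regression trained by batch gradient descent in plain Python. The two features and labels are invented stand-ins (e.g., whether an article mentions a recall), not the Times' actual data or model.

```python
import math

def train_logreg(X, y, lam=0.01, lr=0.1, epochs=200):
    """L2-regularized logistic regression via batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wi for wi in w]  # gradient of the (lam/2)*||w||^2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xij for wj, xij in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))        # predicted probability
            for j, xij in enumerate(xi):
                gw[j] += (p - yi) * xij / n        # log-loss gradient
            gb += (p - yi) / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, xi):
    z = sum(wj * xij for wj, xij in zip(w, xi)) + b
    return 1 if z > 0 else 0

# Toy documents: features = [mentions_recall, mentions_sports]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
```

The L2 term keeps the weights small, which is what "regularized" buys you on the sparse, high-dimensional text features a real newsroom classifier would use.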
2. +
Outlines
Open Data Now
Open Data 500
List of Cities Which Have Open Data Portals
Companies, including:
10 companies in NY
6 companies in the United States excluding NY
5 companies in the U.K.
1 company in Shanghai, China
1 company in Taiwan, China
3. +
Open Data Now
“Today’s Open Data
revolution is rapidly leading
us into new territory.”
“Open Data is becoming a
secret to success for smart
business leaders around
the world.”
4. +
Open Data Now
“Open Data can best be described as accessible
public data that people, companies, and
organizations can use to launch new ventures,
analyze patterns and trends, and solve complex
problems.”
In terms of the similarities and differences between Big Data
and Open Data, it is clear that an introduction to Big Data
cannot fully represent the scope of Open Data.
5. +
Open Data Now
“This book describes the business applications of
Open Data with examples from dozens of
companies.”
This book reflects the vision, insights, knowledge,
and advice of the leaders in this field who have been
interviewed by the author.
This book also “contains several resources to help
readers explore the possibilities of Open Data”.
6. +
Open Data 500
Open Data 500 is an initiative to research the U.S. companies
and organizations that make innovative use of Open Data
published by government to develop new businesses.
The research is funded by the Knight Foundation, a foundation
that aims to promote journalism & media innovation, advance
community engagement, and foster the arts.
The Governance Lab ("GovLab"), located at New York University,
is responsible for conducting the research. GovLab is a
platform that brings together innovators from different
backgrounds to collaboratively seek new technology-based
solutions for better governance.
OpenData500.com is the website that publishes the research
outcomes and collects new information about such companies.
7. +
Open Data 500
Three established goals of the Open Data 500 research
a. Provide a basis for assessing the economic value of
government open data
b. Encourage the development of new open data
companies
c. Foster a dialogue between government and business
on how government data can be made more useful
8. +
Open Data 500
Blueprint
a. Domestically, initiate a roundtable series to involve both
government (the Open Data providers) and businesses and
organizations (the Open Data users), to communicate on
potential improvements to Open Data.
b. Internationally, cooperate with international
organizations and governments from other countries to
replicate the U.S. paradigm worldwide.
9. +
Open Data 500
Strategies for the ongoing incorporation of other companies
that use Open Data
a. Outreach campaign: through mass emails, professional
meetings, and various social media
b. Expert advice: through industry practitioners and lab
advisors
c. Research: through other sources of identification, like the
Online Open Data Userbase
10. +
Open Data 500
OpenData500.com has filtering functionality that can filter the
recorded companies by industry, state, and source agency.
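The filtering described above amounts to matching records against field criteria. A minimal sketch, assuming a hypothetical record schema (the real OpenData500.com fields and values may differ):

```python
# Hypothetical records mimicking OpenData500.com's company list.
companies = [
    {"name": "Enigma", "industry": "Data/Technology", "state": "NY", "agency": "Multiple"},
    {"name": "ZocDoc", "industry": "Healthcare", "state": "NY", "agency": "HHS"},
    {"name": "Zillow", "industry": "Housing/Real Estate", "state": "WA", "agency": "Census Bureau"},
]

def filter_companies(records, **criteria):
    """Keep records whose fields match every supplied criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]
```

For example, `filter_companies(companies, state="NY")` keeps only the New York companies, and criteria can be combined (industry plus state plus agency) just as the site's filters can.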
11. +
What we will cover
Company’s background
Data sources
What do they do with the data
How do they serve the clients with the data
Profit model
Awards/current condition
12. +
Companies to be introduced
NY: Enigma, ZocDoc, Honest Buildings, Capital Cube, Bloomberg,
Aidin, Calcbench, Consumer Reports, GetRaised, Palantir
US (excluding NY): Archimedes Inc., SigFig, PsychSignal,
CARFAX, Zillow, Brightscope
UK: Mastodon C, Locatable, OpenCorporates, DueDil,
CarbonCulture
China: Big Data Bureau (Shanghai), Open Data Alliance (Taiwan)
13. +
Enigma.io
company’s background
The connotation of the company’s name:
“Both in honor of the code-breaking machine developed
by computer pioneer Alan Turing during World War II and
because they were finding that too much public data was
more enigmatic than it should be.”
Team members and size:
Now a little over a dozen people, who are all based in
New York.
14. +
Enigma.io
data sources
Their datasets included both government data and
data from clients, such as Nike’s public list of its
suppliers.
Their data deluge had grown to 100,000 datasets
and more than 20 billion individual data points.
15. +
Enigma.io
what do they do with the data
Take valuable public data and make it much more
usable.
Make it possible to search through the entire
dataset at a rapid rate.
“Develop a robust data resource and an impressive
set of tools”.
16. +
Enigma.io
how do they serve the clients with the
data
Help clients find publicly accessible information that is
relevant to their business or economic interests.
Take data in all kinds of formats and put it into an easily
usable form as complete datasets.
Turn public sector information into Open Data.
17. +
Enigma.io
profit model
For now, Enigma’s profit mainly comes from charging for
access to data, though they charge more for hedge funds and
less for academics, nonprofits, and government agencies.
Enigma eventually wants to make their data accessible to the
public and profit from analytics and other premium services.
19. +
ZocDoc
company’s background
Founded in 2007 with a mission of improving access
to healthcare, ZocDoc is a free service that allows
patients to find a nearby doctor or dentist.
Cyrus Massoumi, CEO of ZocDoc said:
"After I ruptured my eardrum on a flight, I couldn't find a
doctor for four days. I knew that there had to be an easier
way for patients to find doctors. That was when I had the
idea for ZocDoc."
20. +
ZocDoc
data sources
They collect information for patients to share with
health providers.
In the meantime, they obtain information from doctors
such as location, specialty, insurance preferences,
and patient reviews to assist patients with decision
making.
21. +
ZocDoc
what do they do with the data
Encrypt data to the same standards that banks use
to safeguard your financial information.
Constantly analyze data to better understand who
uses ZocDoc and how they can improve it.
22. +
ZocDoc
how do they serve the clients with the
data
Help find nearby doctors and dentists who accept
their insurance, see their real-time availability, and
instantly book an appointment via ZocDoc.com or
ZocDoc’s free apps for iPhone or Android.
Guarantee that patients get access to care within
24 – 72 hours.
23. +
ZocDoc
profit model
The healthcare providers who partner with ZocDoc pay a
subscription fee for ZocDoc's service, since it helps
increase the efficiency of their practices.
24. +
ZocDoc
awards
Time Magazine 50 Best Websites 2012
Business Insider App 100: World's Greatest Apps 2013
Fortune Magazine Best Small & Medium Companies
2012, 2013
Crain’s Best Places to Work in NYC 2010, 2011, 2012,
2013
Modern Healthcare Best Places to Work 2011, 2012, 2013
Arizona Business Magazine Most Admired Companies
2013
25. +
Honest Buildings
company’s background
Founded in 2011, Honest Buildings now has around
30 employees.
Honest Buildings is a commercial real estate
marketplace that connects top building professionals
with building owners, decision makers and project
managers.
It is a social media application for real estate.
26. +
Honest Buildings
data sources
Honest Buildings collects information that is posted
by building professionals, owners, tenants, and other
stakeholders.
27. +
Honest Buildings
what do they do with the data
“Big data is great, but you need to do something with
it. Platforms that contextualize data are in great
demand.” - HonestBuildings.com
Honest Buildings collects and posts all sorts of relevant
data that is hard for users to find, like energy costs and
walkability.
28. +
Honest Buildings
how do they serve the clients with the
data
Launch a matching service that offers service providers
step-by-step guidance to win a matched project:
a. Register your company
b. Build a portfolio
c. Get found, add tags & connections
d. Win new business.
29. +
Honest Buildings
profit model
Honest Buildings provides matching services to developers for
free.
It charges candidates, i.e. builders and contractors, a fee
to be matched with developers.
31. +
Capital Cube
company’s background
AnalytixInsight was founded in 2010, with about 10
employees.
As the online portal of AnalytixInsight, CapitalCube.com is a
global investor portal for comprehensive company analysis.
It operates on-demand fundamental research, portfolio
evaluation, and screening tools on over 40,000 global
equities.
32. +
Capital Cube
data sources
Securities and Exchange Commission (SEC)
Third-party data providers
a. FactSet
b. Thomson Reuters
c. Capital IQ
33. +
Capital Cube
what do they do with the data
They developed software that captures data on 40,000
internationally-operated public companies on a daily basis.
They transform the data into word reports and graphics that
investors can use to compare companies and make investment
decisions.
34. +
Capital Cube
how do they serve the clients with the
data
Investment tools
Analytical service
a. Dividend analysis
b. Earnings quality analysis
Strengths
a. Large coverage of 40,000 companies
b. Timeliness, with daily updates
35. +
Capital Cube
profit model
Licenses out its produced content and capabilities to
distributors. Two big deals of this kind are with:
a. Samsung: every time Samsung users download
CapitalCube’s quote, AnalytixInsight gets revenue
share.
b. Dow Jones: AnalytixInsight gets paid for every page
view of its content that is posted on Dow Jones’s
published pages.
36. +
Capital Cube
awards
Oct. 28, 2013: recognized by the White House Office of
Science and Technology Policy (OSTP) and the Science and
Technology Policy Institute (STPI) as a company using federal
open data in innovative and exciting ways.
37. +
Bloomberg
company’s background
Founded in 1982.
Bloomberg, the global business and financial information and
news leader, gives influential decision makers a critical
edge by connecting them to a dynamic network of information,
people, and ideas.
Strength – delivering data, news and analytics
through innovative technology, quickly and
accurately.
39. +
Bloomberg
what do they do with the data
ESG (Environmental, Social, and Governance)
data on the Bloomberg Terminal is fully integrated
into all of Bloomberg Terminals’ analytics.
40. +
Bloomberg
how do they serve the clients with the
data
A large proportion of public companies now report their
sustainability data, a set of metrics developed by the Global
Reporting Initiative, to the public. This is data that
investors increasingly care about.
41. +
Bloomberg
profit model
As a data-driven media company, Bloomberg has journalists
around the world aggregating real-time financial and business
information.
The company operates over 300,000 terminals, which receive
financial data released by the company.
This business generated an estimated 6.3 billion US dollars
in 2008.
43. +
Aidin
company’s background
Aidin was founded in 2011, and has a scale of less
than 10 employees.
The company was inspired by a troublesome experience of the
founder's family: the many trivial matters they had to deal
with when a family member was being discharged from the
hospital.
Its mission is to bring transparency to healthcare facilities
and provide patients with the information they need when
making healthcare-related decisions.
44. +
Aidin
data sources
Government Open Data.
Patients’ reviews of hospitals.
“First with weather and GPS data and now with
health data, the U.S. government has defined its
responsibility as defining, gathering, and
presenting data on important subjects in easily
usable forms.”
46. +
Aidin
how do they serve the clients with the
data
Aidin provides post-acute
healthcare facilities with
data to improve their
services.
Patients can also choose
facilities based on Aidin’s
information.
47. +
Aidin
profit model
Aidin profits from its products, including the full Aidin
solution, Aidin Lite, and Aidin Provider.
49. +
Calcbench
company’s background
Founded in 2011, Calcbench is the first company of
its kind to fully harness the power of the new
government-mandated data standard XBRL
(Extensible Business Reporting Language), yielding
an unprecedented direct line into the SEC’s
corporate financial data repository.
Calcbench currently has less than 10 employees.
50. +
Calcbench
data sources
XBRL is a freely available and global standard for
exchanging business information
XBRL reports
51. +
Calcbench
what do they do with the data
Calcbench turns publicly accessible but hard-to-use XBRL
(Extensible Business Reporting Language) data into more
detailed and insightful information, thus adding value to
the data.
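Since XBRL is XML-based, the core of the processing described above is namespace-aware XML parsing: each tagged fact in a filing becomes a structured field. A minimal sketch using an invented, highly simplified instance document (real SEC filings carry many more namespaces, contexts, and units, and Calcbench's actual pipeline is not public):

```python
import xml.etree.ElementTree as ET

# A tiny, simplified XBRL-style instance document (illustrative only).
XBRL_SNIPPET = """\
<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31">
  <us-gaap:Revenues contextRef="FY2013" unitRef="USD" decimals="0">1000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2013" unitRef="USD" decimals="0">150000</us-gaap:NetIncomeLoss>
</xbrl>"""

def extract_facts(xml_text):
    """Pull tagged financial facts out of an XBRL instance into a plain dict."""
    root = ET.fromstring(xml_text)
    facts = {}
    for el in root:
        # Strip the namespace: '{http://...}Revenues' -> 'Revenues'
        tag = el.tag.split("}")[-1]
        facts[tag] = float(el.text)
    return facts
```

`extract_facts(XBRL_SNIPPET)` yields a dictionary of named financial figures, which is the kind of structured, queryable form that makes the raw filings usable for comparison and analysis.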
52. +
Calcbench
how do they serve the clients with the
data
Calcbench processes the raw data that government mandatorily
collects through XBRL and turns it into information usable by
the financial industry.
Calcbench also uses its technology to help the SEC improve
data quality, e.g. in error identification and correction.
53. +
Calcbench
profit model
Calcbench serves corporate reporting professionals, corporate
finance leaders, auditors, investment researchers, and
academics.
It charges a fee for using its service and offers a premium
suite to generate revenue.
54. +
Calcbench
awards
Grand Prize Winner, XBRL Challenge at the XBRL
and Financial Analysis Technology Conference,
February 29, 2012.
55. +
Consumer Reports
company’s background
Founded in 1936, Consumer Reports is an expert,
independent, nonprofit organization whose mission
is to work for a fair, just, and safe marketplace for all
consumers and to empower consumers to protect
themselves.
It has over 500 employees.
57. +
Consumer Reports
what do they do with the data
Consumer Reports produces safety ratings for more than two
thousand hospitals, based on Open Data sources.
58. +
Consumer Reports
how do they serve the clients with the
data
Improve American people’s lives by using publicly accessible
data.
Consumer Reports’ safety ratings can assist customers with
deciding which hospital to go to.
59. +
Consumer Reports
profit model
Consumer Reports is a non-profit organization. It does not
accept advertisements or have shareholders, but depends
solely on subscribers’ fees.
60. +
Consumer Reports
awards
Sigma Delta Chi Award for Public Service, Society of
Professional Journalists, 2008
Honorable Mention, National Press Club for Caution:
The Secret Score Behind Your Auto Insurance, 2007
People’s Voice Award, International Academy of
Digital Arts and Sciences, 2006
Golden Triangle Award, American Academy of
Dermatology, 2005
61. +
GetRaised
company’s background
GetRaised was founded in 2010, with less than 10
employees now.
GetRaised aspires to be a bunch of complicated data hidden
behind a very simple, easy-to-use interface that can help
narrow the wage gap and help people get paid more.
63. +
GetRaised
what do they do with the data
GetRaised uses the collected data to develop a salary engine.
It creates raise requests based on analysis from experts in
related fields, like HR and research institutes.
64. +
GetRaised
how do they serve the clients with the
data
GetRaised provides customers with information on how much
they should be paid, whether they are underpaid, and how much
of a raise they should ask for.
A significant proportion of women who requested a salary
raise based on GetRaised’s information eventually got their
raise.
65. +
GetRaised
profit model
As a non-profit, GetRaised provides free service to
users.
It receives support from two organizations and four
individuals.
66. +
GetRaised
awards
According to Dave Clarke, a communications
strategist, “81% of women that have used GetRaised
have, in fact, earned a raise. The average raise
amount across all users (male and female) is
$6,473”.
67. +
Palantir
company’s background
Palantir was founded in 2004 and has over 500 employees
today.
It develops software with which professionals from different
industries and sectors can perform analysis on massive,
disparate data.
Its products and services help combat terrorism, prosecute
crimes, fight fraud, and eliminate waste.
69. +
Palantir
what do they do with the data
Users enter various types of data into Palantir’s software,
which generates reports and analysis that users can
understand directly. It’s not only artificial intelligence,
but something called intelligence augmentation.
70. +
Palantir
how do they serve the clients with the
data
Help government analyze its open data in order to reveal
government expenditure and possible flaws.
Combat human trafficking by analyzing data.
72. +
Palantir
awards
VAST2009 Visual Analytics Award, 2009
Hall of Innovation Award, J. P. Morgan Chase
Technology Innovation Symposium, October 2010
73. +
Archimedes Inc.
company’s background
Began as part of Kaiser Permanente in 1993 and spun off as a
separate company in 2006.
It has decades of experience in developing
algorithms and predictive models for healthcare.
74. +
Archimedes Inc.
data sources
Centers for Medicare & Medicaid Services (CMS).
Databases of clinical trials.
National Health and Nutrition Examination Survey.
75. +
Archimedes Inc.
what do they do with the data
They use the Archimedes Model as their core tool to analyze
data.
Archimedes scientists have analyzed a wide variety of
relevant data to derive hundreds of equations that represent
the effects of multiple diseases, tests, and treatments.
The equations are integrated into a single, large-
scale simulation model using object-oriented
programming.
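The object-oriented integration described above can be illustrated with a toy model: each equation lives in its own class, and a simulator composes them per patient. The risk equations, class names, and numbers below are invented for illustration; Archimedes' actual equations are proprietary and far more sophisticated.

```python
class RiskEquation:
    """Base class: one equation mapping patient attributes to annual risk."""
    def annual_risk(self, patient):
        raise NotImplementedError

class ToyDiabetesRisk(RiskEquation):
    """An invented toy equation: risk rises with BMI above 25."""
    def annual_risk(self, patient):
        return 0.001 + 0.0005 * max(patient["bmi"] - 25, 0)

class ToyTreatmentEffect:
    """A treatment that scales down a risk (illustrative numbers only)."""
    def __init__(self, relative_risk):
        self.relative_risk = relative_risk

class Simulator:
    """Composes one risk equation with any number of treatment effects."""
    def __init__(self, equation, treatments=()):
        self.equation = equation
        self.treatments = list(treatments)

    def ten_year_risk(self, patient):
        r = self.equation.annual_risk(patient)
        for t in self.treatments:
            r *= t.relative_risk
        # Convert an annual risk into a cumulative 10-year risk.
        return 1 - (1 - r) ** 10
```

Because every equation and intervention is an object with a uniform interface, adding a new disease model or treatment means adding a class rather than rewriting the simulator — the design property the slide attributes to the Archimedes Model.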
76. +
Archimedes Inc.
how do they serve the clients with the
data
They provide a suite of online healthcare simulation and
analytics tools (ARCHeS) that provides answers to questions
about the health outcomes and economic effects of different
interventions.
They also have a product (IndiGo) that generates
individual guidelines that identify and help prioritize
the best preventive care for each patient.
77. +
Archimedes Inc.
profit model
They make a profit by selling their products:
ARCHeS (Archimedes Healthcare Simulator) and
IndiGo (Individualized Guidelines and Outcomes).
They also provide modeling and consulting services, including
modeling of diseases and new interventions, and analyzing
different prevention strategies.
78. +
Archimedes Inc.
awards
Jun 6, 2012: IndiGO Receives Best of Care
Applications Award at the 2012 Health Data
Initiative III.
80. +
SigFig
data sources
Historical fundamentals & Price data
Data on load fees comes from Lipper
Portions of their advice rely on third-party market
data from companies like Lipper, Thomson Reuters, Interactive
Data, and Xignite.
82. +
SigFig
how do they serve the clients with the
data
A single Portfolio Dashboard
83. +
SigFig
profit model
“The company doesn’t charge a management fee.
Instead, it earns its revenue through publishing
arrangements with several websites and referral
fees when users go to a new broker.”
87. +
PsychSignal
what do they do with the data
Scour the online conversation looking for distinct
psychological expressions of emotion or attitude.
Aggregate millions of expressions to arrive at a
picture of crowd mood in real time.
Built an advanced proprietary sentiment engine.
Use NASA developed signal processing.
89. +
PsychSignal
profit model
NASA recently open sourced an algorithm our
engineers found particularly useful.
Day to day sentiment is extremely noisy, making it
hard to determine subtle changes in trending
sentiment data.
These NASA algorithms help uncover the trend
beneath the noise.
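The trend-under-noise problem described above is a classic smoothing task. As a simple stand-in for the NASA-derived signal-processing algorithms (which the slide does not detail), a trailing moving average over invented daily sentiment scores shows the basic idea of recovering a trend from noisy data:

```python
# Hypothetical noisy daily sentiment scores with a slow upward trend buried
# in day-to-day noise (made-up numbers, not PsychSignal's real feed).
daily = [0.1, -0.3, 0.4, 0.0, 0.5, -0.1, 0.6, 0.2, 0.7, 0.3, 0.8, 0.4]

def moving_average(series, window):
    """Trailing moving average -- one common way to smooth out noise."""
    out = []
    for i in range(window - 1, len(series)):
        out.append(sum(series[i - window + 1 : i + 1]) / window)
    return out

smoothed = moving_average(daily, 4)
```

The raw series jumps up and down daily, but the smoothed series rises steadily, making the underlying shift in crowd mood visible.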
91. +
CARFAX
company’s background
Provides vehicle history information, used by millions of
consumers each year.
Receives millions of visitors each month.
Thousands of auto dealers nationwide are subscribers.
92. +
CARFAX
data sources
U.S. motor vehicle agencies
Canadian provincial motor vehicle agencies
Auto auctions
Collision repair facilities
Service/ maintenance facilities
Insurance companies
Salvage auctions
Automotive recyclers
Rental/fleet vehicle companies
State inspection stations
93. +
CARFAX
what do they do with the data
Format the data into reports containing the following
information:
a. Title information, including salvaged or junked titles
b. Flood damage history
c. Total loss accident history
d. Odometer readings
e. Lemon history
f. Number of owners
g. Accident indicators, such as airbag deployments
h. State emissions inspection results
i. Service records
j. Vehicle use (taxi, rental, lease, etc.)
94. +
CARFAX
how do they serve the clients with the
data
Provide reports at a price.
Provide information regarding used cars for sale and car
deals.
Also provide car facts research on various makes
and models of different cars.
95. +
CARFAX
profit model
Make a profit mainly by
selling their reports.
Also offer CARFAX Hot
Listings™ and Safety &
Reliability Ratings™ products.
96. +
CARFAX
awards
Has the most comprehensive vehicle history
database available in North America
One of the top five websites that consumers turn to
for vehicle information
97. +
Zillow
company’s background
Founded in 2005, the company is headquartered in
Seattle with offices in New York, San Francisco,
Chicago, Irvine, Calif. and Lincoln, Neb.
It provides consumers with information and tools to make
smart decisions about homes, real estate, and mortgages.
“At Zillow, we built our business taking public real
estate information that was previously only
accessible by spending hours in dusty registry of
deeds, making it easily accessible to consumers,
for free.”
98. +
Zillow
data sources
Freely available data from:
a. The Bureau of Labor Statistics.
b. The Federal Housing Finance Agency.
c. The Census Bureau.
99. +
Zillow
what do they do with the data
They collected open government data and built a living
database of more than 110 million U.S. homes.
They republish the data on their website to provide users
with various information regarding homes, real estate, and
mortgages.
100. +
Zillow
how do they serve the clients with the
data
Provide search for houses for sale, for rent, and
pre-market, as well as information for buyers and
lenders.
Provide an open, anonymous, and free marketplace
for borrowers and lenders.
Zestimate, Zillow's estimated market value for an individual
home, is a starting point in determining a home's value.
101. +
Zillow
profit model
Zillow operates mainly by advertising.
It operates the largest real estate and rental
advertising networks in the U.S. in partnership with
Yahoo! Homes.
“The company was founded in 2005, had over $66
million in revenue when they launched an IPO in
2011, and had a valuation of $2.3 billion in 2013.”
102. +
Zillow
awards
In 2013
November 17: Zillow CEO Spencer Rascoff received the EY
National Entrepreneur of the Year™ 2013 Services Award.
April 30: Zillow sweeps the real estate category in the 17th Annual Webby
Awards, winning the People's Voice Award and the overall Webby Award in
the Real Estate category.
October 10: Zillow wins the Mortgage Technology Award for its Zillow
Mortgage Marketplace iPhone App.
June 13: Zillow honored at the Webby Awards as the People's Voice
Winner and the overall Webby Award winner for the Real Estate category.
May 4: Zillow wins Webby Award as Best Real Estate site.
103. +
Brightscope
company’s background
BrightScope®, Inc. is a financial information company that
uses data to drive better decision-making for individual
investors, corporate plan sponsors, asset managers, etc.
It primarily operates in two major segments:
Retirement Plans and Wealth Management.
104. +
Brightscope
data sources
Data on Form 5500 from Department of Labor.
The founders, Mike and Ryan Alfred, actually
persuaded the Department of Labor to begin
collecting and publishing the Form 5500 data on
the Internet.
105. +
Brightscope
what do they do with the data
They gathered and republished the data to make it clearer,
more informative, and more available to the public.
106. +
Brightscope
how do they serve the clients with the
data
Provide ratings of 401k and 403b plans across
critical metrics.
Launched the first comprehensive and publicly
available directory of Financial Advisors.
Provide free white papers for the public that
examine trends in the Defined Contribution (DC)
market.
107. +
Brightscope
profit model
Make a profit by selling their market intelligence
product called Beacon and their sales intelligence
product for retirement plans called Spyglass.
Sell research papers with detailed investment data on 50,000
Defined Contribution (DC) plans.
108. +
Brightscope
awards
The first to convince the Department of Labor to begin
collecting and publishing the Form 5500 data, and to utilize
that data to make a profit.
The first comprehensive and publicly available
directory of Financial Advisors.
109. +
Mastodon C
company’s background
Mastodon C is a Big Data analytics company.
It was the first of its kind to join the ODI (Open
Data Institute) incubator.
CEO Fran Bennett spent years working for search engines,
helping them turn data into money.
110. +
Mastodon C
data sources
Their main data source is the big sets of Open Data from the
U.K.'s National Health Service (NHS).
111. +
Mastodon C
what do they do with the data
Use cloud computing to analyze data.
Use Hadoop and Cassandra technologies to
integrate real time sensor data, web service data
and spreadsheets from their clients.
One of their main signatures is analyzing data on
zero carbon infrastructure.
112. +
Mastodon C
how do they serve the clients with the
data
Turn messy data, either open data or clients’
proprietary data, into useful insights.
Serve data in a format that makes sense.
113. +
Mastodon C
profit model
“Mastodon C has just done a government-funded analysis of
variations in prescribing patterns across the United Kingdom,
finding areas where expensive drugs are being prescribed for
no apparent reason when generics would work as well.”
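The prescribing analysis quoted above boils down to computing, per area, the share of prescriptions filled with an expensive branded drug rather than a generic, then flagging outlier areas for review. A minimal sketch with invented figures (not real NHS data; real prescribing datasets have far more fields):

```python
# Hypothetical prescription counts by area and drug type.
prescriptions = [
    {"area": "A", "drug": "branded_statin", "count": 800},
    {"area": "A", "drug": "generic_statin", "count": 200},
    {"area": "B", "drug": "branded_statin", "count": 100},
    {"area": "B", "drug": "generic_statin", "count": 900},
]

def branded_share(records):
    """Fraction of prescriptions that are branded, computed per area."""
    totals, branded = {}, {}
    for r in records:
        totals[r["area"]] = totals.get(r["area"], 0) + r["count"]
        if r["drug"].startswith("branded"):
            branded[r["area"]] = branded.get(r["area"], 0) + r["count"]
    return {area: branded.get(area, 0) / totals[area] for area in totals}

def outliers(shares, threshold=0.5):
    """Areas whose branded share exceeds a threshold -- candidates to review."""
    return sorted(area for area, s in shares.items() if s > threshold)
```

Area A, where 80% of prescriptions are branded, would be flagged as the kind of "no apparent reason" outlier the quoted analysis surfaces, while area B's mostly-generic pattern passes unremarked.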
115. +
Locatable
company’s background
Another company in the ODI incubator.
At first they wanted to help people find where to live, and
built a website to provide data for such decisions; now they
have changed their website to assist people in managing their
homes.
Founders Vasanth Subramanian and David Prime
are both former physics students.
117. +
Locatable
what do they do with the data
Collect data from all sources and integrate it into one
dashboard.
118. +
Locatable
how do they serve the clients with the
data
Provide their clients with a single dashboard to manage their
home and optimize its performance.
Provide visibility across all the most expensive costs,
including mortgage, insurance, utility bills and maintenance.
Help their clients find cheaper deals for all these
services.
119. +
Locatable
profit model
A similar business model to a property portal which
generates leads for estate agents.
Steps to search:
a. Customers log on and enter the locations that they
want to live near.
b. The results show customers the places which best
fit the bill and the properties available.
c. The website refers customers to sites and charges
an affiliate fee.
120. +
Locatable
current condition
According to the ODI, their site “currently caters for the
London area but the team are working on rolling it
out across the UK”.
For example, “in 2 months, Locatable has attracted
more than 4,000 unique visitors, with users offering
some great feedback.”
121. +
OpenCorporates
company’s background
The company behind OpenCorporates is Chrinon
Ltd, and the people who founded it are Chris
Taggart and Rob McKinnon.
Rob built TheyWorkForYou.nz and
WhosLobbying.com.
Chris built OpenlyLocal.com and OpenCharities,
and sits on the UK Government's Local Public Data
Panel, the UK's Tax Transparency Board, and Open
Knowledge Foundation's open government working
group.
122. +
OpenCorporates
data sources
They source the information in their databases
from government and other sources through a
variety of means including:
a. directly from government websites and APIs,
b. from publicly available datasets,
c. or through Freedom of Information requests.
123. +
OpenCorporates
what do they do with the data
As Chris explains, “we take messy data from
government websites, company registers, official
filings and data released under the Freedom of
Information Act, clean it up and using clever code
make it available to people.”
125. +
OpenCorporates
profit model
While OpenCorporates releases all its findings
for free as Open Data under an open license, its
business model includes offering additional paid
services.
127. +
DueDil
company’s background
Launched in 2011, this startup, called DueDil (derived from
“due diligence”), “is building an ambitious database on the
other side of the Atlantic”.
Their goal is to provide requisite information for
lenders to invest in small and medium-size
companies with confidence.
128. +
DueDil
data sources
“Much of Duedil is built on Open Data, like data
from Companies House, the United Kingdom’s
central corporate registry.”
They also use some proprietary data sources.
129. +
DueDil
what do they do with the data
They aggregated more than 20 years of digitized financial
records and made them available on their website.
130. +
DueDil
how do they serve the clients with the
data
They provide data on small-to-medium-sized companies in the
hope of encouraging hundreds of billions of dollars in new
investment.
131. +
DueDil
profit model
They use Open Data to fill the information gap
between small business owners and the investors.
They “add value through its analysis and
functionality rather than by having exclusive rights
to any dataset”.
They also “serve as a platform where SMEs can provide
information about themselves, look for potential business
partners, and develop the groundwork for productive deals”.
132. +
DueDil
awards
It has been nominated for numerous startup company awards,
including:
a. being shortlisted for two Guardian Digital Innovation
2012 awards,
b. finalist in the Orange Innovation Award in the
National Business Awards 2012,
c. named as one of '31 to Watch' in Outsell’s
Information Industry Outlook 2012: Break and
Reset report.
133. +
CarbonCulture
company’s background
It is “a digital start-up that was launched in 2009”.
“Luke Nicholson, the founder and Director of
CarbonCulture, is a social entrepreneur with a
background in design communications and
sustainable innovation.”
Now, “the team is made up of four full-time
employees and several part-time staff, with a broad
network of associates, partner businesses and
NGOs.”
134. +
CarbonCulture
data sources
Data is collected from an organization’s Building
Management Systems and its Automated Meter
Reading.
Luke Nicholson says: “We can integrate with any
system, as well as with buildings that do not have
automated meter readings. We publish open data
for a number of government departments, local
authorities and universities, and are working now
with corporate customers to do the same for them.”
135. +
CarbonCulture
what do they do with the data
They were “inspired by a huge global challenge – to
accelerate sustainable transformation at a large scale and to
use digital technology and great design to make it happen.”
They use high-tech metering to monitor carbon use
in the workplace.
136. +
CarbonCulture
how do they serve the clients with the
data
It helps clients use that data to make better
decisions around energy usage and sustainability,
enabling them to realise cash savings.
It also places great emphasis on design and user
interface, enabling people within an organisation to
connect, so that employees develop a shared
understanding of sustainability.
137. +
CarbonCulture
profit model
Measure and report on clients’ carbon and energy
performance, publishing it in real time online as
well as in workplace receptions and intranets.
Develop apps for clients that allow people and
buildings to work together to make savings.
138. +
CarbonCulture
awards
It delivered much higher engagement and energy savings than
expected (40% staff take-up and a 10% saving in gas usage),
leading to CarbonCulture being deployed in seven more
government departments.
139. +
Big Data Bureau
Shanghai, China
Shanghai Municipal Commission of Economy and
Informatization is now preparing a Big Data Bureau
to share government data and information.
On April 30, 2014, the Shanghai Public Credit & Information
Platform was opened to the public for the first time.
But there are still many challenges for the
government.
141. +
Taiwan’s Open Data Alliance
The UK’s Open Data Institute (ODI) and Taiwan’s
Open Data Alliance (ODA) signed a Letter of Intent
on 11 December 2013
They will collaborate on a range of activities:
Sharing expertise, knowledge and best practice
Carrying out collaborative projects
Designing support and collaboration systems for
open data driven businesses
Developing open data technologies
Editor's Notes
A lemon is a car, often new, that is found to be defective only after it has been bought. Any vehicle with numerous, severe issues can be termed a "lemon," and, by extension, any product with flaws too great or severe to serve its purpose can be described as a "lemon".
Hot listing is
The biggest data sets the Locatable team currently use are public transport related: National Rail, London Underground and Tramlink. Their next step is working on integrating schools data and crime statistics which are all open data sets.