Please think of several patterns and outliers in the bicycles picture.
ASK AUDIENCE
---
So let me just mention a few:
Color is one of the patterns that jumps out right away. For example, there are a lot of aluminum-colored bikes.
The yellow bike jumps out as an outlier.
If we look closer, we may also notice that there is only one bike with green handles.
Only a few bikes have their seats covered with plastic.
The bikes are more or less lined up. There is one bike facing the wrong way, though.
---
Even in this small dataset there are so many patterns and outliers. But how many of them are interesting? That really depends.
We try to find patterns that are novel, since telling people that bicycles tend to have two wheels is perhaps not so interesting.
What is interesting also depends on the purpose. A person checking whether bicycles have a parking permit is looking for one specific kind of outlier; when I look for my own bike, I have a different outlier in mind. So the ability to spot things that are interesting is extremely important.
Outliers are normally discarded in data mining … because you are often trying to find a pattern, and outliers screw things up.
In business, some outliers have become very successful, as described in the following book.
So we think it is interesting to look not only for patterns but also for outliers.
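A minimal sketch of the simplest version of this idea, assuming each bike is described by a few categorical attributes: the most common value of an attribute is the "pattern", and values that appear in only a small share of records are candidate outliers. The bike records, the function name `rare_value_outliers`, and the 15% cut-off are all hypothetical, chosen only to mirror the picture (aluminum bikes, one yellow bike, one bike facing the wrong way).

```python
from collections import Counter

def rare_value_outliers(records, attribute, max_share=0.15):
    """Return (index, value) for records whose value of `attribute` is rare.

    A value counts as rare if it appears in less than `max_share` of the records.
    """
    counts = Counter(r[attribute] for r in records)
    n = len(records)
    return [(i, r[attribute]) for i, r in enumerate(records)
            if counts[r[attribute]] / n < max_share]

# Hypothetical stand-in for the bicycle picture (not real data).
colors = ["aluminum", "aluminum", "black", "aluminum", "yellow",
          "aluminum", "black", "aluminum", "black", "aluminum"]
facing = ["left"] * 10
facing[7] = "right"  # the one bike facing the wrong way
bikes = [{"color": c, "facing": f} for c, f in zip(colors, facing)]

# The dominant value of an attribute is the "pattern" ...
print(Counter(b["color"] for b in bikes).most_common(1))  # [('aluminum', 6)]
# ... and the rare values are candidate outliers.
print(rare_value_outliers(bikes, "color"))   # [(4, 'yellow')]
print(rare_value_outliers(bikes, "facing"))  # [(7, 'right')]
```

Note that this only finds what is rare, not what is interesting: whether the yellow bike or the reversed bike matters still depends on the purpose, as with the parking-permit check versus looking for your own bike.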
We can’t do data mining without data, so we need data, and the more the better, since with more data we can see patterns more clearly.
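One way to make "the more the better" concrete: the noise in anything we estimate from a sample shrinks as the sample grows (for a mean, roughly in proportion to 1/√n). This is a hypothetical simulation, not part of the original analysis; the true mean, spread, number of trials, and sample sizes are arbitrary assumptions.

```python
import random
import statistics

def mean_abs_error(sample_size, trials=200, true_mean=5.0, sigma=2.0, seed=0):
    """Average |sample mean - true mean| over repeated simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Larger samples give estimates that sit closer to the underlying pattern.
for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```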
Also, when we have more dimensions, it is easier to spot patterns.
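A toy illustration of why an extra dimension can help, using invented points: two groups overlap heavily when you look at either coordinate alone (no single threshold on one coordinate does better than 60% accuracy here), yet one rule that uses both coordinates (`y > x`) separates them perfectly. The data and the helper `best_threshold_accuracy` are hypothetical.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of any one-feature rule 'A if value < t' or 'A if value >= t'."""
    best = 0.0
    candidates = sorted(set(values)) + [max(values) + 1]
    for t in candidates:
        for a_is_low in (True, False):
            correct = sum(
                ((v < t) == a_is_low) == (label == "A")
                for v, label in zip(values, labels)
            )
            best = max(best, correct / len(values))
    return best

# Hypothetical data: group A lies on y = x + 2, group B on y = x - 2.
points = [(i, i + 2) for i in range(10)] + [(i + 2, i) for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10

xs = [x for x, _ in points]
ys = [y for _, y in points]

acc_x = best_threshold_accuracy(xs, labels)  # using x only
acc_y = best_threshold_accuracy(ys, labels)  # using y only
# Using both dimensions at once: predict A whenever y > x.
acc_xy = sum((y > x) == (label == "A") for (x, y), label in zip(points, labels)) / len(points)

print(f"x only: {acc_x:.2f}, y only: {acc_y:.2f}, x and y: {acc_xy:.2f}")
# x only: 0.60, y only: 0.60, x and y: 1.00
```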
My name is Neil Rubens. I am not a journalist; I am a data miner, but I think that in essence the two are not so different.
Data is rarely brought to us on a silver platter; we have to work hard to acquire it actively.
Now let me briefly describe a case in which we applied the principles mentioned above. In our project we try to understand innovation, so we have gathered data on companies, people and money. What makes this dataset different, besides its timeliness, is that the majority of the data (thanks to social media) is about small companies with between 1 and 5 employees. A lot of innovation happens there, so it is important to track.
This shows how the models of innovation have evolved, reflecting the changes.
This shows how we have evolved from local/regional activities.
At the core of this research is what was initially called “regional technology-based economic development”. However, each of its three parts (regional, technology-based, economic development) has changed, which calls for updating the whole concept.
This map shows the locations of the companies; the size of each circle indicates the number of companies. For this part of the analysis we used Tableau Software.
We can also look at the companies by sector
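In Tableau this aggregation is done interactively; as a tool-independent sketch, the numbers behind such a map (and behind the per-sector view) come down to grouping company records by a field and counting. The records, field names, and the `scale` parameter are hypothetical. If you draw the circles yourself, scaling the radius by the square root of the count keeps the circle's area, rather than its radius, proportional to the number of companies.

```python
from collections import Counter
import math

# Hypothetical company records (invented for illustration).
companies = [
    {"name": "A", "city": "San Francisco", "sector": "web"},
    {"name": "B", "city": "San Francisco", "sector": "advertising"},
    {"name": "C", "city": "Seattle", "sector": "cleantech"},
    {"name": "D", "city": "San Francisco", "sector": "biotech"},
    {"name": "E", "city": "Seattle", "sector": "web"},
    {"name": "F", "city": "Boston", "sector": "biotech"},
]

def count_by(records, field):
    """Number of records per distinct value of `field`."""
    return Counter(r[field] for r in records)

def circle_radius(count, scale=10.0):
    """Radius whose circle area is proportional to `count` (area = pi * r**2)."""
    return scale * math.sqrt(count)

by_city = count_by(companies, "city")      # what the circle sizes encode
by_sector = count_by(companies, "sector")  # the per-sector view
for city, n in by_city.most_common():
    print(f"{city}: {n} companies, radius {circle_radius(n):.1f}")
print(by_sector)
```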
We can try to analyze the relations between sectors; here are the advertising and web sectors. A lot is going on in Silicon Valley, but also in the North East and other parts.
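One simple way to quantify a relation between two sectors, sketched here as an assumption rather than the method used in the talk: take each region's company count in both sectors and compute the Pearson correlation across regions. The region labels and counts are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-region company counts (invented, not the project's data).
counts = {
    "Region A": {"advertising": 40, "web": 90},
    "Region B": {"advertising": 25, "web": 60},
    "Region C": {"advertising": 10, "web": 30},
    "Region D": {"advertising": 5, "web": 12},
}
ads = [c["advertising"] for c in counts.values()]
web = [c["web"] for c in counts.values()]
print(f"correlation across regions: {pearson(ads, web):.2f}")
```

Raw counts largely track how big each region is, so a high correlation here may just mean "big regions have more of everything"; dividing each count by the region's total number of companies gives a fairer comparison. With only a handful of regions the coefficient is also fragile.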
Here are the biotech and cleantech sectors.
We can also look at specific cities and regions. Silicon Valley looks very interesting.
This is Seattle.
So, as you can see, the patterns are very different from city to city.
So far I have shown analyses based on spatial distance. However, the notion of distance is changing: we don’t know where these people are physically located, but they seem to be in the same space.
So new maps may be based on connections rather than on distance. For this analysis we used an open source tool called NodeXL.
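NodeXL does this kind of network analysis interactively; the core idea of grouping by connection rather than by location can be sketched, independently of any tool, as finding connected components in a graph with a breadth-first search. The people, their cities, and the links are hypothetical. Note that the two people in Palo Alto end up in different groups, while people in Palo Alto, Helsinki and Tokyo end up together.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group nodes that are linked directly or indirectly (undirected graph)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components

# Hypothetical connections between people (names and cities invented).
edges = [
    ("Ann (Palo Alto)", "Ben (Helsinki)"),
    ("Ben (Helsinki)", "Cho (Tokyo)"),
    ("Dev (Palo Alto)", "Eva (Seattle)"),
]
for group in connected_components(edges):
    print(group)
```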
---
http://www.bbsservicesinc.com/sitebuildercontent/sitebuilderpictures/world-map.gif
Partners: Government agencies, Educational institutions, SMEs, Services & consultancies, Venture groups, Large organizations
Data points: Patents, Licenses, Jobs, Publications, Citations, Resource flows (investments, sales, valuations)
---
China
Japan: JST
NYC: NYC MediaLab
Austin: MCC, Sematech
Mpls/St.P
Finland: TEKES, FINNODE
Abu Dhabi