One of our goals is to look for patterns of innovation. A pattern is something that follows a rule. For example, most of the bicycles are silver. An outlier is something that does not follow the rule. In this case, the yellow bike could be considered an outlier. Both patterns and outliers are interesting. Patterns are indicative of current trends. Outliers are indicative of something new that may later become a trend. At one time, both Microsoft and Intel could have been considered outliers. There are very many patterns, but only some of them may be of interest. For example, a parking officer may be looking for the absence of permits in order to issue tickets. If I am looking for my bike, I am interested in different features. So what makes patterns interesting or useful is partially dictated by the goals.
Let me briefly touch upon the meaning of patterns. There are quite a few patterns in this picture. For example, most of the bicycles tend to have the same orientation.
So let me just mention a few. Color is one of the patterns that jumps out right away. For example, there are a lot of aluminum-colored bikes. The yellow bike jumps out as an outlier. If we look closer, we may also notice that there is only one bike where the handles are green. Only a few bikes have their seats covered with plastic. The bikes are more or less lined up. There is a bike that is facing the wrong way, though.
Even in this small dataset there are so many patterns and outliers. But how many of them are interesting? That really depends. We try to find patterns that are novel, since telling people that bicycles tend to have two wheels is perhaps not so interesting. What is interesting also depends on the purpose. A person checking whether bicycles have parking permits is looking for a specific outlier. When I look for my own bike, I have a different outlier in mind. So the ability to spot things that are interesting is extremely important.
Outliers are normally discarded in data mining, because you are often trying to find a pattern and outliers screw things up. In business, some outliers have become very successful, as described in the following book. So we think it is interesting to look not only for patterns but also for outliers.
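To make the pattern/outlier distinction concrete, here is a minimal sketch in Python. It treats the most frequent value as the "pattern" and any sufficiently rare value as an "outlier". The function name, the 10% threshold, and the list of bike colors are all made up for illustration; this is not the method used in the project.

```python
from collections import Counter


def split_pattern_and_outliers(values, max_outlier_share=0.1):
    """Return (most common value, sorted list of rare values).

    A value counts as an outlier when its share of all observations
    is at most max_outlier_share (an arbitrary, illustrative cutoff).
    """
    counts = Counter(values)
    total = len(values)
    pattern, _ = counts.most_common(1)[0]
    outliers = sorted(v for v, n in counts.items() if n / total <= max_outlier_share)
    return pattern, outliers


# Hypothetical observations, loosely modeled on the bicycle picture.
bike_colors = ["silver"] * 8 + ["black"] * 3 + ["yellow"]
pattern, outliers = split_pattern_and_outliers(bike_colors)
print("pattern:", pattern)
print("outliers:", outliers)
```

Which threshold is sensible depends on the goal, which is the same point made above: what counts as an interesting outlier is dictated by what you are looking for.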
We can’t do data mining without data, so we need data, and the more the better, since then we can see patterns more clearly.
Adding more dimensions may make patterns easier to identify. But more dimensions also require more data.
Innovation happens very fast. If you are too slow, you lose. To react fast, we need current data.
My specialty is AI and data mining, so the first thing is to get the data. There is a lot of nice data on innovation, but it is not so recent. In traditional data gathering, data is often gathered over a period of time. Then it goes through various processes within the organization and gets analyzed; some reports are released; and then the data is released. This process may take several years.
So we try to get data from different source types. Social media produces very current data, but it may not always be as reliable (biased towards the public consensus). News data tends to be accurate, but coverage is often limited (biased by authors' views). Data from government organizations is often of high quality, but takes years to produce. We then federate this data, and iterate between analysis and visualization.
In our project we try to understand innovation. To get a fuller picture, we gathered data on various aspects: companies, people, … and also how they are interconnected. What makes this data set different, besides its timeliness, is that the majority of the data is about small companies with 1 to 5 employees. A lot of innovation happens there, so it is important to track, but it is usually not captured.
This shows how we have evolved from local/regional activities.
This shows how the models of innovation have evolved, reflecting the changes.
We can also look at the companies by sector.
At the core of this research we have what was initially called “regional technology-based economic development”. However, each of the three parts has experienced changes, which calls for updating the whole concept.
So far I have shown analysis based on spatial distance. However, the notion of distance is changing. We don’t know where these people are physically located, but they seem to be in the same space.
So the new maps may be based on connections rather than on distance. For this analysis we have used an open source tool called NodeXL.
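As a rough illustration of what "distance based on connections" can mean, the sketch below counts the number of hops between people in a small network using a breadth-first search. It is plain Python rather than the NodeXL workflow used for the analysis, and the names and connections are invented for the example.

```python
from collections import deque


def hop_distances(edges, start):
    """Return {node: number of hops from start} for every reachable node."""
    # Build an undirected adjacency map from the list of connections.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search: nodes are reached in order of increasing hops.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical connections between people (not project data).
connections = [("Ann", "Bob"), ("Bob", "Chen"), ("Chen", "Dana"), ("Ann", "Eve")]
distances = hop_distances(connections, "Ann")
print(distances)
```

In this view, two people who are one hop apart are "close" regardless of where they are physically located.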
My name is Neil Rubens. I am not a journalist; I am a data miner, but I think in essence it is not so different.
It is rare that the data is simply brought to us on a silver platter. We have to try hard to actively acquire it.
This map indicates the location of the companies. The size of each circle indicates the number of companies. For this part of the analysis we used Tableau Software.