The results of the Text Big Data Analytics Study «Third Wave» are presented. The study was conducted by Dell EMC External Research and Academic Alliances in Russia.
Good day, colleagues,
It is a pleasure to begin this presentation with a quotation from Nizami Ganjavi, the 12th-century Persian poet and philosopher from the time of the Shirvan Shahs, who greatly influenced the development of Azerbaijan.
«As the pen began its first movement,
It produced first word and speech.
When they raised the curtain of non-existence,
The first manifestation was word and speech.
As the pen began to move,
It opened the eyes of the world with its words.
Without speech the world has no voice...».
This study is devoted to Text Big Data Analytics.
It is named after Alvin Toffler's «Third Wave» concept of the information era.
The study was conducted by Dell EMC External Research and Academic Alliances in Russia.
The Internet is a global store of texts, and this type of data is still Dark Data: it was not previously recognized as data. Access to data stores through an Application Programming Interface (API) is a modern and growing industry. We explored the opportunity to extract new knowledge from this Dark Data. Using API access to Google and Yandex data, we collected a Morphological Matrix of several keywords in conjunction with country names and 2015 as the year of text publication. Google and Yandex are treated here as non-classical hybrid supercomputers that offer API access to their data stores «as-a-Service» (the Cloud model). Countries from the post-Soviet region, Eurasia and North Africa were included in the study.
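A minimal sketch of how such a Morphological Matrix could be laid out, assuming one query per keyword and country pair for the year 2015. The keyword and country lists are a small subset of names mentioned in this talk, the query format is an assumption, and `fetch_hit_count` is a hypothetical placeholder: the actual Google and Yandex API calls used in the study are not described here.

```python
from itertools import product

# Small illustrative subsets; the talk mentions ten trend keywords and 49 countries.
KEYWORDS = ["Cloud computing", "Electric cars", "Solar panel"]
COUNTRIES = ["Russia", "Georgia", "Azerbaijan"]
YEAR = 2015


def build_queries(keywords, countries, year):
    """Return {(keyword, country): query string} for every combination."""
    return {
        (kw, country): f'"{kw}" "{country}" {year}'
        for kw, country in product(keywords, countries)
    }


def fetch_hit_count(query):
    """Hypothetical placeholder for a search API call returning a hit count."""
    raise NotImplementedError("plug in a real search API client here")


def build_matrix(keywords, countries, year, fetch=fetch_hit_count):
    """Fill the keyword-by-country matrix using the supplied fetch function."""
    queries = build_queries(keywords, countries, year)
    return {
        kw: {country: fetch(queries[(kw, country)]) for country in countries}
        for kw in keywords
    }


queries = build_queries(KEYWORDS, COUNTRIES, YEAR)
# Demonstration only: the dummy fetch returns the query length, not real data.
demo_matrix = build_matrix(KEYWORDS, COUNTRIES, YEAR, fetch=len)
```

Passing the fetch function as a parameter keeps the matrix layout independent of whichever search service is actually queried.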
Keyword patterns reflect society as a dynamic, flexible system with adaptive spatiotemporal changes. The global Internet audience forms a «people-to-IT» system through which we can study three levels of behaviour of global society.
Level 1 – activity on the Internet, which differs from country to country and depends on:
number of computers in the country;
level of literacy and IT education;
number of IP connections and Internet accessibility;
level of economic development and number of Internet services;
size of the country's population;
processes in politics and society that are actively discussed;
how often the country is cited.
Level 2 – open textual resources on the Internet reflect the different features of technological evolution in different countries.
People write about the technological and economic processes that take place in their lives, their businesses, their country's agenda and so on.
Level 3 – reaction to stress: people write more about the problems that concern them more.
Level 1 – countries’ activity on the Internet.
Cluster analysis with the k-means algorithm was used to investigate Level 1.
The 49 countries are divided into five clusters with High, Middle and Low levels of Internet activity and country citation.
This cluster allocation should be taken into account when comparing countries. At the same time, a higher or lower level of activity and citation of a country on the Internet points to the level of economic development in that country.
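The k-means step can be illustrated with a short sketch in plain Python. It assumes each country is described by two numbers (Internet activity and citation level); the country names and values below are invented for illustration and are not data from the study, and the farthest-point initialisation is a choice made here to keep the example deterministic, not necessarily the one used by the authors.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical (activity, citation) scores, not values from the study.
countries = ["Country A", "Country B", "Country C",
             "Country D", "Country E", "Country F"]
points = [(1.0, 1.2), (1.1, 0.9), (5.0, 5.2),
          (5.1, 4.8), (9.0, 9.1), (8.8, 9.3)]

labels, centroids = kmeans(points, k=3)
clusters = dict(zip(countries, labels))
```

With well-separated groups like these, countries with similar activity and citation scores end up in the same cluster; the cluster numbers themselves carry no meaning and would still have to be interpreted as High, Middle or Low.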
A modified version of Simon Wardley's Value Chain Map was used to investigate Level 2. By building a Value Chain Map we can look at keyword patterns in dynamics. All components of the Value Chain Map evolve from left to right, driven by demand competition. The map allows us to put keywords in the required sequence to determine what already belongs to a previous stage of development and what belongs to the innovative stage that will determine future success during the upcoming Sixth Kondratiev Wave. The aim of the Value Chain Map is to show a moment of advantage, a moment of competition. In the presented chain, the moment of competition that will guarantee leadership is energy. Electricity generation will determine how fast new technology and the IoT ecosystem develop.
Kondratiev Waves have been described for the period since the industrial production of carbon began.
This slide shows the sequence of Kondratiev Waves. Kondratiev Waves are a measure of the evolution of the Technological Order. Each wave lasts 30-50 years.
The ten keywords of the Morphological Matrix reflect the main modern economic trends and point to different Kondratiev Waves.
On this slide you can see the Value Chain Map for the post-Soviet countries. It presents the keyword counts, in millions, found for 2015 in conjunction with the country names. Russia and Georgia show an increased level of the keywords «Cloud computing», «Electric cars» and «Solar panel». All of these technologies are features of the upcoming Sixth Kondratiev Wave. A detailed description can be found in the conference proceedings.
Russia, Georgia, Ukraine and Azerbaijan show a growing interest in hydrocarbons. We can expect these countries to stand together among the pioneers of the new market for carbon materials, the «printing ink» for 3D printing, which will shift the hydrocarbons market from fuel to materials. Let me recall the idea of the Russian scientist Dmitriy Mendeleev, from his 1886 report to the Russian government, “Baku oil business”: burning oil is the same as stoking a stove with banknotes. Mendeleev was the first to propose building factories for the petrochemical industry. Today, 3D-printing technology for the production of electric cars is emerging. The 3D-printing material for cars is carbon-fiber reinforced plastic, which is obtained industrially from oil and gas.
On this slide you can see the keyword patterns that reflect the countries' stress reactions. We can investigate the impact of stress factors on a country's interest in innovative development.
Ukraine, Syria, Iraq and Afghanistan have a higher frequency of the keywords «Migrants» and «Refugees».
Among the countries presented, two show a growing interest in new emerging technology and unlimited energy supply: China and Georgia.
Ukraine, Syria, Iraq and Afghanistan have a higher frequency of the keywords «Terrorism», «Terrorist», «Narcotic» and «Violation». Combining the three diagrams, we can see that countries with indications of innovative development do not show indications of the destructive stress factors that damage a country's development, and countries with indications of destructive stress factors do not show indications of innovative development. However, Georgia has a higher level of the keyword «Refugees» and China has a higher level of the keyword «Narcotic». This indicates an inhibitory factor for each of the two countries.
On this slide you can see the percentage ratio of the countries' interest in four keyword phrases (Terrorist, Refugees, Drip irrigation, Solar panel) during 2015. Azerbaijan has the lowest indication of destructive stress factors among the countries presented, and it also has a predominant interest in drip irrigation.
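The percentage ratio on this slide can be read as each keyword phrase's share of a country's total count over the four phrases. A minimal sketch under that assumption follows; the counts are invented for illustration and are not figures from the study.

```python
def percentage_shares(counts):
    """Convert raw keyword counts into percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when no phrase was found at all.
        return {phrase: 0.0 for phrase in counts}
    return {phrase: 100.0 * value / total for phrase, value in counts.items()}


# Hypothetical counts for one country, not data from the study.
example_counts = {
    "Terrorist": 10,
    "Refugees": 20,
    "Drip irrigation": 50,
    "Solar panel": 20,
}

shares = percentage_shares(example_counts)
dominant_phrase = max(shares, key=shares.get)
```

Normalising within each country makes the profiles comparable even when the countries differ greatly in overall Internet activity.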
Azerbaijan, like all the countries presented, will suffer from impending global warming. Water deficit in agriculture and drought will cause hunger and instability in these countries. To prevent collapse, countries need to develop a broad network of drip irrigation, which requires a lot of electricity. Solar energy is the more appropriate renewable energy resource for the regional environment. We can regard Azerbaijan as the country with the lowest risk of becoming a fragile country under global warming.
Comparing the keyword patterns, we can recommend that Azerbaijan pay more attention to developing Cloud computing. The keywords «Cloud computing» reflect the beginning of the upcoming Sixth Kondratiev Wave (2020-2070), during which the Cybernetic revolution will be completed. The future IoT ecosystem will be based on a Cloud computing network of devices, vehicles, buildings and other items embedded with electronics, software and sensors. Thanks to Cloud computing, most of these devices will be easily created by 3D printing from plastic materials.
The keywords «Electric cars» reflect interest in a crucial technology that has already brought down the oil market. In other words, electric cars are the real turning point in the Technological Order. The keywords «Electric cars» also belong to the upcoming Sixth Kondratiev Wave.
Keywords can reveal knowledge about countries' activity on the Internet, their interest in technological trends and their reaction to destructive stress factors. Understanding the frequency of different words on the global Internet is an important scientific task that leads to connecting Artificial Intelligence with the Internet for rapid analysis and forecasting of situations.
Thank you very much for watching. A more detailed description of this study can be found in the AICT conference proceedings.