This document summarizes a presentation on potential applications using the class frequency distribution of maximal repeats from tagged sequential data. It discusses using maximal repeat patterns and their frequency distributions over time to analyze trends in topic histories from literature, detect anomalies in manufacturing processes for quality control, and identify distinguishing patterns in genomic sequences. Potential applications discussed include text mining historical archives, individualized learning based on topic histories, detecting changes in language for elderly assessment, monitoring new word adoption, and integrating IoT sensor data with product traceability systems for industrial quality assurance.
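The core technique, extracting repeated patterns from class-tagged sequences and comparing their per-class frequencies, can be sketched in plain Python. This toy version uses quadratic substring enumeration and a simplified right-maximality check; it illustrates the idea only and is not the scalable implementation the talk describes:

```python
from collections import defaultdict

def class_frequency(tagged_seqs, min_len=2):
    """Count every substring (length >= min_len) per class tag, keeping
    only those that occur at least twice overall, i.e. the repeats.

    tagged_seqs: list of (class_tag, sequence) pairs.
    Returns {pattern: {class_tag: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tag, seq in tagged_seqs:
        for i in range(len(seq)):
            for j in range(i + min_len, len(seq) + 1):
                counts[seq[i:j]][tag] += 1
    return {p: dict(c) for p, c in counts.items() if sum(c.values()) >= 2}

def is_right_maximal(pattern, table):
    """Simplified maximality test: no one-character right extension of
    the pattern occurs with the same total frequency."""
    total = sum(table[pattern].values())
    return not any(
        len(q) == len(pattern) + 1 and q.startswith(pattern)
        and sum(c.values()) == total
        for q, c in table.items()
    )
```

Comparing the per-class counts of a maximal repeat over time-ordered or labeled classes is what enables the trend-analysis and distinctive-pattern applications listed in the abstract.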
Yarn Resource Management Using Machine Learning (ojavajava)
HadoopCon 2016 in Taiwan - Maximizing the utilization of Hadoop computing power is the biggest challenge for Hadoop administrators. In this talk I explain how we use machine learning to build a prediction model for computing-power requirements and set the MapReduce scheduler parameters dynamically, to fully utilize our Hadoop cluster's computing power.
How to plan a Hadoop cluster for testing and production environments (Anna Yen)
Athemaster shares its experience planning hardware specifications, server initialization, and role deployment with new Hadoop users. Two testing environments and three production environments are presented as case studies.
This document discusses using Jupyter Notebook for machine learning projects with Spark. It describes running Python, Spark, and pandas code in Jupyter notebooks to work with data from various sources and build machine learning models. Key points include using notebooks for an ML pipeline, running Spark jobs, visualizing data, and building word embedding models with Spark. The document emphasizes how Jupyter notebooks allow integrating various tools for an ML workflow.
This document provides an overview of a business intelligence (BI) system architecture. It includes a product database using Attunity for change data capture fed into a Teradata data warehouse. An ETL system extracts and transforms the data from the warehouse for analysis in Tableau, a BI reporting tool. Centralized logging of the database, applications, and web console are stored in a separate logging database.
This document discusses Hivemall, a machine learning library for Apache Hive and Spark. It was developed by Makoto Yui as a personal research project to make machine learning easier for SQL developers. Hivemall implements various machine learning algorithms like logistic regression, random forests, and factorization machines as user-defined functions (UDFs) for Hive, allowing machine learning tasks to be performed using SQL queries. It aims to simplify machine learning by abstracting it through the SQL interface and enabling parallel and interactive execution on Hadoop.
Achieve big data analytic platform with lambda architecture on cloud (Scott Miao)
This document discusses achieving a big data analytic platform using the Lambda architecture on cloud infrastructure. It begins by explaining why moving to the cloud provides benefits like elastic scaling, reduced operational overhead, and increased focus on innovation. Common cloud services at Trend Micro like an analytic engine and cloud storage are then described. The document introduces the Lambda architecture and proposes a serving layer as a service. Key lessons learned from building big data solutions on AWS include the pros of unlimited scalability and easy disaster recovery compared to on-premises infrastructure.
SparkR - Play Spark Using R (20160909 HadoopCon) (wqchen)
1. Introduction to SparkR
2. Demo
Starting to use SparkR
DataFrames: dplyr style, SQL style
RDD vs. DataFrames
SparkR on MLlib: GLM, K-means
3. Use Cases
Median: approxQuantile()
ID Match: dplyr style, SQL style, SparkR function
SparkR + Shiny
4. The Future of SparkR
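The "Median: approxQuantile()" use case above refers to Spark's approximate quantile support, which computes quantiles within a caller-specified relative rank error instead of sorting the whole dataset. A pure-Python stand-in (exact sorted-rank lookup, error parameter ignored) shows what the call computes:

```python
def approx_quantile(values, prob, relative_error=0.01):
    """Toy stand-in for Spark's approxQuantile: here the answer is an
    exact sorted-rank lookup and relative_error is ignored. Spark avoids
    the full sort by maintaining a quantile sketch whose answer is only
    guaranteed to be within rank error relative_error * n."""
    s = sorted(values)
    rank = int(prob * (len(s) - 1))
    return s[rank]

# Median of 1..100; in SparkR/PySpark this would be
# approxQuantile(df, "x", [0.5], 0.01)
median = approx_quantile(range(1, 101), 0.5)  # -> 50
```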
HadoopCon 2016 - Implement Real-time Centralized Logging System by Elastic Stack (Len Chang)
This document proposes implementing a real-time centralized logging system using the Elastic Stack. It introduces Elastic Stack components like Filebeat, Elasticsearch, and Kibana. It then provides a use case of converting log timestamps to a standard sort format using Logstash filters like grok and date. The presenter works at WeMo Scooter, an electric scooter rental startup aiming to reduce emissions. He is interested in technologies like Elastic Stack, PostgreSQL, and Spark.
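The grok-plus-date step described above, pulling a timestamp out of a raw log line and rewriting it into a standard sortable form, can be sketched in Python. The log line format and field choices below are illustrative, not taken from the talk:

```python
import re
from datetime import datetime

# Roughly what Logstash's %{HTTPDATE} grok pattern matches,
# e.g. "10/Sep/2016:14:03:27 +0800"
HTTPDATE = re.compile(r"(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})")

def normalize_timestamp(line):
    """Extract an Apache-style timestamp and return it as ISO 8601,
    which sorts lexicographically -- the point of the date filter."""
    m = HTTPDATE.search(line)
    if not m:
        return None
    dt = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
    return dt.isoformat()
```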
Logs are one of the most important sources for monitoring and revealing significant events of interest. In this presentation, we introduce an implementation of a log-stream processing architecture based on Apache Flink. With fluentd, different kinds of emitted logs are collected and sent to Kafka. After being processed by Flink, the results are visualized on a dashboard built with Elasticsearch and Kibana.
1. The document discusses best practices for scientific software development including writing code for people to read, automating repetitive tasks, using version control, and avoiding redundancy.
2. Specific approaches mentioned are planning for mistakes, automated testing, continuous integration, and using style guides to ensure code is readable and consistently formatted.
3. knitr allows analyzing and reporting in a single file by embedding R code chunks in Markdown documents.
1. The document discusses best practices for scientific software development, including writing code for people rather than computers, automating repetitive tasks, using version control, and conducting code reviews.
2. Specific approaches and tools recommended are planning for mistakes, automated testing, continuous integration, and using a coding style guide. R and Ruby style guides are provided as examples.
3. The benefits of following such practices are improving productivity, reducing errors, making code easier to read and maintain, and allowing scientists to focus on scientific questions rather than software issues. Reproducible and sustainable software is the overall goal.
This document summarizes a talk given by Dr. Noel O'Boyle on using Python for chemistry. It discusses what Python is, why it is useful for chemistry, and how it can be used. Specific examples are given of popular Python modules for tasks like data analysis, visualization, cheminformatics, and interfacing with other languages like R and Java. The document provides an overview of the capabilities of Python for scientific computing and highlights its growing adoption in the chemistry community.
|QAB> : Quantum Computing, AI and Blockchain (Kan Yuenyong)
The document discusses quantum computing, artificial intelligence, and blockchain. It describes how quantum computers could crack encryption such as RSA much faster than classical computers, although building a quantum computer with enough qubits to run Shor's algorithm is not currently possible. It also notes that as transistors shrink to scales where quantum effects disrupt classical chips, quantum computing turns those same effects into a computational resource. Photonic quantum computers that operate at room temperature and may scale to millions of qubits are also mentioned.
The UK's fastest-growing AI & data career accelerator programme:
The AiCore Programme offers software engineering, data science, data analysis, data engineering and machine learning specializations. It provides over 500 hours of coding experience, an internal job board for top industry roles, and support to become an irresistible candidate for your specialist career. Common professions for alumni include data analyst, data engineer, machine learning engineer, and software developer.
The Taverna Workflow Management Software Suite - Past, Present, Future (myGrid team)
The document summarizes the Taverna workflow management software. It discusses how Taverna allows users to visually design and execute workflows to analyze data through web services, scripts, and other tools. The summary highlights that Taverna uses a dataflow model and supports mixing different step types, nested workflows, and interactions. It also discusses how Taverna aims to advance scientific discovery by making workflows reusable, adaptive to different infrastructures, and able to process data at large scales.
I summarize requirements for an "Open Analytics Environment" (aka "the Cauldron"), and some work being performed at the University of Chicago and Argonne National Laboratory towards its realization.
This document proposes an approach to enable ontology-based access to streaming data sources. It discusses mapping streaming data schemas to ontological concepts and extending SPARQL to support querying streaming RDF data. This would allow expressing continuous queries over streaming data using ontological terms. The approach includes translating such SPARQL queries to queries over streaming data sources using mappings between the ontology and streaming schemas. An implementation of a semantic integration service is proposed to deploy this ontology-based access to streaming data.
Transfer Learning for Performance Analysis of Machine Learning Systems (Pooyan Jamshidi)
This document discusses transfer learning approaches for analyzing the performance of machine learning systems. It begins with the presenter's background and credentials. It then notes that today's most popular systems are highly configurable, but understanding how configurations impact performance is challenging. The document uses a case study of a social media analytics system called SocialSensor to illustrate the opportunity of exploring different configurations to improve performance without extra resources. Testing various configurations of SocialSensor's data processing pipelines revealed that the default was suboptimal, and an optimal configuration found through experimentation significantly outperformed the default and an expert's recommendation. The document concludes that default configurations are often bad, but transfer learning approaches can help identify configurations that noticeably improve performance.
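The search the case study describes, measuring many configurations because the default may be far from optimal, reduces in miniature to exhaustive search over a configuration space. The option names and cost function below are invented for illustration; the point of the transfer-learning approaches in the talk is precisely that real spaces are too large for this:

```python
import itertools

def best_configuration(measure, options):
    """Measure every combination of option values and return the
    configuration with the lowest cost."""
    keys = sorted(options)
    best, best_cost = None, float("inf")
    for combo in itertools.product(*(options[k] for k in keys)):
        config = dict(zip(keys, combo))
        cost = measure(config)
        if cost < best_cost:
            best, best_cost = config, cost
    return best, best_cost

# Hypothetical configuration space and cost (e.g. latency in ms)
options = {"batch": [1, 8, 32], "threads": [1, 4]}
cfg, cost = best_configuration(
    lambda c: abs(c["batch"] - 8) + abs(c["threads"] - 4), options)
```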
Sqrrl October Webinar: Data Modeling and Indexing (Sqrrl)
This document summarizes a webinar about data modeling and indexing for Apache Accumulo using Sqrrl. It discusses Accumulo and Sqrrl technology, including table designs for dynamic documents, graphs and inverted indexes. It also describes how Sqrrl Enterprise allows building advanced indexes and the real-time operational applications it enables.
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z... (ICZN)
This document discusses developing a business model for ZooBank, a proposed online registry of zoological nomenclature. It outlines elements to consider for the business model, including the scientific, technical, social, and financial models. It also discusses how ZooBank could operate within the EDIT network to establish a prototype web taxonomy and help coordinate taxonomic data infrastructure. Funding opportunities that could support ZooBank are also mentioned.
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
An investigation of how PostgreSQL and its latest capabilities (JSONB data type, GIN indices, Full Text Search) can be used to store, index and perform queries on structured Bibliographic Data such as MARC21/MARCXML, breaking the dependence on proprietary and arcane or obsolete software products.
Talk presented at FOSDEM 2016 in Brussels on 31/01/2016. This is a very practical & hands-on presentation with example code which is certainly not optimal ;)
This document discusses using Schema.org to describe marine data and link ocean data on the web. It provides background on linked data and Schema.org. It describes work done by various organizations to apply Schema.org to describe datasets, organizations, projects, and other marine data. This includes developing schemas and cataloging various types of marine data. Future work is discussed, such as supporting tabular data and linking to other vocabularies for different data types.
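As a concrete example of the kind of markup such work produces, a schema.org Dataset description for a marine dataset might look like the JSON-LD below. The dataset, organization, and coordinates are invented purely for illustration:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example sea-surface temperature observations",
  "description": "Hypothetical dataset used only to illustrate the markup.",
  "keywords": ["oceanography", "sea surface temperature"],
  "creator": { "@type": "Organization", "name": "Example Ocean Institute" },
  "variableMeasured": "sea_surface_temperature",
  "spatialCoverage": {
    "@type": "Place",
    "geo": { "@type": "GeoShape", "box": "39.3 -74.0 41.5 -71.1" }
  }
}
```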
D-REPR: A Language For Describing And Mapping Diversely-Structured Data Sourc... (Binh Vu)
This document describes D-REPR, a language for describing and mapping diversely structured data sources to RDF. It discusses the need for a uniform method to access heterogeneous datasets in different formats like CSV, JSON, and NetCDF. D-REPR is presented as a generic language that can map a wide variety of data sources through a declarative approach. It describes a 4-step process for mapping data with D-REPR: (1) defining resources, (2) defining attributes, (3) defining alignments between attributes, and (4) defining a semantic model. The language is extensible and an efficient engine is provided to convert datasets to RDF. Evaluation shows it can model datasets of various formats and outper
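The four mapping steps can be approximated for the simplest case, a single CSV resource with one attribute per column, in a few lines of Python. The example.org URIs are placeholders and this is not D-REPR syntax:

```python
import csv
import io

def csv_to_triples(csv_text, subject_col, semantic_model):
    """Map each CSV row to RDF-style (subject, predicate, object) triples.

    semantic_model: {column_name: predicate_uri} -- the 'semantic model'
    step; the resource/attribute/alignment steps collapse here to
    'one CSV resource, one attribute per column, row-index alignment'.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"http://example.org/{row[subject_col]}"
        for col, predicate in semantic_model.items():
            triples.append((subject, predicate, row[col]))
    return triples
```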
Announcing NamSorML: AI classifiers for race, ethnicity and migration studies (Elian CARSENAT)
NamSor ML is a new product that offers unmatched accuracy for processing 'big data' or open data sources in the context of race, ethnicity and migration studies. NamSor ML is an SDK (on-site software) that complements the NamSor Origin and Diaspora APIs with enterprise-class functionality for research institutes, international organizations, governments and the private sector.
This document provides a summary of best practices for scientific software development based on research and experience. It describes practices that can improve scientists' productivity and the reliability of their software, such as writing programs for people rather than computers, using version control, automating testing, and documenting work. Adopting these practices in concert can help reduce errors, make software easier to maintain, and save scientists time and effort.
Jogging While Driving, and Other Software Engineering Research Problems (invi... - David Rosenblum
invited talk presented for the Distinguished Lecturer Series of the Department of Computer Science at the University of Illinois at Chicago, 10 April 2014
HadoopCon 2016 (2016/9/10), 王經篤 (Jing-Doo Wang)
1. Potential Applications using the Class Frequency
Distribution of Maximal Repeats
from Tagged Sequential Data
Jing-Doo Wang (王經篤)
Associate Professor
Asia University, Taiwan.
The 8th Taiwan Hadoop Community Annual Conference, HadoopCon 2016
Humanities and Social Sciences Building, Academia Sinica (2016.9.10)
8. Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Patterns (Biomarkers)
from Genomic Sequences
• Future Work
10. Why use “Maximal Repeats”
as features?
• Dictionary
– How to identify new words or phrases?
– e.g. “just do it”, “洪荒之力” (“prehistoric powers”, a 2016 Chinese neologism).
• N-gram
– 2-gram, 3-gram, …, 5-gram (Google Ngram Viewer)
– The value of “N” is limited.
• Maximal Repeat
– The length of a maximal repeat is variable.
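To make the contrast concrete, here is a naive in-memory sketch (my own illustration, not the presenter's patented method): a word n-gram counts as a maximal repeat when it occurs at least twice and at least two of its occurrences differ in both the preceding and the following word, so its length falls out of the data instead of being fixed like N.

```python
from collections import defaultdict

def maximal_repeats(words, max_len=10):
    """Find maximal repeats among word n-grams of a token list.

    A repeat (frequency >= 2) is kept only if at least two of its
    occurrences differ in the preceding word (left-maximal) AND at
    least two differ in the following word (right-maximal), i.e. it
    cannot be extended in either direction without losing matches.
    """
    n = len(words)
    results = {}
    for length in range(1, max_len + 1):
        occurrences = defaultdict(list)
        for i in range(n - length + 1):
            occurrences[tuple(words[i:i + length])].append(i)
        for gram, starts in occurrences.items():
            if len(starts) < 2:
                continue  # not a repeat at all
            lefts = {words[i - 1] if i > 0 else None for i in starts}
            rights = {words[i + length] if i + length < n else None
                      for i in starts}
            if len(lefts) > 1 and len(rights) > 1:
                results[" ".join(gram)] = len(starts)
    return results
```

In `a b c d a b c e a b x`, both `a b` (frequency 3) and `a b c` (frequency 2) are reported, while `b c` is not: every `b c` is preceded by `a`, so `a b c` subsumes it. Corpus-scale input needs the external, distributed approach the talk describes.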
13. Patent Application Serial Number
(US 15/208,994) (pending)
• Wang, Ching-Tu. Method for Extracting Maximal
Repeat Patterns and Computing Frequency
Distribution Tables. Patent Application Serial
Number 15/208,994. 13 July 2016.
• Filed as a US invention patent application
– Owner: 王經篤 (Jing-Doo Wang)
– Inventor: 王經篤 (Jing-Doo Wang)
19. Pattern History for Trend Analysis
FSKD 2011
Sequential Data + Timestamp
24. The Abstracts and Titles of PubMed
Articles (1990–2014) (12 GB)
6 PCs => 5 hours
25. The History of a Significant Pattern
(顯要樣式歷史)
The history of a significant pattern is the
frequency distribution of that pattern over
equally spaced time intervals.
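The definition above can be sketched in a few lines (the `records` below are hypothetical toy data; the actual PubMed runs used a cluster, not a single in-memory counter):

```python
from collections import Counter

def pattern_history(records, pattern, first_year, last_year):
    """Frequency distribution of one pattern over yearly intervals.

    `records` is an iterable of (year, text) pairs, e.g. PubMed
    titles and abstracts tagged with their publication year.
    """
    counts = Counter()
    for year, text in records:
        counts[year] += text.count(pattern)
    return [counts.get(y, 0) for y in range(first_year, last_year + 1)]

# Hypothetical toy records; the talk's real input was 12 GB of
# PubMed titles and abstracts from 1990-2014.
records = [(1990, "SARS outbreak SARS"), (1991, "SARS vaccine"),
           (1993, "flu season")]
print(pattern_history(records, "SARS", 1990, 1993))  # [2, 1, 0, 0]
```

The returned list is exactly the pattern's history: one count per equally spaced time interval.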
26. Significant Pattern
(顯要樣式)
• A significant pattern is a maximal repeat of
consecutive words within the texts.
(Length=1) TDP-43
(Length=1) SARS
(Length=1) H1N1
(Length=5) non-small cell lung cancer (NSCLC)
(Length=6) 75 g oral glucose tolerance test
(Length=6) 4 x 4 Latin square design
(Length=7) 2 x 2 factorial arrangement of treatments
(Length=9) the National Institute of Child Health and Human Development
(Length=10) patients with squamous cell carcinoma of the head and neck
(Length=11) anomalous origin of the left coronary artery from the pulmonary artery
(Length=12) Pregnancy and Childbirth Group trials register and the Cochrane Controlled Trials Regist
(Length=13) the European Organization for Research and Treatment of Cancer Quality of Life Questi
92. It will be hard work!
93. New Direction & Thinking!
128. Maximal Repeats appearing in
all 24 human chromosomes
• Length |maximal repeat| <= 500 bp
– OK!
• Length |maximal repeat| <= 1000 bp
– Disk space full!
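The same left/right-maximality test used for words applies to DNA characters. The toy scan below (my own illustration, not the presenter's pipeline) makes the resource problem visible: every substring length up to the cap is enumerated, so the candidate set grows with the cap, which is consistent with the slide's observation that raising it from 500 bp to 1000 bp on whole chromosomes exhausted disk space.

```python
from collections import defaultdict

def dna_maximal_repeats(seq, max_len=500):
    """Naive maximal-repeat scan over one DNA string, capped at max_len bp.

    A substring is reported when it occurs at least twice and at
    least two occurrences differ in both the preceding and the
    following base. Only usable on short inputs; real chromosomes
    need external-memory indexing.
    """
    n = len(seq)
    found = {}
    for length in range(1, min(max_len, n) + 1):
        occurrences = defaultdict(list)
        for i in range(n - length + 1):
            occurrences[seq[i:i + length]].append(i)
        for sub, starts in occurrences.items():
            if len(starts) < 2:
                continue
            lefts = {seq[i - 1] if i > 0 else None for i in starts}
            rights = {seq[i + length] if i + length < n else None
                      for i in starts}
            if len(lefts) > 1 and len(rights) > 1:
                found[sub] = len(starts)
    return found
```

For example, in `GATTAGATTC` the repeat `GATT` is maximal (its two occurrences have different left and right contexts), while `ATT` is not, because every `ATT` is preceded by `G`.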