Recent Database Management Systems
Research Articles - September 2020
International Journal of Database Management Systems (IJDMS)
ISSN: 0975-5705 (Online); 0975-5985 (Print)
http://airccse.org/journal/ijdms/index.html
Contact us: ijdmsjournal@airccse.org
A XGBOOST RISK MODEL VIA FEATURE SELECTION AND BAYESIAN HYPER-PARAMETER
OPTIMIZATION
Yan Wang¹ and Xuelei Sherry Ni²
¹Graduate College, Kennesaw State University, Kennesaw, USA
²Department of Statistics and Analytical Sciences, Kennesaw State University, Kennesaw, USA
ABSTRACT
This paper explores models based on the extreme gradient boosting (XGBoost) approach for
business risk classification. Feature selection (FS) algorithms and hyper-parameter
optimization are considered together during model training. The five most commonly used FS
methods, namely weight by Gini, weight by Chi-square, hierarchical variable clustering, weight
by correlation, and weight by information, are applied to alleviate the effect of redundant
features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian
tree-structured Parzen estimator (TPE), are applied in XGBoost. The effects of the different
FS and hyper-parameter optimization methods on model performance are investigated with the
Wilcoxon signed-rank test. The performance of XGBoost is compared with that of the
traditionally used logistic regression (LR) model in terms of classification accuracy, area
under the curve (AUC), recall, and F1 score obtained from 10-fold cross-validation. Results
show that hierarchical clustering is the optimal FS method for LR, while weight by Chi-square
achieves the best performance in XGBoost. XGBoost with either TPE or RS optimization
significantly outperforms LR. TPE optimization is superior to RS, since it yields
significantly higher accuracy and marginally higher AUC, recall, and F1 score. Furthermore,
XGBoost with TPE tuning shows lower variability than XGBoost with RS tuning. Finally, the
ranking of feature importance based on XGBoost enhances model interpretation. Therefore,
XGBoost with Bayesian TPE hyper-parameter optimization serves as an operative yet powerful
approach for business risk modeling.
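The paired comparison of tuning methods can be illustrated with a small, self-contained sketch of the Wilcoxon signed-rank test applied to per-fold scores. The fold accuracies and function names below are hypothetical and are not taken from the paper. The exact p-value is obtained by enumerating all sign assignments, which is only practical for a small number of folds such as ten.

```python
from itertools import product


def signed_rank(a, b):
    """Return (w_plus, w_minus, ranks) for paired samples a and b."""
    # Zero differences are dropped; rounding keeps float ties detectable.
    diffs = [round(x - y, 9) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over all positions tied with position i.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, ranks


def exact_two_sided_p(w_plus, w_minus, ranks):
    """Exact two-sided p-value by enumerating every sign assignment."""
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical 10-fold accuracies for two tuning methods (illustration only).
tpe_acc = [0.82, 0.84, 0.81, 0.85, 0.83, 0.86, 0.80, 0.84, 0.83, 0.85]
rs_acc = [0.80, 0.83, 0.80, 0.82, 0.82, 0.83, 0.79, 0.81, 0.80, 0.83]
w_plus, w_minus, ranks = signed_rank(tpe_acc, rs_acc)
print(w_plus, w_minus, exact_two_sided_p(w_plus, w_minus, ranks))
```

A small p-value here would indicate that the per-fold differences are unlikely under the hypothesis of no difference between the two methods. For larger samples a library routine with a normal approximation is the more usual choice.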
KEYWORDS
Extreme gradient boosting; XGBoost; feature selection; Bayesian tree-structured Parzen
estimator; risk modeling
Full Text: https://aircconline.com/ijdms/V11N1/11119ijdms01.pdf
The International Journal of Database Management Systems (IJDMS)
http://airccse.org/journal/ijdms/current2019.html
AN INFECTIOUS DISEASE PREDICTION METHOD BASED ON K-NEAREST
NEIGHBOR IMPROVED ALGORITHM
Yaming Chen¹, Weiming Meng², Fenghua Zhang³, Xinlu Wang⁴ and Qingtao Wu⁵
¹,²,⁴Computer Science and Technology, Henan University of Science and Technology, Luo Yang, China
³Computer Technology, Henan University of Science and Technology, Luo Yang, China
⁵Professor, Henan University of Science and Technology, Luo Yang, China
ABSTRACT
With the continuing development of medical informatization, the potential value of large
volumes of medical information remains unexploited. Mining the many available outpatient
medical records and training disease prediction models on them can assist doctors in diagnosis
and improve work efficiency. This paper proposes a disease prediction method based on an
improved k-nearest neighbor algorithm, from the perspective of patient similarity analysis.
The method draws on the idea of clustering: it extracts the samples near the center points
generated by clustering and uses them as a new training set for the k-nearest neighbor
algorithm. The k-nearest neighbor algorithm is then improved on the basis of maximum entropy
to overcome the influence of the weight coefficient in the traditional algorithm and to
improve accuracy. Experiments on real data show that the proposed improved k-nearest neighbor
algorithm achieves better accuracy and operational efficiency.
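The sample-reduction idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: class centroids stand in for the cluster centers, the `keep` fraction is a hypothetical parameter, and the maximum-entropy improvement is omitted because the abstract does not specify it. The classifier uses a plain majority vote.

```python
from collections import Counter
from math import dist


def reduce_training_set(samples, labels, keep=0.5):
    """Keep the fraction `keep` of each class that lies closest to its centroid."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    reduced = []
    for y, points in by_class.items():
        # Centroid of the class; an assumption standing in for a cluster center.
        centroid = [sum(col) / len(points) for col in zip(*points)]
        points = sorted(points, key=lambda p: dist(p, centroid))
        n_keep = max(1, round(keep * len(points)))
        reduced.extend((p, y) for p in points[:n_keep])
    return reduced


def knn_predict(train, query, k=3):
    """Plain majority-vote k-nearest-neighbor over (point, label) pairs."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical two-feature records; each class contains one outlier.
samples = [(0, 0), (1, 0), (0, 1), (6, 6), (10, 10), (11, 10), (10, 11), (5, 5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
train = reduce_training_set(samples, labels, keep=0.75)
print(len(train), knn_predict(train, (0.5, 0.5)))
```

Dropping the samples farthest from each center shrinks the set that every query must be compared against, which is one plausible source of the efficiency gain the abstract reports.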
KEYWORDS
Data Mining, KNN, Clustering, Maximum Entropy
Full Text: https://aircconline.com/ijdms/V11N1/11119ijdms02.pdf
A THEORETICAL EXPLORATION OF DATA MANAGEMENT AND INTEGRATION
IN ORGANISATION SECTORS
Chisom E. Offia and Malcolm Crowe
School of Computing, Engineering and Physical Sciences, University of the West of Scotland,
Paisley, Scotland
ABSTRACT
The growth of big data is a pressing issue that will affect enterprises across various
sectors. Growing data volume, the high speed of data generation and the increasing variety of
data from heterogeneous sources have made data management difficult. This paper first reviews
different aspects of big data management, including data integration and traditional data
warehousing, and their associated challenges. The problems include growth in redundant data,
limited data accessibility, and the time consumed by data modelling and by moving data from
heterogeneous sources into a central database, especially in a big data environment. We then
propose a logical data management approach that uses RESTView technology to integrate and
analyse data without fully adopting traditional ETL processes. Data that cannot be copied or
moved for governance, corporate, security or other reasons can still be accessed, integrated
and analysed without creating a central repository. Data can be kept in its original form and
location, which eliminates data movement, significantly speeds up the process and allows live
data interrogation. It may not be a practical solution for every situation, but it is a
feasible and comparatively cost-effective one.
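The idea of querying data in place, rather than copying it into a central repository, can be sketched with a view that stores no rows of its own. This is an illustrative sketch only: it does not use the RESTView API, which the abstract does not describe, and the class name, the fetch callables and the sample rows are all hypothetical. In practice each fetch function would issue a request to the remote system.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class VirtualView:
    """A view that stores no rows; it asks every source again on each query."""

    def __init__(self, sources: List[Callable[[], Iterable[Row]]]):
        self.sources = sources

    def select(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        for fetch in self.sources:
            # Stand-in for a remote call; nothing is cached or copied centrally.
            for row in fetch():
                if predicate(row):
                    yield row


# Two hypothetical source systems, modelled here as in-memory lists.
clinic_rows = [{"id": 1, "site": "clinic", "amount": 120}]
billing_rows = [{"id": 2, "site": "billing", "amount": 80}]
view = VirtualView([lambda: clinic_rows, lambda: billing_rows])

print(len(list(view.select(lambda r: r["amount"] > 50))))
billing_rows.append({"id": 3, "site": "billing", "amount": 300})
# The next query sees the new row because the view reads the sources live.
print(len(list(view.select(lambda r: r["amount"] > 50))))
```

Because the view holds only references to its sources, a change at a source is visible to the next query, which mirrors the live interrogation property the abstract describes.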
KEYWORDS
Big Data; Data Integration; Data warehouse; RESTView
Full Text: https://aircconline.com/ijdms/V11N1/11119ijdms03.pdf
HYBRID ENCRYPTION ALGORITHMS FOR MEDICAL DATA STORAGE
SECURITY IN CLOUD DATABASE
Fenghua Zhang¹, Yaming Chen², Weiming Meng³ and Qingtao Wu⁴
¹Computer Technology, Henan University of Science and Technology, Luo Yang, China
²,³Computer Science and Technology, Henan University of Science and Technology, Luo Yang, China
⁴Professor, Henan University of Science and Technology, Luo Yang, China
ABSTRACT
Cloud databases are a derivative of cloud computing. At present, most medical institutions
store data in cloud databases. Although cloud databases improve efficiency of use, they also
pose a major challenge to the secure storage of data. This article proposes a hybrid algorithm
to address the data security problem in hospital cloud databases. First, the AES algorithm is
improved; the improved version is called the P-AES algorithm. The P-AES algorithm is then
combined with the RSA algorithm to form the hybrid algorithm. Experimental results show that
the hybrid encryption algorithm offers fast encryption and decryption, high security and good
handling of longer data, and can solve the data security problem in cloud databases to a
certain extent. The algorithm has been successfully applied in a hospital information
management system.
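The general shape of a hybrid scheme, in which a fast symmetric cipher encrypts the data and a public-key cipher protects the symmetric key, can be sketched with the standard library alone. The sketch below is not P-AES and is not secure: a SHA-256 counter keystream stands in for the symmetric cipher, the RSA key is a textbook toy (p = 61, q = 53), and every function name is hypothetical. It only shows how the two layers fit together.

```python
import hashlib
import secrets

# Toy textbook RSA key (p=61, q=53): far too small for any real use.
N, E, D = 3233, 17, 2753


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR with a SHA-256 counter keystream (not AES)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)


def hybrid_encrypt(plaintext: bytes):
    """Encrypt the data with a fresh session key, then wrap that key with RSA."""
    session_key = secrets.token_bytes(16)
    wrapped_key = [pow(b, E, N) for b in session_key]  # byte-wise, illustration only
    return wrapped_key, keystream_xor(session_key, plaintext)


def hybrid_decrypt(wrapped_key, ciphertext: bytes) -> bytes:
    """Unwrap the session key with the private exponent, then undo the XOR."""
    session_key = bytes(pow(c, D, N) for c in wrapped_key)
    return keystream_xor(session_key, ciphertext)


record = b"hypothetical patient record: visit 2019-01-15, ward 3"
wrapped, sealed = hybrid_encrypt(record)
print(hybrid_decrypt(wrapped, sealed) == record)
```

The design choice this illustrates is the division of labor: the symmetric layer handles long records efficiently, while the asymmetric layer only has to process the short session key. A real system would use a vetted cryptographic library with AES and padded RSA.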
KEYWORDS
Hospital System, AES, RSA, Hybrid Encryption
Full Text: https://aircconline.com/ijdms/V11N1/11119ijdms04.pdf
BRIDGING DATA SILOS USING BIG DATA INTEGRATION
Jayesh Patel
Senior Member
ABSTRACT
With cloud computing, cheap storage and technological advances, an enterprise uses multiple
applications to run its business functions. These applications are not limited to
transactions, customer service, sales and finance; they also cover security, application logs,
marketing, engineering, operations, HR and many more. Each business vertical uses multiple
applications, which generate a huge amount of data. On top of that, social media, IoT sensors,
SaaS solutions and mobile applications are recording exponential growth in data volume. In
almost all enterprises, data silos exist across these applications, which can produce
structured, semi-structured or unstructured data at different velocities and in different
volumes. Integrating all data sources and generating timely insights helps overall decision
making. With recent developments in big data integration, data silos can be managed better and
can generate tremendous value for enterprises. Big data integration offers flexibility, speed
and scalability for integrating large data sources. It also offers tools for generating
analytical insights that help stakeholders make effective decisions. This paper presents an
overview of data silos, the challenges they pose, and how big data integration can help to
overcome them.
KEYWORDS
Data Silo, Big Data, Data Pipelines, Integration, Data Lake, Hadoop
Full Text: https://aircconline.com/ijdms/V11N3/11319ijdms01.pdf
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
Geetha Madadevaiah¹, RV Prasad¹, Amogh Hiremath¹, Michel Dumontier², Andre Dekker³
¹Philips Research, Philips Innovation Campus, Philips India Ltd, Manyata Tech Park, Bangalore
²Institute of Data Science, Maastricht University, Maastricht, The Netherlands
³Department of Radiation Oncology (MAASTRO), GROW School for Oncology and
Developmental Biology, Maastricht University Medical Centre+, Dr Tanslaan 12, 6229ET,
Maastricht, The Netherlands
ABSTRACT
In this paper, the authors describe an approach for sharing sensitive medical data with the
consent of the data owner. The framework builds on the advantages of Semantic Web technologies
and makes the sharing of sensitive information in a controlled environment secure and robust.
It uses a combination of Role-Based and Rule-Based Access Policies to secure a medical data
repository in line with the FAIR guidelines. A lightweight ontology was developed to collect
consent from users, indicating which part of their data they want to share with another user
who has a particular role. The authors consider the scenario in which the owner of the data,
for example the patient, shares medical data with relevant persons such as physicians,
researchers and pharmacists. To prove the concept, the authors developed a prototype and
validated it using the Sesame OpenRDF Workbench with 202,908 triples and a consent graph
stating the consents per patient.
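The combination of a role lookup with a consent rule can be sketched in a few lines. This is a minimal illustration, assuming a consent graph represented as plain (subject, predicate, object) tuples rather than an RDF store queried with SPARQL. All user names, roles and predicate names are hypothetical, since the paper's ontology terms are not given in the abstract.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical users and roles (the role-based part).
USER_ROLES = {"dr_lee": "physician", "sam": "researcher"}

# Hypothetical consent graph: patient, shared data part, role allowed to read it.
CONSENT_GRAPH: Set[Triple] = {
    ("patient42", "shares:diagnosis", "physician"),
    ("patient42", "shares:medication", "physician"),
    ("patient42", "shares:diagnosis", "researcher"),
}


def may_read(user: str, patient: str, data_part: str) -> bool:
    """Allow access only if the user's role has a matching consent triple."""
    role = USER_ROLES.get(user)
    if role is None:
        return False  # unknown users have no role and therefore no access
    return (patient, f"shares:{data_part}", role) in CONSENT_GRAPH


def visible_parts(user: str, patient: str) -> Set[str]:
    """List the data parts of one patient that the user's role may read."""
    role = USER_ROLES.get(user)
    return {
        predicate.split(":", 1)[1]
        for subject, predicate, obj in CONSENT_GRAPH
        if subject == patient and obj == role
    }


print(may_read("dr_lee", "patient42", "medication"))
print(sorted(visible_parts("sam", "patient42")))
```

Keeping consent as data rather than as code means a patient can widen or withdraw sharing by adding or removing a triple, without any change to the access check itself.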
KEYWORDS
Access Policies, Semantic Web, RDF/SPARQL, Role Based, Rule Based, FAIR, Consent
Full Text: https://aircconline.com/ijdms/V11N3/11319ijdms02.pdf
18. REFERENCES
[1] Maddox TM, Albert NM, Borden WB, Curtis LH, Ferguson TB, Kao DP, et al. The Learning
Healthcare System and Cardiovascular Care: A Scientific Statement From the American Heart
Association. Circulation. 2017;135:e826â57.
[2] Lambin P, Roelofs E, Reymen B, Velazquez ER, Buijsen J, Zegers CML, et al. âRapid
Learning health care in oncologyâ â An approach towards decision support systems enabling
customised radiotherapyâ. Radiother. Oncol. 2013;109:159â64
[3] Wilkinson MD, Dumontier M, AalbersbergIjJ, Appleton G, Axton M, Baak A, et al. The
FAIR Guiding Principles for scientific data management and stewardship. Sci. Data.
2016;3:160018.
[4] Sullivan R, Peppercorn J, Sikora K, Zalcberg J, Meropol NJ, Amir E, et al. Delivering
affordable cancer care in high-income countries. Lancet Oncol. 2011;12:933â80.
[5] Regulation GDP. Regulation (EU) 2016/679 of the European Parliament and of the Council
of 27 April 2016 on the protection of natural persons with regard to the processing of personal
data and on the free movement of such data, and repealing Directive 95/46. Off. J. Eur. Union
OJ. 2016;59:1â88.
[6] Standards for privacy of individually identifiable health information. Office of the Assistant
Secretary for Planning and Evaluation, DHHS. Final rule. Fed. Regist. 2000;65:82462â829.
[7] Automatable Discovery and Access Matrix (âADA-Mâ)v1.0-Guidance
Document[Internet].Global Alliance for Genomics & Health (GA4GH) and International Rare
Disease Research Consortium (IRDiRC);
http://genomicsandhealth.org/files/public/ADAM_GuidanceDocument_15Dec2016_Final.pdf
[8] Wilkinson MD, Verborgh R, Bonino da Silva Santos LO, Clark T, Swertz MA, Kelpin FDL,
et al. Interoperability and FAIRness through a novel combination of Web technologies.
PeerJComput. Sci. 2017;3:e110.
[9] Jochems A, Deist TM, Soest J van, Eble M, Bulens P, Coucke P, et al. Distributed learning:
Developing a predictive model based on data from multiple hospitals without data leaving the
hospital â A real life proof of concept. Radiother. Oncol. 2016;121:459â67.
[10] RDF Primer [Internet]. [cited 2017 Jun 8]. Available from:
https://www.w3.org/TR/2004/REC-rdfprimer-20040210/
[11] Decker S, Mitra P, Melnik S. Framework for the semantic Web: an RDF tutorial. IEEE
Internet Comput. 2000;4:68–73.
[12] SPARQL Query Language for RDF [Internet]. [cited 2017 Jun 8]. Available from:
https://www.w3.org/TR/rdf-sparql-query/
[13] Arenas M, Pérez J. Querying semantic web data with SPARQL. Proc. Thirtieth ACM
SIGMOD-SIGACT-SIGART Symp. Princ. Database Syst. [Internet]. ACM; 2011 [cited 2017 Jun
8]. p. 305–316. Available from: http://dl.acm.org/citation.cfm?id=1989312
[14] OWL 2 Web Ontology Language Primer (Second Edition) [Internet]. [cited 2017 Jun 9].
Available from: https://www.w3.org/TR/owl2-primer/
[15] Jonquet C, Shah N, Youn C, Callendar C, Storey M-A, Musen M. NCBO annotator:
semantic annotation of biomedical data. Int. Semantic Web Conf. Poster Demo Sess. [Internet].
2009 [cited 2017 Jun 8]. Available from:
http://www.lirmm.fr/~jonquet/publications/documents/Demo-ISWC09-Jonquet.pdf
[16] Berners-Lee T, Hendler J. Publishing on the semantic web. Nature. 2001;410:1023–4.
[17] Bizer C, Lehmann J, Kobilarov G, Auer S, Becker C, Cyganiak R, et al. DBpedia - A
crystallization point for the Web of Data. Web Semant. Sci. Serv. Agents World Wide Web.
2009;7:154–165.
[18] Finin T, Joshi A, Kagal L, Niu J, Sandhu R, Winsborough W, et al. ROWLBAC:
representing role based access control in OWL. Proc. 13th ACM Symp. Access Control Models
Technol. [Internet]. ACM; 2008 [cited 2017 Jun 8]. p. 73–82. Available from:
http://dl.acm.org/citation.cfm?id=1377849
[19] Gabillon A, Letouzey L. A View Based Access Control Model for SPARQL. IEEE; 2010
[cited 2017 Jun 8]. p. 105–12. Available from: http://ieeexplore.ieee.org/document/5636084/
[20] Sacco O, Passant A. A Privacy Preference Ontology (PPO) for Linked Data. LDOW
[Internet]. 2011 [cited 2017 Jun 8]. Available from:
http://www.academia.edu/download/6230668/ldow2011-paper01.pdf
20. IMAGE STEGANOGRAPHY USING INHOMOGENEOUS IMAGES WITH
MODYFING VERNAM SCHEME
Huda H. Al.ghuraify1, Dr. Ali A. Al-bakry2, Dr. Ahmad T. Al-jayashi3
1 Engineering technical college, Al-furat Al-awsat university, Iraq
2 Dean of engineering technical college, Al-furat al-awsat university, Iraq
3 Assistant dean of engineering technical college, Al-furat al-awsat university, Iraq
ABSTRACT
Nowadays, due to the rapid development of the internet, it is important to protect secret
("mystery") data from attackers during communication. Steganography is used to exchange
secret data in a way that avoids suspicion. This paper presents a method that encrypts each
channel of an RGB color image separately, without the need to exchange an encryption key,
using a modified Vernam scheme, and then hides the result in a grayscale cover image.
Conversely, a grayscale image is encrypted in the same way, again without exchanging an
encryption key, and then hidden in a cover image of RGB color type. The simulation results
indicate a high level of security for image transmission.
KEYWORDS
Image steganography, Inhomogeneous images, Mystery data, Vernam scheme, Image
transmission
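The encrypt-then-hide idea described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the modified Vernam scheme avoids key exchange or how the data is embedded, so this example makes two assumptions that are not taken from the paper: it uses the classic Vernam cipher (byte-wise XOR with a random key of the same length, so a key is still needed here) and plain least-significant-bit (LSB) embedding. All function names are hypothetical, and images are represented as flat byte strings of pixel values.

```python
import secrets


def vernam_xor(data: bytes, key: bytes) -> bytes:
    """Classic Vernam cipher (an assumption, not the paper's modified scheme).

    XORs every data byte with the matching key byte; applying it twice
    with the same key returns the original data.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Hide payload bits (most significant bit first) in the LSBs of cover bytes."""
    if len(payload) * 8 > len(cover):
        raise ValueError("cover is too small for the payload")
    stego = bytearray(cover)
    for i, byte in enumerate(payload):
        for bit in range(8):
            pos = i * 8 + bit
            stego[pos] = (stego[pos] & 0xFE) | ((byte >> (7 - bit)) & 1)
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back from the LSBs of the stego bytes."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for bit in range(8):
            value = (value << 1) | (stego[i * 8 + bit] & 1)
        out.append(value)
    return bytes(out)


def hide_rgb_in_gray(channels, keys, cover: bytes) -> bytes:
    """Encrypt each color channel separately, then embed all of them in the cover."""
    cipher = b"".join(vernam_xor(c, k) for c, k in zip(channels, keys))
    return embed_lsb(cover, cipher)


def recover_rgb(stego: bytes, keys):
    """Extract the ciphertext and decrypt each channel with its own key."""
    n = len(keys[0])  # all channels are assumed to have the same size
    cipher = extract_lsb(stego, n * len(keys))
    return [vernam_xor(cipher[i * n:(i + 1) * n], k) for i, k in enumerate(keys)]


if __name__ == "__main__":
    # A tiny 2x2 "secret image": four pixel values per channel.
    r, g, b = bytes([10, 20, 30, 40]), bytes([50, 60, 70, 80]), bytes([90, 100, 110, 120])
    keys = [secrets.token_bytes(4) for _ in range(3)]
    cover = bytes(range(96))  # grayscale cover: 3 channels * 4 bytes * 8 bits
    stego = hide_rgb_in_gray([r, g, b], keys, cover)
    print(recover_rgb(stego, keys) == [r, g, b])
```

Because only the lowest bit of each cover pixel is touched, every stego pixel differs from the cover by at most one gray level. The reverse case in the abstract (a grayscale secret in an RGB cover) would use the same two steps with a single channel and a cover three times as long per pixel.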
Full Text: https://aircconline.com/ijdms/V11N4/11419ijdms01.pdf
The International Journal of Database Management Systems (IJDMS)
http://airccse.org/journal/ijdms/current2019.html
21. REFERENCES
[1] Rejani. R, D. Murugan & D. V. Krishnan, (2015) "Comparative Study of Spatial Domain Image
Steganography Techniques," Int. J. Advanced Networking and Applications, vol. 07, no. 02, pp.
2650-2657.
[2] J. Singh, M. K. Garcha & G. Kaur, (2015) "Review of Spatial and Frequency Domain
Steganographic Approaches," International Journal of Engineering Research & Technology, vol.
4, no. 06, pp. 1122-1125.
[3] M. Hussain, A.W.A. Wahab, Y.I.B. Idris, A.T.S. Ho & K.-H. Jung, (2018) "Image
Steganography in Spatial Domain: A Survey," Signal Processing: Image
Communication, Elsevier.
[4] H.A. Prajapati and N. G. Chitaliya, (2015) "Secured and Robust Dual Image Steganography:
A Survey," International Journal of Innovative Research in Computer and Communication
Engineering, vol. 03, no. 1, pp. 30-37.
[5] T. Morkel, J.H.P. Eloff & M.S. Olivier, (2005) "AN OVERVIEW OF IMAGE
STEGANOGRAPHY," Information and Computer Security Architecture (ICSA) Research
Group, pp. 1-11.
[6] N. Hamid, A. Yahya, R. B. Ahmad, & O. M. Al-Qershi, (2012) "Image Steganography
Techniques: An Overview," International Journal of Computer Science and Security (IJCSS),
vol. 6, no. 3, pp. 168-187.
[7] Z. Khan & A. Bin Mansoor, (2009) "Evaluation of Wavelet Filters Performance for
Steganalysis," 2nd International Conference on Computer, Control and Communication, IEEE,
pp. 1-5.
[8] Shikha & V. K. Dutt, (2014) "Steganography: The Art of Hiding Text in Image using Matlab,"
International Journal of Advanced Research in Computer Science and Software Engineering, vol.
4, no. 9, pp. 822-828.
[9] D. Rawat and V. Bhandari, "A Steganography Technique for Hiding Image in an Image using
LSB Method for 24 Bit Color Image," International Journal of Computer Applications, vol. 64,
no. 20, pp. 16-19, 2013.
[10] N. Tiwari, M. Sandilya, & M. Chawla, (2014) "Spatial Domain Image Steganography based
on Security and Randomization," International Journal of Advanced Computer Science and
Applications, vol. 5, no. 1, pp. 156-159.
[11] P. Das, S. C. Kushwaha, & M. Chakraborty, (2015) "MULTIPLE EMBEDDING SECRET
KEY IMAGE STEGANOGRAPHY USING LSB SUBSTITUTION AND ARNOLD
TRANSFORM," IEEE SPONSORED 2ND INTERNATIONAL CONFERENCE ON
ELECTRONICS AND COMMUNICATION SYSTEM, pp. 845-849.
[12] R. K. Thakur, Ch. Saravanan, (2016) "Analysis of Steganography with Various Bits of LSB
for Color Images," International Conference on Electrical, Electronics, and Optimization
Techniques, IEEE, pp. 2154-2158.
[13] P. MATHUR, & S. ADHIKARI, (2017) "DATA HIDING IN DIGITAL IMAGES USING
STAGNOGRAPHY PARADIGM: STATE OF THE ART," International Journal of Advances in
Electronics and Computer Science, vol. 4, no. 2, pp. 98-102.
[14] Ch. A. Sari, E.H. Rachmawanto, & E. J. Kusuma, (2019) "GOOD PERFORMANCE
IMAGES ENCRYPTION USING SELECTIVE BIT T-DES ON INVERTED LSB
STEGANOGRAPHY," Journal of a Science and Information, vol. 12, no. 1, pp. 41-49.
[15] H. H. Al Ghuraify, A.A. Al-Bakry, & A.T. Al-Jayashi, (2019) "QUATERNION SECURITY
USING MODIFYING VERNAM CIPHER WITH IMAGE STEGANOGRAPHY," The
International Journal of Multimedia & Its Applications, vol. 11, no. 3, pp. 1-20.
[16] H. H. Al Ghuraify, A.A. Al-Bakry, & Ahmad T. Al-Jayashi, (2019) "DUAL SECURITY
USING IMAGE STEGANOGRAPHY," International Journal of Network Security & Its
Applications (IJNSA), vol. 11, no. 2, pp. 14-31.
[17] E.J. Kusuma, Ch. A. Sari, E. H. Rachmawanto, and D. R. I.M. Setiadi, (2018) "A
Combination of Inverted LSB, RSA, and Arnold Transformation to get Secure and Imperceptible
Image Steganography," J. ICT Res. Appl., vol. 12, no. 2, pp. 103-119.
[18] H. Ogras, (2019) "An Efficient Steganography Technique for Images using Chaotic
Bitstream," I. J. Computer Network and Information Security, vol. 2, pp. 21-27.
[19] S. Namasudra & G. Ch. Deka, (2019) "Advances of DNA Computing in Cryptography",
Taylor & Francis Group.
[20] "SIPI Image Database," [Online].