This document reviews trends in big data analytics (BDA) architecture. It traces the evolution of data analytics from its early origins to modern technologies such as Hadoop and Spark, and describes key features of BDA systems such as flexibility, scalability, and fault tolerance. Common BDA architectures, lambda and kappa, are summarized: the lambda architecture uses batch, speed, and serving layers to handle both real-time and batch processing, while the kappa architecture simplifies this by removing the batch layer and handling all processing through streaming. Overall, the document provides a high-level overview of BDA architectures and technologies.
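The lambda architecture's layer split described above can be sketched in a few lines; this is an illustrative toy (class and method names are hypothetical, not from any framework), showing how a serving layer merges a periodically recomputed batch view with an incrementally updated speed view:

```python
# Minimal lambda-architecture sketch: an immutable master dataset, a batch view
# recomputed over all history, a speed view updated per event, and a query that
# merges the two (the "serving layer").
from collections import Counter

class LambdaPipeline:
    def __init__(self):
        self.master_dataset = []      # immutable, append-only event log
        self.batch_view = Counter()   # recomputed periodically from the full log
        self.speed_view = Counter()   # updated incrementally per event

    def ingest(self, event):
        self.master_dataset.append(event)
        self.speed_view[event] += 1   # speed layer: low-latency update

    def run_batch(self):
        self.batch_view = Counter(self.master_dataset)  # batch layer: full recompute
        self.speed_view.clear()       # speed view is now folded into the batch view

    def query(self, key):
        # serving layer: merge batch and real-time views
        return self.batch_view[key] + self.speed_view[key]

pipeline = LambdaPipeline()
for e in ["click", "view", "click"]:
    pipeline.ingest(e)
print(pipeline.query("click"))  # 2 (served from the speed view)
pipeline.run_batch()
print(pipeline.query("click"))  # 2 (now served from the batch view)
```

The kappa architecture, by contrast, would keep only the `ingest`-style streaming path and replay the event log through it when a recompute is needed.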
IMMERSIVE TECHNOLOGIES IN 5G-ENABLED APPLICATIONS: SOME TECHNICAL CHALLENGES ... (ijcsit)
The 5G next-generation networking paradigm, with its envisioned capacity, coverage, and data transfer rates, provides fertile ground for novel application scenarios. Virtual, Mixed, and Augmented Reality will play a key role as visualization, interaction, and information delivery platforms. Recent hardware and software developments in immersive technologies (AR, VR, and MR), notably the commercial availability of advanced headsets equipped with XR-accelerated processing units and Software Development Kits (SDKs), are significantly increasing the penetration of such devices in entertainment, corporate, and industrial use. This trend creates next-generation usage models that raise serious technical challenges at all networking and software architecture levels needed to support the immersive digital transformation. The focus of this paper is to identify, discuss, and propose system development approaches and architectures for successful integration of immersive technologies into future information and communication concepts such as the Tactile Internet and the Internet of Skills.
Named Data Networking (NDN) is a recently designed Internet architecture that uses data names instead of locations, fundamentally changing the abstraction of the network service from "delivering packets to specific destinations" to "retrieving data with specific names." This fundamental change creates new opportunities and intellectual challenges in all areas, especially network routing and forwarding, communication security, and privacy. The focus of this dissertation is the forwarding plane introduced by NDN. Communication in NDN is done by exchanging Interest and Data packets.
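The Interest/Data exchange mentioned above can be sketched as a toy node with a Content Store (in-network cache) and a FIB with longest-prefix matching; this is an illustration of the name-based retrieval idea, not NDN's actual packet processing:

```python
# Toy NDN-style retrieval: a consumer sends an Interest carrying a data name;
# the node answers from its Content Store if possible, otherwise forwards the
# Interest via longest-prefix match in the FIB, and caches the returning Data.
class NdnNode:
    def __init__(self):
        self.content_store = {}   # name -> data (in-network cache)
        self.fib = {}             # name prefix -> next-hop node

    def add_route(self, prefix, next_hop):
        self.fib[prefix] = next_hop

    def on_interest(self, name):
        if name in self.content_store:             # cache hit: answer locally
            return self.content_store[name]
        # longest-prefix match on the data name, as NDN forwarding does
        best = max((p for p in self.fib if name.startswith(p)),
                   key=len, default=None)
        if best is None:
            return None                            # no route: Interest is dropped
        data = self.fib[best].on_interest(name)    # forward toward the producer
        if data is not None:
            self.content_store[name] = data        # cache Data on the return path
        return data

producer = NdnNode()
producer.content_store["/video/frame1"] = b"payload"
router = NdnNode()
router.add_route("/video", producer)
print(router.on_interest("/video/frame1"))  # b'payload', now also cached at router
```

A second Interest for the same name would be satisfied from the router's cache without reaching the producer, which is the key departure from host-centric delivery.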
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Architectural design of IoT-cloud computing integration platform (TELKOMNIKA JOURNAL)
An integration between the Internet of Things (IoT) and cloud computing can potentially benefit both sides. As an IoT-based system is mostly composed of interconnected pervasive and constrained devices, it can benefit from the virtually unlimited resources of the cloud, i.e., storage and computation services, to store and process its sensed data. On the other hand, the cloud computing system may benefit from IoT by broadening its reach to real-world applications. To realize this idea, a cloud software platform is needed to provide an integration layer between IoT and cloud computing, taking into account the heterogeneity of network communication protocols as well as security and data management issues. In this study, an architectural design of an IoT-cloud platform for IoT and cloud computing integration is presented. The proposed software platform can be decomposed into main components, namely the cloud-to-device interface, authentication, data management, and cloud-to-user interface components. In general, the cloud-to-device interface acts as a data transmission endpoint between the cloud platform and its IoT devices. Before a data transmission session is established, the communication interface contacts the authentication component to make sure the corresponding IoT device is legitimate before it is allowed to send sensor data to the cloud environment. A valid IoT device can be registered to the cloud system through the web console component. The received sensor data are then collected in the data storage component, and any stored data can be further analyzed by the data processing component. Users or applications can then retrieve the collected data, either raw or processed, through API data access or the web console.
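The register-authenticate-ingest-query flow described above can be sketched as follows; the class and method names here are hypothetical stand-ins for the paper's components, not its actual API:

```python
# Sketch of the IoT-cloud platform flow: the cloud-to-device interface consults
# the authentication component before accepting sensor data, data lands in the
# storage component, and the cloud-to-user interface serves it back.
class CloudPlatform:
    def __init__(self):
        self.registered = set()   # devices registered via the web console
        self.storage = []         # data storage component

    def register_device(self, device_id):
        self.registered.add(device_id)

    def ingest(self, device_id, reading):
        # cloud-to-device interface: check authentication before accepting data
        if device_id not in self.registered:
            raise PermissionError(f"unregistered device: {device_id}")
        self.storage.append((device_id, reading))

    def query(self, device_id):
        # cloud-to-user interface / API data access: raw data retrieval
        return [r for d, r in self.storage if d == device_id]

cloud = CloudPlatform()
cloud.register_device("sensor-1")
cloud.ingest("sensor-1", 21.5)
print(cloud.query("sensor-1"))  # [21.5]
```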
FAST PACKETS DELIVERY TECHNIQUES FOR URGENT PACKETS IN EMERGENCY APPLICATIONS ... (IJCNC Journal)
The Internet of Things (IoT) has been receiving a lot of interest around the world from academia, industry, and telecommunication organizations. In IoT, many constrained devices communicate with each other, generating a huge number of transferred packets. These packets have different priorities based on the applications supported by the IoT technology. Emergency applications, such as calling an ambulance in a car accident scenario, need fast and reliable packet delivery in order to receive an immediate response from a service provider. When a client sends a request with specific requirements, fast and reliable return of the content (packets) should be ensured; otherwise, network resources may be wasted and undesirable situations may be encountered. Content-Centric Networking (CCN) has become a promising network paradigm that satisfies the requirements of fast packet delivery for emergency IoT applications. In this paper, we propose fast packet delivery techniques based on CCN for the IoT environment; these techniques are suitable for urgent packets in emergency applications that need fast delivery. The simulation results show that the proposed techniques achieve higher throughput, a larger number of served request messages, faster response time, and fewer lost packets in comparison with normal CCN.
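The core idea of serving urgent packets first can be sketched with a priority queue; this illustrates the concept only and is not the paper's CCN forwarding logic:

```python
# A forwarder that serves urgent (emergency) packets before normal ones, using
# a heap keyed on (priority, arrival order) so FIFO order is kept within a class.
import heapq
import itertools

URGENT, NORMAL = 0, 1   # lower value = served first

class PriorityForwarder:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker: preserves arrival order

    def enqueue(self, packet, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

fwd = PriorityForwarder()
fwd.enqueue("telemetry-1")
fwd.enqueue("ambulance-call", priority=URGENT)
fwd.enqueue("telemetry-2")
print(fwd.dequeue())  # 'ambulance-call' jumps ahead of earlier normal packets
```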
A Comparative Study: Taxonomy of High Performance Computing (HPC) (IJECEIAES)
Computer technologies have developed rapidly in both software and hardware. The complexity of software is increasing as per market demand, as manual systems are automated while the cost of hardware decreases. High Performance Computing (HPC) is a very demanding technology and an attractive area of computing due to the huge data processing needs of many computing applications. The paper focuses on different applications of HPC and its types, such as cluster computing, grid computing, and cloud computing, and studies the classifications and applications of each of these types. All of them are demanding areas of computer science. The paper also presents a comparative study of grid, cloud, and cluster computing based on benefits, drawbacks, key areas of research, characteristics, issues, and challenges.
Performance Analysis of Internet of Things Protocols Based Fog/Cloud over Hig... (Istabraq M. Al-Joboury)
The Internet of Things (IoT) is becoming the future of a global data field in which embedded devices communicate with each other, exchange data, and make decisions through the Internet. IoT can improve quality of life in smart cities, but a massive amount of data from different smart devices could slow down or crash database systems. In addition, transferring IoT data to the Cloud for monitoring and feedback leads to high delay at the infrastructure level. Fog Computing can help by offering services closer to edge devices. In this paper, we propose an efficient system architecture to mitigate the problem of delay. We provide a performance analysis covering response time, throughput, and packet loss for the MQTT (Message Queue Telemetry Transport) and HTTP (Hypertext Transfer Protocol) protocols on Cloud and Fog servers, with a large volume of data from an emulated traffic generator working alongside one real sensor. We implement both protocols in the same architecture, with low-cost embedded devices connected to local and Cloud servers on different platforms. The results show that HTTP response time is 12.1 and 4.76 times higher than MQTT on Fog and Cloud servers located in the same geographical area as the sensors, respectively. The worst performance is observed when the Cloud is public and outside the country region. The throughput results show that MQTT can carry the data within the available bandwidth with the lowest percentage of packet loss. We also show that the proposed Fog architecture is an efficient way to reduce latency and enhance performance in Cloud-based IoT.
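MQTT's publish/subscribe routing, which underlies the protocol compared above, hinges on topic filters with `+` (exactly one level) and `#` (this level and everything below) wildcards. A minimal matcher illustrating that rule, written from the MQTT specification rather than this paper:

```python
# Match an MQTT topic filter against a concrete topic name.
# '+' matches exactly one topic level; '#' (last level only, per the MQTT spec)
# matches the current level and all remaining levels.
def topic_matches(filter_str, topic):
    f_parts, t_parts = filter_str.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                       # swallow this level and all below it
            return True
        if i >= len(t_parts):
            return False                   # filter is longer than the topic
        if f != "+" and f != t_parts[i]:   # literal level must match exactly
            return False
    return len(f_parts) == len(t_parts)    # no unmatched trailing topic levels

print(topic_matches("sensors/+/temp", "sensors/room1/temp"))  # True
print(topic_matches("sensors/#", "sensors/room1/humidity"))   # True
print(topic_matches("sensors/+/temp", "sensors/room1/hum"))   # False
```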
Open Source Platforms Integration for the Development of an Architecture of C... (Eswar Publications)
The goal of the Internet of Things (IoT) is to achieve the interconnection and interaction of all kinds of everyday objects. An IoT architecture can be implemented in various ways. This paper presents a way to build an IoT architecture using open source hardware and software platforms and shows that this is a viable option for collecting information through various sensors and presenting it through a web page.
Cooperative hierarchical based edge-computing approach for resources allocati... (IJECEIAES)
Mobile and Internet of Things (IoT) applications are becoming very popular and have attracted researchers' interest and commercial investment toward fulfilling the future vision and requirements of smart cities. These applications share common demands such as fast response, a distributed nature, and awareness of service location. However, such requirements cannot be satisfied by centralized services residing in the cloud. The edge computing paradigm has therefore emerged to satisfy these demands by extending cloud resources to the network edge, closer to end-user devices. In this paper, the exploitation of edge resources is studied: a cooperative, hierarchical approach for executing pre-partitioned application modules across edge resources is proposed in order to reduce traffic between the network core and the cloud, and the proposed approach has polynomial-time complexity. Edge computing also increases the efficiency of service provisioning and improves the end-user experience. To validate the proposed cooperative hierarchical approach for module placement across edge node resources, the iFogSim toolkit is used. The simulation results show that the proposed approach reduces the network's load and total delay compared to a baseline module-placement approach, and moreover increases the network's overall throughput.
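The module-placement idea can be sketched with a simple greedy policy: place each pre-partitioned module on the first edge node with spare capacity, falling back to the cloud only when the edge tier is full. This is an illustrative stand-in, not the paper's actual placement algorithm:

```python
# Greedy edge-first placement: modules are (name, demand) pairs; edge_capacity
# maps edge node -> free capacity units. Modules that fit nowhere at the edge
# escalate to the cloud tier, which mirrors the edge-to-cloud hierarchy.
def place_modules(modules, edge_capacity, cloud="cloud"):
    placement = {}
    remaining = dict(edge_capacity)
    for name, demand in modules:
        target = next((n for n, cap in remaining.items() if cap >= demand), None)
        if target is not None:
            remaining[target] -= demand   # placed at the edge: low latency
        else:
            target = cloud                # no edge capacity left: use the cloud
        placement[name] = target
    return placement

print(place_modules([("filter", 2), ("aggregate", 3), ("train", 8)],
                    {"edge-1": 4, "edge-2": 3}))
# {'filter': 'edge-1', 'aggregate': 'edge-2', 'train': 'cloud'}
```

Each module is examined once against each node, so the policy is clearly polynomial-time, matching the complexity class the abstract claims for the real approach.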
CIRCUIT BREAKER CONNECT MONITORING TO 5G MOBILE APPLICATION (ijcsit)
With the continuous improvement of complex electronic devices, the safety of technicians has also become a matter of serious concern: technicians' lives are in jeopardy while they work after shutting down circuit breakers, because even once a breaker has been switched off, someone may inadvertently flip it back on while a technician is still working. A system is therefore needed to guarantee technicians' safety. In addition, people do not like constantly switching appliances such as fans, lighting, and air conditioners on and off, which results in energy wasted by unnecessarily running equipment. To address these issues, we propose a system of mobile app-controlled circuit breakers that provides wireless management of home appliances through an Android app. It replaces the traditional breaker with a mobile app-controlled on/off system in which no one can activate the breaker without the password. Remote control of home appliances helps users save electricity and enhances quality of life and comfort. Additionally, the system includes a home security mechanism against intrusion using a mobile app-controlled door lock, as well as a mechanism for detecting dangerous gas leaks. The system is built around an ESP32 microcontroller, a Bluetooth module, a 4x4 matrix keypad, and a gas leak detector, together with an Android mobile application. The entire system is compact.
Prediction Based Efficient Resource Provisioning and Its Impact on QoS Parame... (IJECEIAES)
The purpose of this paper is to provision on-demand resources to end users as per their needs using a prediction method in a cloud computing environment. Provisioning virtualized resources to cloud consumers according to their needs is a crucial step in deploying applications on the cloud. However, dynamic management of resources for variable workloads remains a challenging problem for cloud providers. This problem can be addressed by a prediction-based adaptive resource provisioning mechanism that estimates the upcoming resource demands of applications. The present research introduces a prediction-based resource provisioning model for the allocation of resources in advance. The proposed approach facilitates the release of unused resources back into the pool while maintaining quality of service (QoS), which is defined on top of the prediction model used to allocate resources in advance. In this work, the model is used to predict future workload for user requests on web servers, and its impact on achieving efficient resource provisioning is evaluated in terms of resource exploitation and QoS. The main contribution of this paper is a prediction model for efficient and dynamic resource provisioning that meets the requirements of end users.
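The allocate-in-advance idea can be sketched with a deliberately simple forecaster; the paper's actual prediction model is not specified here, so a moving average with a QoS headroom factor serves only as a stand-in:

```python
# Prediction-based provisioning sketch: forecast demand as a moving average of
# recent workload samples, add headroom for QoS, and size the VM pool for the
# predicted demand before it arrives.
from collections import deque

class Provisioner:
    def __init__(self, window=3, headroom=1.2):
        self.history = deque(maxlen=window)   # recent observed workload samples
        self.headroom = headroom              # over-provision factor for QoS

    def observe(self, requests):
        self.history.append(requests)

    def predicted_demand(self):
        return sum(self.history) / len(self.history)

    def allocate(self, capacity_per_vm=100):
        # provision enough VMs for predicted demand plus headroom, in advance;
        # shrinking the result later releases unused resources back to the pool
        need = self.predicted_demand() * self.headroom
        return -(-int(need) // capacity_per_vm)   # ceiling division

p = Provisioner()
for load in (180, 220, 200):                      # requests/sec samples
    p.observe(load)
print(p.predicted_demand())  # 200.0
print(p.allocate())          # 3 VMs cover the 240 req/s predicted with headroom
```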
An Event-based Middleware for Syntactical Interoperability in Internet of Things ... (IJECEIAES)
The Internet of Things (IoT) connects sensors and devices that record physical observations of the environment with a variety of applications and other Internet services. Along with the increasing number and diversity of connected devices, a problem called interoperability arises. One type is syntactical interoperability, whereby the IoT should be able to connect devices across various data protocols. To address this problem, we propose a middleware capable of supporting interoperability by providing a multi-protocol gateway between CoAP, MQTT, and WebSocket. The middleware is developed using an event-based architecture implementing the publish-subscribe pattern. We also developed a system to test the performance of the middleware in terms of success rate and data delivery delay. The system consists of temperature and humidity sensors using CoAP and MQTT as publishers and a web application using WebSocket as a subscriber. For data transmission from either the CoAP or MQTT sensors, the success rate is above 90%, the average data delivery delay is below 1 second, and the packet loss rate varies between 0% and 25%. Interoperability testing using an interoperability assessment methodology found that our middleware qualifies.
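The publish-subscribe pattern at the heart of the middleware can be sketched with an in-memory event broker; protocol adapters (CoAP/MQTT on the publisher side, WebSocket on the subscriber side) would sit on either side of it. This is a hypothetical sketch of the pattern, not the paper's implementation:

```python
# Event-based broker: publishers and subscribers are decoupled by topic, so a
# CoAP/MQTT gateway can publish without knowing that a WebSocket handler (or
# anything else) is subscribed, which is what enables the multi-protocol bridge.
from collections import defaultdict

class EventBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # fan out to every subscriber of this topic; neither side knows the
        # other's wire protocol, only the shared topic and payload shape
        for cb in self.subscribers[topic]:
            cb(payload)

received = []
broker = EventBroker()
broker.subscribe("room1/temperature", received.append)   # WebSocket-side handler
broker.publish("room1/temperature", {"value": 24.3})     # MQTT-side gateway
print(received)  # [{'value': 24.3}]
```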
Moving Toward Big Data: Challenges, Trends and Perspectives (IJRES Journal)
Abstract: Big data refers to organizational data assets that exceed the volume, velocity, and variety of data typically stored using traditional structured database technologies. This type of data has become an important resource from which organizations can gain valuable insight and make business decisions by applying predictive analysis. This paper provides a comprehensive view of the current status of big data development, starting from the definition and a description of Hadoop and MapReduce, the framework that standardizes the use of clusters of commodity machines to analyze big data. Organizations that are ready to embrace big data technology must anticipate significant adjustments to infrastructure and to the roles played by IT professionals and BI practitioners, which are discussed in the section on big data challenges. The landscape of big data development changes rapidly, which is directly related to big data trends; a major part of the trend results from attempts to deal with the challenges discussed earlier. Lastly, the paper covers recent job prospects related to big data, including descriptions of several job titles that comprise the big data workforce.
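The MapReduce model mentioned above splits work into a map phase (emit key-value pairs), a shuffle (group values by key), and a reduce phase (aggregate per key). The classic word count, in miniature and on a single machine rather than a cluster:

```python
# Single-machine sketch of the MapReduce phases; on a real cluster, map tasks
# run in parallel over input splits and the shuffle moves pairs between nodes.
from collections import defaultdict

def map_phase(document):
    return [(word, 1) for word in document.split()]   # emit (word, 1) pairs

def shuffle(pairs):
    groups = defaultdict(list)                        # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big insight", "data at scale"]
pairs = [p for d in docs for p in map_phase(d)]       # one map call per "split"
print(reduce_phase(shuffle(pairs)))                   # word -> total count
```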
CIRCUIT BREAK CONNECT MONITORING TO 5G MOBILE APPLICATIONijcsit
Along by a continuous improvement to composite electronic devices, a safety to technicians takes
additionally become the matter to good concern, as a result to technicians' lives is in jeopardy while their
work through shutting down circuit breakers, even that even once the breaker takes been switched off,
someone will inadvertently flip to while a technician remains working. That should be a system to
guarantee safety that technicians. Also, individuals do not love switching all the time toward turn on / off
appliances like fans/lighting/air conditioners. It ends in wasted energy thanks to unnecessarily placing the
instrument. To address these issues, we tend to come up through the system through mobile app-controlled
circuit breakers that degrade wireless management to home appliances to hunt down a golem app. That
replaces a traditional breaker through the mobile app-controlled system in the on / off system, where no
one will activate the breaker, while not the word. The remote of home appliances helps a user to save
electricity. That enhances a quality of life and luxury. Additionally, a system includes the home security
mechanism against drone intrusion using the mobile app-controlled door lock system besides the
mechanism that sleuthing dangerous gas leaks. A formation of the system subtracts the degree of victim
associate ESP 32 microcontroller, the Bluetooth module, matrix 4x4 keyboards, and the paraffin gas
detector associate with a golem mobile application. The entire system is usually compact systems
Prediction Based Efficient Resource Provisioning and Its Impact on QoS Parame...IJECEIAES
The purpose of this paper is to provision the on demand resources to the end users as per their need using prediction method in cloud computing environment. The provisioning of virtualized resources to cloud consumers according to their need is a crucial step in the deployment of applications on the cloud. However, the dynamical management of resources for variable workloads remains a challenging problem for cloud providers. This problem can be solved by using a prediction based adaptive resource provisioning mechanism, which can estimate the upcoming resource demands of applications. The present research introduces a prediction based resource provisioning model for the allocation of resources in advance. The proposed approach facilitates the release of unused resources in the pool with quality of service (QoS), which is defined based on prediction model to perform the allocation of resources in advance. In this work, the model is used to determine the future workload prediction for user requests on web servers, and its impact toward achieving efficient resource provisioning in terms of resource exploitation and QoS. The main contribution of this paper is to develop the prediction model for efficient and dynamic resource provisioning to meet the requirements of end users.
An Event-based Middleware for Syntactical Interoperability in Internet of Th...IJECEIAES
Internet of Things (IoT) connecting sensors or devices that record physical observations of the environment and a variety of applications or other Internet services. Along with the increasing number and diversity of devices connected, there arises a problem called interoperability. One type of interoperability is syntactical interoperability, where the IoT should be able to connect all devices through various data protocols. Based on this problem, we proposed a middleware that capable of supporting interoperability by providing a multi-protocol gateway between COAP, MQTT, and WebSocket. This middleware is developed using event-based architecture by implementing publish-subscribe pattern. We also developed a system to test the performance of middleware in terms of success rate and delay delivery of data. The system consists of temperature and humidity sensors using COAP and MQTT as a publisher and web application using WebSocket as a subscriber. The results for data transmission, either from sensors or MQTT COAP has a success rate above 90%, the average delay delivery of data from sensors COAP and MQTT below 1 second, for packet loss rate varied between 0% - 25%. The interoperability testing has been done using Interoperability assessment methodology and found out that ours is qualified.
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
Abstract: Big data refers to the organizational data asset that exceeds the volume, velocity, and variety of data typically stored using traditional structured database technologies. This type of data has become the important resource from which organizations can get valuable insightand make business decision by applying predictive analysis. This paper provides a comprehensive view of current status of big data development,starting from the definition and the description of Hadoop and MapReduce – the framework that standardizes the use of cluster of commodity machines to analyze big data. For the organizations that are ready to embrace big data technology, significant adjustments on infrastructure andthe roles played byIT professionals and BI practitioners must be anticipated which is discussed in the challenges of big data section. The landscape of big data development change rapidly which is directly related to the trend of big data. Clearly, a major part of the trend is the result ofthe attempt to deal with the challenges discussed earlier. Lastly the paper includes the most recent job prospective related to big data. The description of several job titles that comprise the workforce in the area of big data are also included.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
An Comprehensive Study of Big Data Environment and its Challenges.ijceronline
Big Data is a data analysis methodology enabled by recent advances in technologies and Architecture. Big data is a massive volume of both structured and unstructured data, which is so large that it's difficult to process with traditional database and software techniques. This paper provides insight to Big data and discusses its nature, definition that include such features as Volume, Velocity, and Variety .This paper also provides insight to source of big data generation, tools available for processing large volume of variety of data, applications of big data and challenges involved in handling big data
Abstract: Knowledge has played a significant role on human activities since his development. Data mining is the process of
knowledge discovery where knowledge is gained by analyzing the data store in very large repositories, which are analyzed
from various perspectives and the result is summarized it into useful information. Due to the importance of extracting
knowledge/information from the large data repositories, data mining has become a very important and guaranteed branch of
engineering affecting human life in various spheres directly or indirectly. The purpose of this paper is to survey many of the
future trends in the field of data mining, with a focus on those which are thought to have the most promise and applicability
to future data mining applications.
Keywords: Current and Future of Data Mining, Data Mining, Data Mining Trends, Data mining Applications.
This article useful for anyone who want to introduce with Big Data and how oracle architecture Big Data solution using Oracle Big Data Cloud solutions .
Big data Mining Using Very-Large-Scale Data Processing PlatformsIJERA Editor
Big Data consists of large-volume, complex, growing data sets with multiple, heterogenous sources. With the
tremendous development of networking, data storage, and the data collection capacity, Big Data are now rapidly
expanding in all science and engineering domains, including physical, biological and biomedical sciences. The
MapReduce programming mode which has parallel processing ability to analyze the large-scale network.
MapReduce is a programming model that allows easy development of scalable parallel applications to process
big data on large clusters of commodity machines. Google’s MapReduce or its open-source equivalent Hadoop
is a powerful tool for building such applications.
BIG DATA SECURITY AND PRIVACY ISSUES IN THE CLOUD IJNSA Journal
Many organizations demand efficient solutions to store and analyze huge amount of information. Cloud computing as an enabler provides scalable resources and significant economic benefits in the form of reduced operational costs. This paradigm raises a broad range of security and privacy issues that must be taken into consideration. Multi-tenancy, loss of control, and trust are key challenges in cloud computing environments. This paper reviews the existing technologies and a wide array of both earlier and state-ofthe-art projects on cloud security and privacy. We categorize the existing research according to the cloud reference architecture orchestration, resource control, physical resource, and cloud service management layers, in addition to reviewing the recent developments for enhancing the Apache Hadoop security as one of the most deployed big data infrastructures. We also outline the frontier research on privacy-preserving data-intensive applications in cloud computing such as privacy threat modeling and privacy enhancing solutions.
Big data security and privacy issues in theIJNSA Journal
Many organizations demand efficient solutions to store and analyze huge amount of information. Cloud computing as an enabler provides scalable resources and significant economic benefits in the form of reduced operational costs. This paradigm raises a broad range of security and privacy issues that must be taken into consideration. Multi-tenancy, loss of control, and trust are key challenges in cloud computing environments. This paper reviews the existing technologies and a wide array of both earlier and state-ofthe-art projects on cloud security and privacy. We categorize the existing research according to the cloud reference architecture orchestration, resource control, physical resource, and cloud service management layers, in addition to reviewing the recent developments for enhancing the Apache Hadoop security as one of the most deployed big data infrastructures. We also outline the frontier research on privacy-preserving data-intensive applications in cloud computing such as privacy threat modeling and privacy enhancing solutions.
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
In this paper, we first briefly review the concept of big data, including its definition, features, and value. We then present background technology for big data summarization brings to us. The objective of this paper is to discuss the big data summarization framework, challenges and possible solutions as well as methods of evaluation for big data summarization. Finally, we conclude the paper with a discussion of open problems and future directions..
Similar to Review of big data analytics (bda) architecture trends and analysis (20)
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
2. k,((( 34
Authorized licensed use limited to: University of Waterloo. Downloaded on May 30,2020 at 04:01:40 UTC from IEEE Xplore. Restrictions apply.
data say they are doing it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended” [7].
The early 1990s saw the birth of the interconnected web of data, accessible to anyone from anywhere, known as the Internet. Digital storage became more cost effective than manually printed documents. Michael [8] describes that, including sounds and images, there are thousands of petabytes of information; the existence of 12,000 petabytes is not an unreasonable guess. The web was increasing in size ten-fold each year; however, data that is never explored yields no value and no insight. During the mid-1990s the Internet was extremely popular, but structured relational databases could not cope with the variety of data types coming from different non-relational sources. Thus, NoSQL systems were created to handle different languages and formats in a highly flexible way. Larry Page and Sergey Brin implemented Google’s search engine, which responds within a few seconds with the desired results by processing and analyzing Big Data in a distributed fashion [9]. Richard comments that the purpose of computing is insight, not just numbers. In 1999, Kevin introduced the term “Internet of Things” to describe the growing number of online devices that automate communication with each other without human interference; the IoT also utilizes the Internet to empower computers to sense the world for themselves [10].
With the advent of Industry 4.0, which began developing in Germany in 2013, the movement has rapidly spread across Europe and the world as a whole. BDA is one of its key adoptions and a pillar of IoT initiatives to improve decision making [11]. It requires processing a large amount of data on the fly and storing the data in various scalable storage technologies. This lightning-fast analytics implementation allows industries to gain rapid insights, provide predictions for machinery, and share information. Intrinsically, it requires a unified architecture that caters for common operations while enabling innovative applications.
B. Big Data General ‘Vs’ Concept
To understand the Big Data concept, it helps to consider the simple building blocks of the data model and how they communicate with each other. In 2001, the Gartner analyst Doug Laney introduced the 3Vs concept for the dimensions of data management: controlling data volume, variety and velocity [12]. It characterizes the creation of data and its storage, retrieval and analysis. A decade later, IBM coined two more worthy Vs: Veracity and Value. The 5Vs are briefly described as follows:
• Volume: the enormous quantity of data being generated.
• Velocity: the staggering speed at which data is created and processed.
• Variety: the types of content in the data under analysis.
• Veracity: the quality and trustworthiness of the captured data, given its variability.
• Value: the significance of the data, delivering insights and creating useful models that answer sophisticated queries.
Inspired by the comprehensive discussion and relevant comments on IBM’s Big Data Analytics Hub website, the 5Vs can be clustered into three groups [13]:
• Volume and Velocity: these translate into hardware and software requirements for dealing with the data. A large-scale distributed data processing framework such as Hadoop is required.
• Veracity and Velocity: these translate into the urgency of real-time processing. Detecting possible data corruption or manipulation is crucial and demands high-speed processing ability.
• Value: this translates into the necessity of interdisciplinary cooperation, which raises the most difficult challenge for industrial use of big data.
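The large-scale distributed framework named above, Hadoop, is built around the MapReduce model. As a minimal single-machine sketch (plain Python for illustration, not the Hadoop API), the map, shuffle and reduce phases of a word count look like this:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for each word in a document."""
    return [(word, 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data analytics", "big data architecture"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))
# counts == {"big": 2, "data": 2, "analytics": 1, "architecture": 1}
```

In a real cluster the map and reduce phases run on many commodity machines in parallel, which is precisely how the Volume and Velocity requirements are met.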
C. “Data at Rest” vs “Data in Motion”
Gaining insights from big data is no small task. Firstly, “Data at Rest” refers to historical data collected from various sources; the analytics are performed after the event occurs. It is therefore commonly used to discover behaviors and patterns in past records, and is also referred to as the “batch processing” method. To automate these tasks, a scheduler application is put in place to execute them automatically. Secondly, “Data in Motion” refers to processing and analyzing data in real time, as the event happens. Latency is a key consideration, as a lag in processing can result in lost opportunities. Furthermore, hybrids of “Data at Rest” and “Data in Motion” are common in industry.
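The contrast between the two modes can be sketched in a few lines. This is an illustrative example with hypothetical sensor readings, not code from the paper: batch analysis runs over the whole collected history after the fact, while stream processing reacts to each event as it arrives:

```python
# Hypothetical readings: (sensor_id, temperature) events.
events = [("s1", 21.0), ("s2", 35.5), ("s1", 22.0), ("s2", 36.5)]

def batch_average(history):
    """'Data at Rest': analyze the full collected history after the fact."""
    return sum(temp for _, temp in history) / len(history)

def stream_alerts(stream, threshold=30.0):
    """'Data in Motion': react to each event as it happens."""
    for sensor, temp in stream:
        if temp > threshold:
            yield (sensor, temp)  # low-latency action per event

avg = batch_average(events)           # 28.75, computed over past records
alerts = list(stream_alerts(events))  # [("s2", 35.5), ("s2", 36.5)]
```

A hybrid deployment would run both: the scheduler-driven batch job over the archive, and the streaming path for events whose value decays with latency.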
III. BIG COMPUTE FEATURES
For data-intensive computing [14], the system should encapsulate sophisticated design technologies for storing, managing and processing big data. There are two key focus areas: applications and frameworks. These involve the concepts of data parallelism and task/application parallelism. With data parallelism, the data is distributed among servers and can therefore be processed in parallel. It has been claimed that, as opposed to task parallelism, it is often the simpler way to craft a parallel application [15].
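A minimal sketch of data parallelism, using threads on a single machine purely for illustration (real BDA frameworks distribute the partitions across servers): the data set is split into chunks, the same function is applied to each chunk in parallel, and the partial results are combined:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split the data set into roughly equal chunks, one per worker."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    """The same operation applied independently to each partition."""
    return sum(x * x for x in chunk)

data = list(range(100))
chunks = partition(data, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(process_chunk, chunks))
total = sum(partial_results)  # same answer as the sequential computation
```

The simplicity claimed in the text shows here: no coordination between workers is needed beyond the final combine step, because each partition is processed independently.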
The following describes the generic features of Big Compute:
• Being efficient in pre-processing raw data and combining relevant data from multiple sources, commonly known as ETL (Extract, Transform and Load)
• Being flexible in applying various aggregation functions and performing ad-hoc queries over large numbers of sources to discover high-level insights from the data
• Being cost effective, extending functionality at minimum cost and minimizing the maintenance cost of keeping the system running smoothly
• Being low latency in harnessing real-time data for analytics, optimizing high-volume operations with minimal delay
• Being highly scalable, accommodating growth in compute resources and storage with easy plug-in support
• Being robust and fault tolerant, with the ability to cope with erroneous input and to continue operating through failures
IEEE Conference on Open Systems (ICOS)
Authorized licensed use limited to: University of Waterloo. Downloaded on May 30,2020 at 04:01:40 UTC from IEEE Xplore. Restrictions apply.
• Being systematically governed to ensure data
availability, usability, integrity and security in use
Identifying the required features for a specific domain can
be difficult. In general, different application domains may
need different types of systems, and it is hard to meet all
stakeholder needs with a single design. As such, Salma et al.
[16] apply a feature modelling technique [17]. It drills
down by distinguishing domain scoping, which determines
the domain of interest, the stakeholders and their goals, from
domain modelling, which aims to derive features using a
commonality analysis. Figure 1 shows the feature model
diagram. This work provides insight into the overall feature
space of a BDA system and further assists in deriving the
BDA architecture.
Figure 1: Feature Model
IV. REVIEW OF BIG DATA ARCHITECTURE FRAMEWORK
A reference architecture helps to build a blueprint of the
ultimate BDA system. It is based on a collection of
characteristics and features common to a given set of
problems. The design of the architecture has to enable a
fluent orchestration workflow that executes in either a
synchronous or an asynchronous manner between the
application and its data. In many cases, it includes support
for a hybrid mode of batch and real-time processing. The
following reviews of architecture frameworks broaden the
perspective and enable problem solving with the right tools.
A. Lambda ‘λ’ Architecture
In 2011, one of the most popular reference BDA
architecture designs was posted by Marz [18], named the
"Lambda (λ) Architecture". It is designed to combine the
batch and real-time processing paradigms in parallel. This
method is capable of solving many BDA use cases. In
addition, it is robust, with a fault-tolerance strategy for
serving a wide range of workloads. Technically, it is now
feasible to run ad-hoc queries against big data, but querying a
petabyte dataset from scratch every time a result is needed is
prohibitively expensive. Figure 2 shows the λ architecture
with its three major layers.
Figure 2: λ Architecture
The batch layer pre-computes over the master dataset,
processing it into batch views so that queries can be resolved
with low latency. This requires striking a balance between
pre-computation and the execution time needed to complete
the query. By doing a little computation on the fly to complete
queries, the process is saved from needing to pre-compute
excessively large batch views. In addition, the views are not
expected to be updated frequently. The batch views may be a
set of flat files, depending on the chosen technologies. The
key is to pre-compute just enough information so that the
query can be completed quickly.
The serving layer indexes the views and provides
interfaces so that the pre-computed data can be queried
speedily. Both the batch and speed layers execute the same
processing logic, and the results are then reconciled in the
serving layer. It is designed to be distributed among many
servers for scalability. There is a long-standing problem
where data is too normalized: some information needs to be
stored redundantly to improve response times. However,
denormalizing the data may create enormous complexity in
keeping it consistent, so this view needs to be constructed
carefully [19].
The speed layer is similar to the batch layer in that its
objective is to construct views that can be queried efficiently.
It mainly uses an incremental approach and handles real-time
views, which are updated directly when new data arrives. It
compensates for the high latency of the batch layer to enable
up-to-date results for queries. However, incremental
computation brings various new challenges and is
significantly more complex than batch computation,
especially when it must run in a resource-efficient manner
with millisecond-level latencies. Data must be indexed
through the use of random-read/random-write databases.
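The interplay of the three layers can be sketched in a few lines of plain Python. The page-view data, function names and in-memory dictionaries below are our own illustrative stand-ins for the real batch framework, stream processor and serving store:

```python
# λ-style query resolution over hypothetical page-view events: the batch
# layer periodically recomputes a view over the immutable master dataset,
# the speed layer incrementally maintains a view over data that arrived
# after the last batch run, and the serving layer merges both at query time.

master_dataset = [("page_a", 1), ("page_b", 1), ("page_a", 1)]  # immutable log
recent_events = [("page_a", 1)]  # arrived since the last batch run

def precompute_batch_view(dataset):
    """Batch layer: recomputed from scratch on a schedule."""
    view = {}
    for page, n in dataset:
        view[page] = view.get(page, 0) + n
    return view

batch_view = precompute_batch_view(master_dataset)

speed_view = {}
def absorb(event):
    """Speed layer: updated incrementally as each event arrives."""
    page, n = event
    speed_view[page] = speed_view.get(page, 0) + n

for e in recent_events:
    absorb(e)

def query(page):
    """Serving layer: reconciles the two views at query time."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)

print(query("page_a"))  # 3: two from the batch view, one from the speed view
```

Note that the same counting logic appears twice, once per layer; this duplication is exactly the maintenance burden the κ architecture later removes.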
B. Kappa ‘κ’ Architecture
Kreps [20] describes alternatives worth exploring to parts
of the λ architecture. He addresses the issue of maintaining
code in two complex distributed systems, which makes
development painful and imposes an operational burden,
especially with distributed components like Storm and
Hadoop. The κ architecture was therefore introduced. In this
approach, re-processing is executed whenever the processing
code has changed and the result sets actually need to be
recomputed. The job doing the re-computation is simply an
improved version of the same code, running on the same
framework and taking the same input data. Basically, it is a
simplification of the λ architecture in which the entire batch
layer is removed, leaving the speed layer and the serving
layer. Figure 3 shows the diagram of the κ architecture.
The workflow can handle real-time data processing and
continuous data re-processing in a single stream computation
model. A streaming job reads the data and processes it. When
re-processing is required, a second instance of the streaming
job is started that processes the data from the beginning of the
retained log and redirects its output to a separate table. When
the second job has caught up with the entire dataset, the
application is simply switched to read from the new data
view, the first job is stopped, and its data view is deleted [21].
Across multiple streams, multiple consumers can be spun up
in parallel, each consuming an individual partition of the data.
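The re-processing cycle can be simulated in memory. In the sketch below, a Python list stands in for the retained Kafka log and plain lists for the output tables; the record format and the v1/v2 jobs are invented for illustration:

```python
# κ-style reprocessing in miniature: the retained log is replayed from the
# beginning by a second job running improved code, its output goes to a
# new table, and the application switches over to the new view.

log = ["2", "3", "x", "4"]  # retained, immutable event log

def job_v1(record):          # original streaming code: skips bad records
    return int(record) if record.isdigit() else None

def job_v2(record):          # improved code: counts bad records as 0
    return int(record) if record.isdigit() else 0

def run_job(job, source):
    table = []
    for record in source:    # reads the log from the earliest offset
        table.append(job(record))
    return table

table_v1 = run_job(job_v1, log)   # current output table
table_v2 = run_job(job_v2, log)   # reprocessed into a separate table
serving_table = table_v2          # switch reads to the new view...
# ...then stop the first job and delete table_v1.
print(serving_table)  # [2, 3, 0, 4]
```

Because both jobs are instances of the same streaming code path, no separate batch framework is needed; the log replay is the only "batch" operation.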
Figure 3: κ Architecture
Another pillar of the κ architecture is the immutable data
log. This is similar in concept to the immutable master dataset
in the λ architecture, but instead of using technologies such as
Hadoop/HDFS, the κ architecture's immutable data log is
(usually) Apache Kafka, developed at LinkedIn and
contributed to the open source community as an Apache
Software Foundation project. It retains the full log of the data
that may need to be re-processed. Data in Kafka is persisted
to disk and replicated for fault tolerance. Furthermore,
growing data volumes in Kafka do not make the system
slower, as it supports clustered deployment distributed across
servers with over a petabyte of storage.
C. Microservices Architecture
Fully built and deployed BDA solutions often include
many components of mixed vendor software and open source
software, running on physical servers, virtual machines and
Docker containers. An application programming interface
(API) is a common method for integrating the functions,
which are stitched together into a working pipeline for each
data source. A container is similar to a very lightweight
virtual machine; microservices are lighter still. Following the
trends in BDA, most analytics pipelines are easily deployed
as immutable microservices. Each microservice executes in
its own process/container and communicates in a self-
regulating way without having to depend on other services or
on the application as a whole. Microservices commonly adopt
the Spark, Cassandra and Kafka open source technologies
[22]. Figure 4 shows the generic microservices architecture
diagram as given in [23]. It can be built on demand as needed
in the batch, speed and serving layers.
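A minimal sketch of such a self-contained service, using only the Python standard library: it owns its state, exposes one HTTP API endpoint, and depends on no other service. The service name, `/wordcount` endpoint and corpus are invented for illustration; a production microservice would of course use a proper framework and run in its own container:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# A tiny analytics microservice: one process, one API endpoint, no
# dependency on any other service or on an application as a whole.
class WordCountHandler(BaseHTTPRequestHandler):
    corpus = "big data needs big compute"

    def do_GET(self):
        if self.path == "/wordcount":
            counts = {}
            for word in self.corpus.split():
                counts[word] = counts.get(word, 0) + 1
            body = json.dumps(counts).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), WordCountHandler)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client integrates with the service purely through its API.
url = f"http://127.0.0.1:{server.server_port}/wordcount"
result = json.loads(urlopen(url).read())
print(result)  # {'big': 2, 'data': 1, 'needs': 1, 'compute': 1}
server.shutdown()
```

The pipeline integration point is the HTTP API alone, which is what allows each microservice to be replaced or redeployed independently.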
Figure 4: Microservices Architecture
D. IOT Architecture
With the raise of Industry Revolution 4.0, the combination
of IOT and BDA with Artificial Intelligence are being driving
to optimize and automate production for industry. IOT is in
data-driven paradigm that uses real-time pervasive connected
sensors, simulations and event logs to deliver analytics
intelligent manufacturing through Internet/Intranet for every
area of the factory [24]. These IOT devices have been
deployed in daily operations to deliver operation efficiencies,
process innovation and environmental benefits. It also
presents the challenges in term of large-scale data
management, processing and analysis [25]. It consists of four
major bases; Time Series Store/Database (TSDB), Streaming
Message Queue (SMQ), Workflow Orchestration Engine
(WOE) and Distributed File System (DFS).
Time Series Store/Database (TSDB): an optimized data
management system for time-stamped or time-series data. To
process a query over time-series data, the time-series segment
needs to be located; then there is a retrieval process based on
a combination of one or more values of the metadata, which
is commonly stored in a relational database such as SQLite,
PostgreSQL, MySQL or others. This mechanism gives a
TSDB low-latency access for tracking, monitoring, down-
sampling and aggregating over time. Typically, it has auto-
sharding and horizontal scaling with a store-specific API or
through a specifically built connector. There are various open
source TSDBs, such as Apache Druid
(https://druid.apache.org), InfluxDB
(https://www.influxdata.com), OpenTSDB
(http://opentsdb.net) and others.
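The down-sampling and aggregation operations named above can be sketched in plain Python over hypothetical sensor readings (real TSDBs implement the same idea over indexed, sharded storage):

```python
# Down-sampling time-stamped values into fixed windows: the kind of
# low-latency aggregation query a TSDB is optimized for.
readings = [  # (seconds since start, temperature) -- made-up sensor data
    (0, 20.0), (10, 22.0), (70, 24.0), (80, 26.0), (130, 30.0),
]

def downsample(series, window_s):
    """Aggregate raw points into per-window averages keyed by window start."""
    buckets = {}
    for t, value in series:
        buckets.setdefault(t // window_s, []).append(value)
    return {w * window_s: sum(v) / len(v) for w, v in sorted(buckets.items())}

print(downsample(readings, 60))  # {0: 21.0, 60: 25.0, 120: 30.0}
```

Swapping `sum(v) / len(v)` for `max`, `min` or a count gives the other common roll-up aggregates.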
Streaming Message Queue (SMQ): machine-to-machine
communication with the servers is established using publish-
subscribe-based messaging protocols such as MQTT
(Message Queue Telemetry Transport), XMPP (Extensible
Messaging and Presence Protocol), DDS (Data Distribution
Service) and others. The SMQ handles filtering, extraction
and simple or complex calculations during the streaming
processes.
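The publish-subscribe pattern underlying these protocols can be shown with a small in-memory broker. The `Broker` class and the slash-separated topic names are illustrative stand-ins for a real MQTT broker, not an implementation of any of the protocols above:

```python
# In-memory sketch of publish-subscribe messaging: subscribers register
# interest in a topic, and the broker fans each published message out to
# every subscriber of that topic.
class Broker:
    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for callback in self.subscribers.get(topic, []):
            callback(payload)

broker = Broker()
received = []
broker.subscribe("factory/line1/temp", received.append)
broker.publish("factory/line1/temp", 73.5)   # delivered to the subscriber
broker.publish("factory/line2/temp", 41.0)   # no subscriber; dropped
print(received)  # [73.5]
```

Decoupling publishers from subscribers in this way is what lets thousands of IOT sensors feed analytics consumers without either side knowing about the other.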
Workflow Orchestration Engine (WOE): designed to
orchestrate enterprise-level data processing operations, flow-
based control, scheduling and data provenance, securely and
durably, for IOT and data analytics tasks. Furthermore, the
orchestration framework supports distributed clusters and
extensibility through plug-ins, and offers diagrammatic views
with behavior modifiable from a web browser. There are two
popular open source orchestration workflow systems: Apache
NiFi/MiNiFi (https://nifi.apache.org), written in Java, and
Node-RED (https://nodered.org), written in JavaScript on top
of the Node.js platform.
Figure 5: IOT Architecture
Distributed File System (DFS): designed to store very
large data sets reliably and to stream them at high bandwidth
to applications. It is highly fault tolerant: the file system
replicates, or copies, each piece of data multiple times and
distributes the copies to individual nodes, placing at least one
copy on a different server rack than the others. As a result,
data on nodes that crash can be found elsewhere within the
cluster, ensuring that processing can continue while the lost
data is recovered. The choice of DFS technology depends on
the "brotherhood" of applications around it; the most famous
open source big data ecosystem is Apache Hadoop [26], and
there are also Ceph (https://ceph.io), Alluxio
(https://github.com/Alluxio/alluxio), OpenIO
(https://www.openio.io) and others.
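The rack-aware placement rule just described can be sketched as follows. The node and rack names are made up, and the placement policy is a simplified illustration of the idea, not HDFS's actual algorithm:

```python
# Rack-aware replica placement: each block gets several copies, with at
# least one copy on a different rack, so a whole-rack failure cannot
# destroy every replica of a block.
nodes = [("node1", "rack-A"), ("node2", "rack-A"),
         ("node3", "rack-B"), ("node4", "rack-B")]

def place_replicas(block_id, replication=3):
    first_node, first_rack = nodes[block_id % len(nodes)]
    placement = [(first_node, first_rack)]
    # For the remaining copies, prefer nodes on a *different* rack first.
    others = sorted(nodes, key=lambda n: n[1] == first_rack)
    for node in others:
        if len(placement) == replication:
            break
        if node not in placement:
            placement.append(node)
    return placement

placement = place_replicas(0)
racks = {rack for _, rack in placement}
print(placement)
print(len(racks) > 1)  # True: the replicas span more than one rack
```

With this invariant, losing every node in one rack still leaves at least one live replica of each block reachable elsewhere in the cluster.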
E. NIST Big Data Reference Architecture (NBD-RA)
The National Institute of Standards and Technology
(NIST) has taken responsibility for the United States Federal
Government's Big Data Research and Development Initiative.
It develops open standards and BDA architecture to accelerate
the adoption of the most secure and effective big data
techniques and technologies. The White House announced
this initiative on March 29, 2012 [27]. It started with six
federal departments and agencies, with more than 80 projects
involved in the development.
NBD-RA is an elastic BDA architecture design. The
conceptual model is vendor-neutral, technology-neutral and
infrastructure-agnostic. The system consists of five logical
functional components: System Orchestrator, Data Provider,
Big Data Application Provider, Big Data Framework Provider
and Data Consumer. Two dimensions, "Management" and
"Security and Privacy", overlay those five components and
provide services and functionality for BDA-specific tasks.
Figure 6 shows the NBD-RA architecture, as referenced in
[28].
Figure 6: NBD-RA Architecture
V. TRENDS AND ANALYSIS
The discussed architectures provide a structure to be filled
with a set of generic tools. However, the choice of
technologies to be used and integrated carries much
complexity. Firstly, there is the consideration of whether the
BDA system is on-premise, cloud or hybrid. Secondly, there
is the choice of data processing, analytics, and security and
governance application technologies to be developed: open
source, commercial or hybrid. Finally, the return on
investment (ROI) of having the big data system is driven by
valuable AI use cases such as descriptive, predictive and
prescriptive analytics.
An on-premise BDA system provides high-bandwidth
transfer rates with more flexibility in accessing the system.
Nevertheless, it requires a large capital outlay with high
maintenance costs. Alternatively, big data in cloud computing
or a hybrid cloud may be an alternative approach, offering
high availability ranging from 99.9% to 99.99999% and
promising expandability of storage from gigabytes to
petabytes [29]. There are native Hadoop options available in
public clouds such as AWS, Google, Oracle, AliCloud and
others; however, these may not be the best fit for many
applications, because virtualized Hadoop runs intensive
workloads more slowly [30] [31]. Generally, all these
considerations need a comprehensive requirements analysis
and cost budgeting.
Hadoop is one BDA ecosystem, but it is not the only
choice. Elasticsearch, from the company named "Elastic", is
an alternative BDA solution specialized for web search,
network traffic and log analysis; it is based on Apache Lucene
for low-level indexing and analysis [32] [33]. NoSQL
document-oriented data stores are popular and in demand
nowadays; MongoDB is one of the most widely used,
providing durability through its write-ahead logging
techniques [34] [35]. Apache Cassandra is a popular wide-
column store that enables continuous availability, tremendous
scale and data distribution across multiple data centers and
cloud availability zones [36]. It has been deployed at
technology giants such as Facebook, Netflix, Twitter, eBay
and others.
Nevertheless, there is a variety of choices among cloud
computing technologies: Google BigTable, Amazon S3 object
storage, Azure Cosmos DB, Alibaba Cloud ApsaraDB and
others.
AI analytics is important to every aspect of the
organization because it can help ROI at every level. The
implemented analytics use cases need to be built around issues
that are really clear, and around the problems that businesses
are having today, to improve efficiency, effectiveness and
specific concerns such as customer satisfaction [37]. PwC
reports that 59% of executives say big data at their company
would be improved through the use of AI [38]. Developing
best practices for quick ROI and momentum of scale is
critical for developing AI models and reusable building
blocks of data sets, and for working across organizational
boundaries to drive more valuable AI use cases [39].
VI. CONCLUSION
Nowadays, data is the fuel of an organization's vehicle to
drive business transformation, and we are witnessing the
growth and importance of the hidden value of data. This
paper therefore contributes to several important aspects of
exploring BDA concepts: the "V"s, the feature model, and
key architectural components with their trade-offs. BDA is
now one of the main pillars of Industry 4.0, as data analytics
with AI plays a crucial algorithmic role in producing accurate
results.
REFERENCES
[1] H. Asaadi, D. Khaldi, and B. Chapman, “A Comparative Survey of
the HPC and Big Data Paradigms: Analysis and Experiments,” in
2016 IEEE International Conference on Cluster Computing
(CLUSTER), 2016, pp. 423–432.
[2] J. Yang, “From Google File System to Omega: A Decade of
Advancement in Big Data Management at Google,” in 2015 IEEE
First International Conference on Big Data Computing Service and
Applications, 2015, pp. 249–255.
[3] B. Marr, Big Data in Practice. John Wiley & Sons, Inc., 2016.
[4] F. W. Kistermann, “The Invention and Development of the Hollerith
Punched Card: In Commemoration of the 130th Anniversary of the
Birth of Herman Hollerith and for the 100th Anniversary of Large
Scale Data Processing,” Ann. Hist. Comput., vol. 13, no. 3, pp. 245–
259, Jul. 1991.
[5] E. F. Codd, “A Relational Model of Data for Large Shared Data
Banks,” Commun. ACM, vol. 13, no. 6, pp. 377–387, Jun. 1970.
[6] J. Peeters, “Early MRP Systems at Royal Philips Electronics in the
1960s and 1970s,” IEEE Ann. Hist. Comput., vol. 31, no. 2, pp. 56–
69, Apr. 2009.
[7] R. Brueckner, “Where Did Big Data Come From?,” insidebigdata,
2013. [Online]. Available:
https://insidebigdata.com/2013/02/03/where-did-big-data-come-
from/. [Accessed: 12-Aug-2019].
[8] M. Lesk, “How Much Information Is There In the World?,” 1997.
[Online]. Available: http://www.lesk.com/mlesk/ksg97/ksg.html.
[Accessed: 12-Aug-2019].
[9] B. Stone, “The Education of Google’s Larry Page,” Bloomberg
Businessweek, Apr-2012.
[10] K. Ashton, “That Internet of Things,” RFID J., 2009.
[11] A. Petrillo, “Fourth Industrial Revolution: Current Practices,
Challenges, and Opportunities,” in Digital Transformation in Smart
Manufacturing, R. Cioffi and F. De Felice, Eds. Intechopen, 2018.
[12] D. Laney, “3D Data Management: Controlling Data Volume, Velocity
and Variety,” 2001.
[13] S. Yin and O. Kaynak, “Big Data for Modern Industry: Challenges
and Trends [Point of View],” Proc. IEEE, vol. 103, no. 2, pp. 143–
146, Feb. 2015.
[14] S. Jha, J. Qiu, A. Luckow, P. K. Mantha, and G. C. Fox, “A Tale of
Two Data-Intensive Paradigms: Applications, Abstractions, and
Architectures,” CoRR, vol. abs/1403.1, 2014.
[15] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad:
distributed data-parallel programs from sequential building blocks,”
ACM SIGOPS Oper. …, pp. 59–72, 2007.
[16] C. A. Salma, B. Tekinerdogan, and I. N. Athanasiadis, “Feature
Driven Survey of Big Data Systems,” in Proceedings of the
International Conference on Internet of Things and Big Data, 2016,
pp. 348–355.
[17] K. C. Kang, S. G. Cohen, J. A. Hess, W. E. Novak, and A. S. Peterso,
“Feature-Oriented Domain Analysis (FODA) Feasibility Study,”
Pittsburgh, 1990.
[18] N. Marz, “How to beat the CAP theorem,” 2011. [Online]. Available:
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html.
[Accessed: 13-Aug-2019].
[19] N. Marz and J. Warren, Big Data: Principles and best practices of
scalable realtime data systems. Manning Publications, 2015.
[20] J. Kreps, “Questioning the Lambda Architecture,” O’Reilly Media,
2014. [Online]. Available:
https://www.oreilly.com/ideas/questioning-the-lambda-architecture.
[Accessed: 13-Aug-2019].
[21] A. Kumar, Architecting Data-Intensive Applications. Packt
Publishing, 2018.
[22] G. Vetticaden, “Building Secure and Governed Microservices with
Kafka Streams,” Cloudera, 2018. [Online]. Available:
https://blog.cloudera.com/building-secure-and-governed-
microservices-with-kafka-streams/. [Accessed: 12-Aug-2019].
[23] J. Garrett, Data Analytics for IT Networks: Developing Innovative Use
Cases. Cisco Press, 2018.
[24] J. Davis, T. Edgar, J. Porter, J. Bernaden, and M. Sarli, “Smart
manufacturing, manufacturing intelligence and demand-dynamic
performance,” Comput. Chem. Eng., vol. 47, pp. 145–156, 2012.
[25] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep
Learning for IoT Big Data and Streaming Analytics: A Survey,” IEEE
Commun. Surv. Tutorials, vol. 20, no. 4, pp. 2923–2960, 2018.
[26] Z. Li and H. Shen, “Measuring Scale-Up and Scale-Out Hadoop with
Remote and Local File Systems and Selecting the Best Platform,”
IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 11, pp. 3201–3214,
Nov. 2017.
[27] T. Kalil, “The White House Office of Science and Technology Policy:
Big Data is a Big Deal,” Office of Science and Technology Policy
(OSTP) Blog, 2012. [Online]. Available:
https://obamawhitehouse.archives.gov/blog/2012/03/29/big-data-big-
deal. [Accessed: 27-Aug-2019].
[28] “NIST Big Data Interoperability Framework: volume 8, reference
architecture interfaces,” Gaithersburg, MD, Jun. 2018.
[29] A. Zarrabi, E. K. Karuppiah, C. H. Ngo, K. K. Yong, and S. See,
“Gravitational Search Algorithm using CUDA,” in IEEE Parallel and
Distributed Computing, Applications and Technologies, PDCAT 2014,
2014, pp. 193–198.
[30] D. Nuñez, I. Agudo, and J. Lopez, “Delegated Access for Hadoop
Clusters in the Cloud,” in 2014 IEEE 6th International Conference on
Cloud Computing Technology and Science, 2014, pp. 374–379.
[31] M. E. Wendt, “Cloud-based Hadoop Deployments: Benefits and
Considerations,” 2014.
[32] J. Rosenberg, J. B. Coronel, J. Meiring, S. Gray, and T. Brown,
“Leveraging Elasticsearch to Improve Data Discoverability in Science
Gateways,” in Proceedings of the Practice and Experience in
Advanced Research Computing on Rise of the Machines (Learning),
2019, pp. 19:1--19:5.
[33] B. Dageville et al., “The Snowflake Elastic Data Warehouse,” in
Proceedings of the 2016 International Conference on Management of
Data, 2016, pp. 215–226.
[34] R. R. Shetty, A. M. Dissanayaka, S. Mengel, L. Gittner, R. Vadapalli,
and H. Khan, “Secure NoSQL Based Medical Data Processing and
Retrieval: The Exposome Project,” in Companion Proceedings of
the10th International Conference on Utility and Cloud Computing,
2017, pp. 99–105.
[35] B. Sendir, M. Govindaraju, R. Odaira, and P. Hofstee, “Low Latency
and High Throughput Write-Ahead Logging Using CAPI-Flash,”
IEEE Trans. Cloud Comput., p. 1, 2019.
[36] A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured
Storage System,” SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40,
Apr. 2010.
[37] S. Earley, “Executive Roundtable Series: Driving Higher ROI and
Organizational Change,” IT Prof., vol. 17, no. 6, pp. 60–64, Nov. 2015.
[38] “2018 AI predictions: 8 insights to shape business strategy,” PwC AS,
2018.
[39] “2019 AI Predictions: Six AI priorities you can’t afford to ignore,”
PwC AS, 2019.