The document discusses the six pillars for building Big Data analytics ecosystems: storage, processing, orchestration, assistance, user interfaces, and deployment methods. It provides an overview of the approaches for each pillar, popular systems, and open challenges, and shows how the pillars form a taxonomy to guide organizations in building their ecosystems. Key components discussed include HDFS, MapReduce, YARN, visualization tools, product vs. service deployment models, and ensuring the components work efficiently together.
2. Big Data and Analytics
What?
◦ Any voluminous amount of
◦ Structured
◦ semi-structured
◦ unstructured data
Where?
◦ Large organizations
Why?
◦ Cost reduction
◦ Faster, better decision making
◦ New products and services
Big Data Analytics Ecosystems
◦ Data exploration
◦ Data preparation
◦ Modeling
4. Pillars of Big Data: Storage
RDBMS
◦ Ensures ACID (Atomicity, Consistency, Isolation, Durability)
◦ Recent developments promise enhanced performance and scalability
DFS
◦ Client-server architecture
◦ Hides details such as data location from the user (location transparency)
◦ Concurrency transparency
◦ Failure transparency
◦ Replication and scalability transparency
◦ e.g., GFS, HDFS, CFS
NoSQL
◦ Sacrifices consistency for high availability and scalability
◦ Data stored as key/value pairs
◦ Comes in several types
◦ e.g., MVCC-based key/value stores (Voldemort, Riak), column-oriented (COD), document-oriented (DOD), and graph databases
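A minimal sketch of the key/value model these stores share (a plain Python dict stands in for a real store such as Riak; names and values are illustrative):

import json

# A dict stands in for a distributed key/value store.
# Values are opaque blobs, so records need no shared schema.
store = {}

def put(key, value):
    store[key] = json.dumps(value)   # serialize; the store sees only bytes

def get(key):
    return json.loads(store[key])

put("user:42", {"name": "Ada", "carts": [101, 105]})
put("user:43", {"name": "Alan"})     # different shape, no schema change needed
print(get("user:42")["carts"])       # [101, 105]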
5. Pillars of Big Data: Processing
Batch Processing
◦ Execute a series of jobs without manual intervention
◦ e.g., Hadoop
◦ Real-life example: credit card transactions processed in batch
◦ MapReduce
◦ Map
◦ Shuffle
◦ Reduce
Interactive Processing
◦ Requires human interaction
◦ Real-life example: spreadsheets
6. Pillars of Big Data: Processing
Iterative Processing
◦ Machine learning operations
◦ Requires several passes for the algorithm to converge
◦ e.g., HaLoop, Main Memory MapReduce (M3R)
◦ Real-life example: iterative evaluation of a mathematical expression
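A toy single-machine illustration of the iterative pattern (Newton's method for a square root, not a distributed job): the number of passes is unknown up front, which is exactly why systems such as HaLoop cache loop-invariant data between iterations.

def newton_sqrt(a, tol=1e-12):
    # Repeatedly refine x; each loop body is one "pass" of the algorithm.
    x = max(a, 1.0)                  # initial guess
    while True:
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) < tol:       # convergence test decides when to stop
            return nxt
        x = nxt

print(newton_sqrt(2.0))              # ~1.4142135623730951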
Incremental Processing
◦ Analyze data in motion
◦ Requires quick actions
◦ Full data is not required for algorithm
◦ e.g., Apache Storm, Microsoft Trill
◦ Real-life example: security checks on an incoming data stream
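A minimal sketch of the incremental pattern (pure Python, not Storm or Trill): running statistics are updated per item and each item is judged immediately, so the full stream is never stored. The packet-size scenario is illustrative.

import math

class StreamMonitor:
    # Incrementally tracks mean/variance (Welford's algorithm) and
    # flags values far from the running mean.
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2, self.threshold = 0, 0.0, 0.0, threshold

    def observe(self, x):
        flagged = False
        if self.n > 1:
            std = math.sqrt(self.m2 / self.n)
            flagged = std > 0 and abs(x - self.mean) > self.threshold * std
        self.n += 1                          # update per-item running state
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged

monitor = StreamMonitor()
for packet_size in [500, 510, 495, 505, 9000]:    # toy incoming stream
    if monitor.observe(packet_size):
        print("suspicious packet:", packet_size)  # fires on 9000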
7. Pillars of Big Data: Processing
Approximate Processing
◦ Quick retrieval of approximate results from a small sample
◦ e.g., Early Accurate Result Library (EARL), BlinkDB
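A minimal sketch of the sampling idea behind EARL and BlinkDB: answer an aggregate from a small uniform sample and report a quantified error bound instead of scanning everything. The dataset and sample size are illustrative.

import random, statistics

random.seed(0)
population = [random.gauss(100, 15) for _ in range(1_000_000)]  # stand-in for big data

sample = random.sample(population, 1_000)     # scan only 0.1% of the data
est = statistics.mean(sample)
stderr = statistics.stdev(sample) / len(sample) ** 0.5
print(f"mean ~ {est:.2f} +/- {1.96 * stderr:.2f} (95% CI)")
print("exact:", round(statistics.mean(population), 2))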
In-Database Processing
◦ In-database machine learning
◦ e.g., Microsoft SQL Server Analysis Services (SSAS)
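SSAS itself is proprietary, so as a stand-in here is the in-database idea with SQLite: a model (a least-squares line) is fitted entirely inside the database with one aggregate query, instead of exporting the rows to an external tool. The table and column names are invented for the example.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)])

# Closed-form simple linear regression pushed down into SQL aggregates.
slope, intercept = con.execute("""
    SELECT (COUNT(*) * SUM(ad_spend * revenue) - SUM(ad_spend) * SUM(revenue))
           / (COUNT(*) * SUM(ad_spend * ad_spend) - SUM(ad_spend) * SUM(ad_spend)),
           AVG(revenue) - AVG(ad_spend) *
               (COUNT(*) * SUM(ad_spend * revenue) - SUM(ad_spend) * SUM(revenue))
             / (COUNT(*) * SUM(ad_spend * ad_spend) - SUM(ad_spend) * SUM(ad_spend))
    FROM sales
""").fetchone()
print(f"revenue ~ {slope:.2f} * ad_spend + {intercept:.2f}")  # ~2.00x + 0.05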
8. Pillars of Big Data: Analytics
Orchestration
Orchestrate complex analytic jobs and workflows to achieve the user’s goals
Scheduling
. Resource Utilization
Resources: memory, CPU, network, and disk
Effective resource utilization mitigates idle resources
. Hadoop 1.0 Shortcomings
. Apache Hadoop YARN
. Data Locality
Ensures data and processing are on the same node, to avoid network congestion
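A small sketch of the data-locality heuristic (the block-to-node mapping and slot counts are invented for the example): prefer a node that already holds a replica of the task's input block, and fall back to a remote read only when no local slot is free.

block_locations = {                 # block id -> nodes holding a replica
    "blk_001": {"node-a", "node-b"},
    "blk_002": {"node-b", "node-c"},
}
free_slots = {"node-a": 0, "node-b": 2, "node-c": 1}   # available task slots

def place(block):
    local = [n for n in block_locations[block] if free_slots[n] > 0]
    node = local[0] if local else max(free_slots, key=free_slots.get)
    free_slots[node] -= 1
    return node, bool(local)

for blk in sorted(block_locations):
    node, is_local = place(blk)
    print(blk, "->", node, "(data-local)" if is_local else "(remote read)")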
9. Pillars of Big Data: Analytics
Orchestration
Provisioning
. Resource Provisioning
Allocating resources to jobs with minimal cost and execution time
. Resource Set (RS) Maximizer
. Conductor
. Data Provisioning
10. Pillars of Big Data: Analytics
Assistance
Narrowing the analytics talent gap by magnifying internal skill sets through in-tool assistance
Static Assistance
. Tooltips
. Help Pages
. Wizards
Intelligent Assistance
. Data Preparation
Identifying irrelevant data/attributes and converting them into meaningful information
11. Pillars of Big Data: Analytics
Assistance
. Selecting Operations
. Expert Systems (ES)
. Meta-Learning Systems (MLSs)
. Ontology Reasoners (OR)
. Automatic workflow generation
Provides a workflow based on the input data and the problem at hand
. Fault Detection and Handling
When the data is Big Data, a failure in the middle of a job is a catastrophe
12. Pillars of Big Data: User Interfaces
The full power of an analytics solution is available only to the users its interface targets.
Five approaches for user interfaces:
Scripts
SQL-based Interfaces
Graph-based interfaces
Sheets
Visualizations
13. Pillars of Big Data: User Interfaces
Scripts:
Analytics at the programming level
Interfaces can be a CLI or an API
Low-level coding
Supports data mining
Mostly avoided by ordinary users
Such as: R (for statisticians), MATLAB, and Weka
SQL-based Interfaces
Unified SQL interface – extended SQL
Use of UDFs (user-defined functions)
Further classification:
SQL-on-Hadoop
Machine learning SQL
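As a runnable sketch of the UDF mechanism (SQLite's Python binding used as a stand-in; the zscore function and table are invented), analytics logic is registered once and then called from plain SQL:

import math, sqlite3

con = sqlite3.connect(":memory:")
con.create_function("zscore", 3, lambda x, mean, std: (x - mean) / std)

con.execute("CREATE TABLE readings (value REAL)")
con.executemany("INSERT INTO readings VALUES (?)", [(9.0,), (10.0,), (14.0,)])

# Compute mean/std with built-in aggregates, then call the UDF per row.
mean, var = con.execute(
    "SELECT AVG(value), AVG(value*value) - AVG(value)*AVG(value) FROM readings"
).fetchone()
std = math.sqrt(var)
print(con.execute("SELECT value, zscore(value, ?, ?) FROM readings",
                  (mean, std)).fetchall())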
14. Pillars of Big Data: User Interfaces
Graphs:
No need to code
Drag and drop
Panel (operations) and canvas (processing)
Such as: RapidMiner, IBM SPSS Modeler, WINGS, etc.
Sheets:
Most feasible for business organizations, since they already work with spreadsheets
Focused on making data exploration as easy as possible
Compatible with moving data to other solutions
Such as: Power Query, Microsoft Tabular, OpenRefine (formerly Google Refine), etc.
15. Pillars of Big Data: User Interfaces
Visualization
Helps control the high risk of analyzing the wrong or incompatible set of attributes
Suitable for large business firms
Often lacks machine-learning techniques
Such as: IBM Watson Analytics, SAS Visual Analytics, etc.
16. Pillars of Big Data: Deployment
Many components need to be integrated together
Deployment challenges include
◦ Complex and challenging integration
◦ A scope beyond in-house IT technicians
17. Pillars of Big Data: Deployment
Product:
Use of product deployment models to ensure privacy and security
◦ Cost
◦ IT staff
◦ Limited scalability
Most components are open-source platforms, but integration remains the major issue
18. Pillars of Big Data: Deployment
Service:
Services are provided on demand; the solution cost is pay-per-use (per user/data volume).
Security and privacy are concerns, as is the cost of moving data to the provider's cloud.
Hybrid cloud
Data storage and processing can remain on the organization's own infrastructure
19. Future Directions:
Each solution brings some features not available in the others, but also adds some limitations and overheads.
While there has been continuous improvement in analytics solutions to address different analytics scenarios, there are still some gaps.
20. Conclusions:
It is difficult to select a suitable analytics solution, because a weak component in the ecosystem can cause the whole ecosystem to function inefficiently.
For each of these pillars, different approaches are discussed and popular systems are presented.
The pillars form a taxonomy that aims to give an overview of the field, to guide organizations and researchers in building their Big Data Analytics Ecosystem, and to help identify challenges and opportunities in the field.
Editor's Notes
What: Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information. Increasingly, organizations' success has become dependent on how quickly and efficiently they can turn the petabytes of data they collect into actionable information.
Data can be structured, which is generated by applications like Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems and typically stored in rows and columns with well-defined schemas. It can be semi-structured, which is generated by sensors, web feeds, event monitors, stock market feeds, and network and security systems.
Where: With almost everything now online, organizations look at the Big Data they collect to gain insights for improving their services.
Why: Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data, and they can identify more efficient ways of doing business.
Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned.
New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers’ needs.
Data exploration: Analysts go through the data, using ad-hoc queries and visualizations, to better understand it;
Data preparation: Analysts clean, prepare, and transform the data for modeling, using batch processing to run compute- and IO-intensive operations;
Modeling: Data models are trained on the prepared data, using iterative processing, and the trained models are used to score the unlabeled data.
—Storage that handles the data’s huge volume, fast arrival, and multiple formats;
—Processing that meets the Big Data Analytics processing needs;
—Orchestration that manages available resources to reduce processing time and cost;
—Assistance that goes beyond the interface and provides suggestions to help users with decisions when selecting operations and building their analytics process;
—User Interface that provides users with a familiar environment to build and run their analytics;
—Deployment Method that provides scalability, security, and reliability.
ACID (Atomicity, Consistency, Isolation, and Durability)
Recent RDBMS developments promise enhanced performance and scalability
Hadoop Distributed File System (HDFS)
Cassandra File System (CFS)
Voldemort and Riak use Multi Version Concurrency Control (MVCC)
Column-Oriented Database
Document-Oriented Database
MapReduce, as presented in Figure 3, consists of Map, Shuffle, and Reduce phases, which are executed sequentially, utilizing all nodes in the cluster. In the Map phase, the programmer-provided Map function (Mapper) processes the input data and outputs intermediate data in the form of <key, value> tuples, which get stored on disk. The Shuffle phase then groups values with the same key together and sends them to the reduce nodes over the network. Finally, the programmer-provided Reduce function (Reducer) reads the intermediate data from disk, processes it, and generates the final output.
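The same flow as a single-process word-count sketch (illustrative only; a real cluster runs many mappers and reducers in parallel and shuffles over the network):

from collections import defaultdict

def mapper(line):                    # Map: emit <key, value> tuples
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):                  # Shuffle: group values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):            # Reduce: aggregate each key's values
    return key, sum(values)

lines = ["big data big analytics", "big data ecosystems"]
pairs = [kv for line in lines for kv in mapper(line)]
print(dict(reducer(k, v) for k, v in shuffle(pairs).items()))
# {'big': 3, 'data': 2, 'analytics': 1, 'ecosystems': 1}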
Batch processing