Big Data SE vs. SE of BD Systems
Process: Business Process Driven Service Development Lifecycle (BPD-SDL)
Methods and Design Principles: service components with SoaML
Reference Architecture for big data (REF4BD)
Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica,
Azure/ML)
SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service (BAaaS),
Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a Service (SEPaaS), Bug
Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a Service (BDMaaS)
Adoption Models
Evaluation & Applications
Dr Muthu Ramachandran PhD FBCS Fellow of Advance HE, MIEEE, MACM
Principal Lecturer
Software Engineering Technologies and Emerging Practices (SETEP) Research
Group Lead
School of Computing, Creative Technologies, and Engineering (CCTE)
Leeds Beckett University
Email: M.Ramachandran@leedsbeckett.ac.uk
Research
Motivation
• In this era of integrated IoT, cloud, and real-time big data streams (social
media, smart cities, smart living, etc.), services must be robust, agile,
accessible, and available to their clients. For secure and guaranteed delivery of
services, many large organizations are shifting their service delivery model to an
Enterprise Service Bus (ESB). The following research questions are posed:
• How do we achieve Data Reuse, Reliability, Resiliency (3Rs) and Security, Accuracy
and Availability (SAA)?
• How do we engineer a systematic approach to using and reusing software
repositories, 50 years of software engineering knowledge and experience data for
developing software and software as a service paradigm (Big Data Software
Engineering)?
• How do we systematically apply 50 years of software engineering practice
to data science, and to developing big data systems and services (Software
Engineering for Big Data)?
• What are the design principles for an SOA-driven reference architecture?
• Which services comprise a reference architecture for big data systems?
• How do we classify the technologies, products, and services of big data systems?
• How should software development teams integrate Data Analytics (Data
Collection, Transformation, Analytics, and Improvement) into their software
development process?
• What new roles, artifacts, and activities of the BD process come into play?
• How do we tie the new roles, artifacts, and activities of the BD process into
existing agile or DevOps processes?
BD Concepts
• Big data is commonly characterized by 8 Vs, seven of which are listed here:
volume (large amounts of data), velocity (data processed continuously in real
time), variety (unstructured, semi-structured, or structured data in different
formats and from multiple and diverse sources), veracity (uncertainty and
trustworthiness of data), validity (relevance of data to the problem to solve),
volatility (constant change of input data), and value (how data and its analysis
add value).
• Big data systems are software applications that process and potentially
generate big data. Such applications receive and process data from various
diverse (usually distributed) sources, such as sensors, devices, whole
networks, social networks, mobile devices, or devices in an Internet of
Things. They process heavy data workloads and handle high volumes of requests
for data. The idea is to use large amounts of data strategically and efficiently to
provide additional intelligence.
8Vs of big
data
http://www.visualcapitalist.com/internet-minute-2018/
Velocity: BD requires real-time processing at varying intervals and may include stream as well as
batch processing.
Volume: BD provides massive historical data over several time periods (years, months, weeks, days,
etc.).
Variety: BD may be captured in a variety of formats (multiple file types and multi-modal data) and
may be structured or unstructured.
Veracity: BD may contain unwanted data, which requires extraction, transformation, and
cleaning.
Value: BD may contain highly valuable data as well as data that is not useful; it takes a skilled data
scientist to identify what to consider for analytical processing and what to discard.
Characteristics of BD Systems Requirements
• 8 Vs of basic BD systems characteristics
• Data Security: data security across the data lifecycle (stored, used, transferred, live, and real-time streaming data); integrate data security attributes (confidentiality, integrity, and availability)
• Flexible DB Architecture Framework: requires NoSQL ("not only SQL") as well as SQL-based relational DB systems, and must be able to link multiple formats. BD technologies include SQL-based (Hadoop and Spark, e.g. via Hive and Spark SQL) and NoSQL-based (HBase, Bigtable, Cassandra, MongoDB)
• Data Cleaning Processing (ETL process: Extract, Clean, Transform, and Load)
• BD Analytics for Decision Making and Business Intelligence
• Multiple data formats
• Multi-channel data sources (real-time stream and batch processing)
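The Extract, Clean, Transform, and Load steps listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not any of the ETL tools referenced in these slides; the sample feed, the table name `readings`, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed: inconsistent casing, a blank reading, a duplicate row.
RAW = """sensor,reading
s1, 21.5
S1,21.5
s2,
s3,19.0
"""

def extract(text):
    """Extract: parse CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: normalise sensor names, drop blank readings and duplicates."""
    seen, out = set(), []
    for row in rows:
        sensor = row["sensor"].strip().lower()
        reading = row["reading"].strip()
        if not reading or (sensor, reading) in seen:
            continue
        seen.add((sensor, reading))
        out.append({"sensor": sensor, "reading": reading})
    return out

def transform(rows):
    """Transform: convert readings from text to floats."""
    return [(r["sensor"], float(r["reading"])) for r in rows]

def load(records, conn):
    """Load: write the records into a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, celsius REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()

def run_etl(text, conn):
    """Run all four steps and return the loaded rows."""
    load(transform(clean(extract(text))), conn)
    return conn.execute(
        "SELECT sensor, celsius FROM readings ORDER BY sensor"
    ).fetchall()
```

Calling `run_etl(RAW, sqlite3.connect(":memory:"))` loads only the two valid, de-duplicated readings.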
Spark is a cluster-computing framework, which means that it
competes more with MapReduce than with the entire Hadoop
ecosystem. For example, Spark doesn't have its own distributed
filesystem, but can use HDFS. Spark SQL is a Spark module for
structured data processing that integrates relational processing
with Spark's functional programming API. Data can be queried
through SQL or the Hive query language. Spark SQL provides a
programming abstraction called DataFrames and can also act as a
distributed SQL query engine.
MongoDB is a NoSQL database, whereas Hadoop is a framework for
storing & processing Big Data in a distributed environment.
MongoDB is a document-oriented NoSQL database that
stores data in a flexible, JSON-like document format. Documents
map easily onto the objects in your application.
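To illustrate how a JSON-like document maps onto application objects, here is a minimal sketch. It uses only the standard `json` and `dataclasses` modules rather than a MongoDB driver, and the `Customer` and `Address` types are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Address:
    city: str
    country: str

@dataclass
class Customer:
    name: str
    address: Address                      # nested object becomes a nested document
    tags: list = field(default_factory=list)

def to_document(customer):
    """Serialise an application object to a JSON document string."""
    return json.dumps(asdict(customer), sort_keys=True)

def from_document(text):
    """Rebuild the application object from a JSON document string."""
    data = json.loads(text)
    return Customer(
        name=data["name"],
        address=Address(**data["address"]),
        tags=data.get("tags", []),
    )
```

The nested `address` field is stored inside the same document, where a relational design would typically use a separate table and a join.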
Apache Cassandra is a NoSQL database ideal for high-speed, online
transactional data, while Hadoop is a big data analytics system that
focuses on data warehousing and data lake use cases. MapReduce is
a programming paradigm for processing and handling large data
sets.
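The MapReduce paradigm mentioned above can be sketched as three phases: map, shuffle, and reduce. The following single-process word count is an illustration of the idea only, not Hadoop's API; all function names are assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    """Run map over every split, then shuffle and reduce."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

In a real cluster the map and reduce calls run in parallel on different nodes; here they run sequentially to keep the data flow visible.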
Apache HBase is a column-oriented, NoSQL database built on top of
Hadoop (HDFS, to be exact). It is an open source implementation of
Google's Bigtable.
List of ETL Tools, https://www.etltool.com/list-of-etl-tools/
BD Analytics is the application of analysis, data, and systematic
reasoning to make decisions and to learn recurring patterns, extracting
knowledge that improves process and business efficiency and cost.
Analytics allows for summarising, filtering, modelling, and
experimenting. Tools and techniques include A/B testing, statistical
modelling, machine learning, deep learning, natural language
processing, and multilinear subspace learning.
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity
hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs. The core of Apache Hadoop consists of a storage part, known as Hadoop
Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files
into large blocks and distributes them across nodes in a cluster.
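The block splitting and distribution described above can be sketched as follows. The block size of 128 MB and replication factor of 3 are common HDFS defaults (both are configurable); the round-robin placement is a simplifying assumption, since real HDFS placement is rack-aware.

```python
BLOCK_SIZE_MB = 128   # common HDFS default; configurable per cluster
REPLICATION = 3       # common HDFS default; configurable per cluster

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block for a file of the given size."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])

def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS also considers racks and node load.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }
```

For example, a 300 MB file becomes two full blocks plus one 44 MB block, and each block is stored on three different nodes.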
Several research challenges of software engineering for
developing big data systems
• Surveying the existing software engineering literature on applying software
engineering principles into developing and supporting big data systems
• Identifying the fields of application for big data software systems
• Investigating the software engineering knowledge areas that have seen
research related to big data systems
• Revealing the gaps in the knowledge areas that require more focus for big
data systems development
• Determining the open research challenges in each software engineering
knowledge area that need to be met
• To be sustainable, we need an approach which is systematic, business-
driven (supporting business-process and value driven), and based on
established Software Engineering practices
50 Years of SE
• 60 software development methodologies
• 50 static analysis tools
• 40 software design methods
• 37 benchmark organizations
• 25 size metrics
• 20 kinds of project management tools
• 22 kinds of testing and dozens of other tool variations.
• At least 3,000 programming languages exist, even though only about 100
are frequently used. A new programming language is announced about every 2 weeks,
more than one new tool appears each month, and a new methodology emerges about
every 10 months.
• Newly emerged service computing paradigms (SOA, Web Services, Micro Services,
Cloud SE, etc.) and established SE abstractions (Objects, classes, components, service
components, virtualisation techniques, resource-oriented computing, and containers,
etc.)
Key Areas of Research Challenges
• Software Engineering for Big Data can provide a systematic process
for improving the development of big data systems. The process includes
requirements gathering for BD, software architecture for BD, testing and
debugging BD systems (performance, reliability, and security), where
logs analysing the 5V characteristics should be included, an SE process for BD,
which could include CMMI, and finally managing BD projects.
• Big Data Software Engineering is an area of research that should focus
on using BD to improve SE practices and
software production. Typical activities should include analytics for
software engineering, mining software repositories, visual analytics
for software engineering, and self-adaptive systems that use the data
they generate to self-learn.
Four Pillars of SE4BD Principles
• Data Security with Data Lifecycle (Stored, Used, Transferred, Live, & Real-time Streaming Data); integrate data security attributes (confidentiality, integrity and availability) and apply Software Security Engineering Principles while Data Modelling for BD:
• Misuse and Abuse cases for service requirements
• Threat Modelling
• Secure SDLC
• Service Composition
• Autonomic Computing
• Knowledge Discovery, Extraction, & Patterns for Reuse
• Reuse by Analytics Patterns & Predictive Analytics Patterns
• BDSE Metrics with Data Points
• Data Processing Architecture for Integrating IoT-Cloud-BD
• Support both stream & batch processing data
• Design for data complexity
• SOSE4BD
• Agile for BDSE
• Requirements Engineering for BD with BPMN Simulation
• Design for Data Service Reuse
• Design for Data Security (Achieving BSI)
• SOSE4BD Framework: Design methods based on SoaML and service components, Process, Applications, and Evaluation
Pillars: Service-Oriented SE for BD Applications; Integrated Services (IoT-Cloud-BD); Data Security (Building Security In (BSI)); Reuse-Oriented Knowledge Discovery & Patterns
Software Engineering Framework for
Big Data Systems (SE4BD)
Framework: methods, process, reference architecture (REF4BD),
applications, adoption and evaluation
SE4BD Framework
Process: Business Process Driven Service Development Lifecycle (BPD-SDL)
Methods and Design Principles: service components with SoaML
Reference Architecture for big data (REF4BD)
Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica, Azure/ML)
SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service
(BAaaS), Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a
Service (SEPaaS), Bug Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a
Service (BDMaaS)
Adoption Models
Evaluation & Applications
SE4BD is a process-centric
approach that provides a
process, methods and design
principles based on service
components with SoaML, a
reference architecture, tools,
applications, adoption models,
and evaluation
Benefits of SE Approach to BD Systems
SE4BD
Improved productivity &
reuse (Reuse of 50 years of
Software development best
practices, knowledge and
experiences)
RE for BDS; Design Methods for BDS;
Reference Architecture for BDS; Cloud Based
Implementation and Automating BDS Services;
Test & Deployment of new BDS services
Learning from mistakes
(Repositories of Software
Engineering Data, Menzies
and Zimmermann (2013))
Software Reuse to Data Reuse: Knowledge
extraction, Reusable Patterns, BD Adoption
framework, Validating Data with Simulation
Tools, SPI Approaches to Business Intelligence
& Improvement
Software Engineering of BD
Analytics with SOA
Visual Analytics (data on bugs,
vulnerabilities, software quality,
usability, reusability, faults &
failures, process, project
management, etc.)
Software Predictive Modelling &
Analytics with Cloud Based
Machine Learning Tools
(Azure/ML)
Improved Business Intelligence
of BDS
Secure new BDS as a Service
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
Benefits of Big Data SE
BDSE
Improved productivity &
reuse (Reuse of 50 years of
Software development best
practices, knowledge and
experiences)
Learning from mistakes
(Repositories of Software
Engineering Data, Menzies
and Zimmermann (2013))
Software Engineering
Analytics
Visual Analytics (data on bugs,
vulnerabilities, software quality,
usability, reusability, faults &
failures, process, project
management, etc.)
Software Predictive Modelling &
Analytics with Cloud Based
Machine Learning Tools
(Azure/ML)
Improved Product Line,
productivity, Quality, QoS,
Process & Reuse
Secure Software & Software as a
Service
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
Data Science for Software Engineering
SE Analytics &
Big Data SE
• Big data analytics, visualisation, and prediction
have been useful and very popular areas of research and
application in recent years. However, in applying
big data practices to
Software Engineering Analytics, some
key questions need to be addressed:
• How do we apply big data practices to software engineering
analytics?
• How do we collect and access SE experience
data?
• How do we apply them to decision making
for business as well as for SE practices:
process, methods, and technology?
• How do we apply them to Software Process
Improvement and Software Practices
Improvement (SPI)?
• How do we apply them to software business
process improvement?
SE for BD Analytics
• Organisation, Best Practices Data, Business Data, Software Usage Experience Data, Process Experience Data, SE Practices Data
• Big Data (Prevailing Experience)
• Analytics & Visualisation
• Predictions
• Decision Making & Smart Marketing
• Improvement of Business & Software Process, Methods, Practices, Team Spirit, Test Practices, Products, Quality
SE4BD
Lifecycle SEF4CC
BD Adoption
Framework
Big Data Source RE
Big Data Transformation RE
Big Data Consumer RE
Big Data Capability RE
Big Data Security & Privacy RE
Big Data Visualisation and Analytics RE
Business Intelligence RE (Business
Improvements)
Use BPM modelling and simulation to
validate new data and business
improvement services
Actionable SE Analytics Tools with SE Data Repositories
• Code coverage and visualisation tools
• Project Management & Monitoring Tools
• Other tools support management, such as PROM and Hackystat, which can monitor a software project and
generate statistical reports on it. However, even after years of development, there has been no major
upgrade that helps predict management decisions (Zimmermann, 2013). Most of these tools focused
more on data collection than on critical analysis.
• Later tools recognised managers' problems and focused on data presentation
rather than just data collection. Tools such as Microsoft’s Team Foundation Server and IBM’s tooling keep developers
updated on modifications, bugs, and build results (Zimmermann, 2013). Other project management tools, such as
Automated Project Office (APO) (Jones, 2017), also help managers arrive at a probable decision for the
organization before or during the project.
• Software Repositories: As of late 2012, our Web searches show that Mozilla Firefox had 800,000 bug reports, and
platforms such as Sourceforge.net and GitHub hosted 324,000 and 11.2 million projects, respectively.
• The PROMISE repository of software engineering data has grown to more than 100 projects and is just one of more
than a dozen open source repositories that are readily available to industrial practitioners and researchers
• Jones, C 2018, Software Methodologies. [Electronic Resource] : A Quantitative Guide, n.p.: Boca Raton : CRC
Press/Taylor & Francis
• Yang, Y. et al (2018) Actionable Analytics for Software Engineering, Actionable Analytics, Guest editors Introduction to
Special Issue on Actionable Analytics for SE, IEEE Software, Jan/Feb 2018
• Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
• Applications: Bug Data Prediction Models, ML for SPI, and Agile Methods
Repositories of software engineering data
Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
Software Engineering Approaches
for Data Science Applications
Data Science Process
Data Science Application 1:
Bug Data Prediction Models using
cloud based machine learning
https://app.box.com/s/b1g4jsp4k7f9cof90an6dg1qju0vv9uh
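A bug prediction model of this kind can be sketched as a binary classifier over module metrics. The example below is a minimal logistic regression in plain Python, not the Azure/ML model shown in the slides; the two features (churn and complexity) and the tiny data set are synthetic and purely illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(model, x):
    """Estimated probability that a module with metrics x is buggy."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Synthetic data for illustration: (churn, complexity) scaled to 0..1; 1 = buggy.
MODULES = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.7, 0.8), (0.8, 0.6), (0.9, 0.9)]
BUGGY = [0, 0, 0, 1, 1, 1]
```

After `model = train(MODULES, BUGGY)`, `predict_proba(model, (0.9, 0.9))` is above 0.5 and `predict_proba(model, (0.1, 0.1))` is below it. A real study would use metrics mined from a software repository and a held-out test set.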
Application 2: Cloud Based
Machine Learning Tool for Agile
Method Decision Making
This tool supports the decision of which Agile method to choose,
based on project size and constraints.
https://app.box.com/s/7q8sjo0zf36qpsahoqvv5ai28forv0vn
Application 3: Cloud Based
Machine Learning for Software
Process Improvement
https://app.box.com/s/6gd4zgimtz11014eavu3wv2c8cc1do6h
Process Mining and Business Intelligence with Machine
Learning
• Discover business process
• Modelling & Simulation
• Data Analytics
• Observe performance and other key data
• Data Preparation & Cleaning with Extract, Transform, and Load Data (ETL process)
• Cloud Based Machine Learning (Azure/ML)
• Predictive Modelling
• Introduce and improve business changes
Benefits of Process Mining with Machine
Learning
• Live data from the business services (event logs and performance
data) to study the impact patterns
• Useful to study business and service patterns
• Useful to study business and service improvements
• Useful to improve change management process
• Reuse of services and data
• Improved QoS
• Predictive modelling for efficiency and QoS
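One common first step in process mining is to derive a directly-follows graph from an event log, which shows how often one activity is immediately followed by another. The sketch below assumes a hypothetical log of (case id, activity, timestamp) tuples; the loan-style activity names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp)
EVENT_LOG = [
    ("c1", "receive", 1), ("c1", "check", 2), ("c1", "approve", 3),
    ("c2", "receive", 1), ("c2", "check", 4), ("c2", "reject", 6),
    ("c3", "receive", 2), ("c3", "check", 3), ("c3", "approve", 5),
]

def directly_follows(log):
    """Count how often activity a is immediately followed by b within a case."""
    traces = defaultdict(list)
    # Order events by case, then by time, to rebuild each case's trace.
    for case, activity, _ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case].append(activity)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts
```

The resulting counts can feed a process model or serve as features for predictive modelling.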
Predictive Modelling for Loan Business Processes
Predictive Modelling for Service Desk Application
Occurrence of Interactions (blue) and Incidents (orange) in time
with marked Changes (black) for Configuration Item
“SBA000607” related to Service Component “WBS000263”
Suchy, J and Suchy, M (2012) Predictive Model for supporting ITIL Business Change
Management Processes, BI 2012
Software Engineering Framework for
Service and Cloud Computing (SEF-
SCC) for Developing Data Science
Applications
SEF-SCC Framework Poster,
https://app.box.com/s/u5fcktx687fy6qv2nhgzp93e5n9i7r8p
SEF-SCC: Service-Security-Reuse – A Three Integrated
Service Engineering Framework
Service Development
•Accuracy, Correctness & QoS
Design Principles
•Service RE with BPMN and
Simulation
•Service Design with Service
Components (SoaML)
•Service Development (any
platform)
•Service Testing & Deployment
& Continuous Delivery
Software Security Engineering
•Building Security In (BSI),
Resiliency, Fault-Tolerance
Design Principles
•Service Security RE with
Misuse & Abuse Use cases for
all identified services
•Threat Modelling
•Design for Security
•Building Security In &
Resiliency, Fault-tolerance,
•Software security testing
Service Reuse Engineering
•Design for Reuse & Design
with Reuse, Composable,
Scalable Design Principles
•Reuse RE (Commonality &
Variability Analysis of secured
requirements on selected
BPMN & Secured use cases)
•Design for reuse approaches
•Reuse Development
(Implementing composable
services)
•Testing for reuse, composition
& integration testing strategies
Software
Engineering
Lifecycle for
Service and
Cloud Computing
(SEL4SCC)
Service Requirements
•Initial process models: Actors/roles/Workflows
•Detailed workflows
•Service Task modelling
•UI prototyping
•Process Simulation:
•Configure resources needed for tasks
•Load profiles in sec/min/days/no. of instances
•Start the Process Simulation as a Service (PSSaaS)
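The process simulation steps above (configure resources, set a load profile, run) can be illustrated with a small queueing sketch. This is not the simulation engine of any BPMN tool; it assumes fixed service times and a hypothetical pool of identical resources.

```python
def simulate(arrivals, service_time, servers=1):
    """Simulate tasks queuing for a pool of identical resources.

    arrivals: sorted task arrival times (the load profile).
    service_time: fixed duration of each task.
    Returns (mean waiting time, utilisation of the resource pool).
    """
    free_at = [0.0] * servers            # when each resource next becomes free
    total_wait = 0.0
    for t in arrivals:
        i = min(range(servers), key=lambda k: free_at[k])  # earliest-free resource
        start = max(t, free_at[i])
        total_wait += start - t
        free_at[i] = start + service_time
    makespan = max(free_at)
    busy = service_time * len(arrivals)
    return total_wait / len(arrivals), busy / (makespan * servers)
```

Running the same load profile with one and then two resources shows the trade-off a simulation is meant to expose: waiting time falls while utilisation drops.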
SOA Requirements with use case modelling, story cards, (Agile), Story Boards, CRC Cards,
Feature-Oriented modelling
SOA Design with Service Component Models
SOA Implementation with SOAP/RESTful
SOA Test & Deliver
Cloud service security development process with Building Security In
(BSI) – Our Systematic Approach to developing secure services
Requirements Engineering
for cloud applications
(BPMN, Use Case, RE Doc,
etc)
Conduct
BPM
Identify
SLAs
Design
service
components
Test &
Deploy
Cloud service security
requirements (misuse &
Abuse Cases, Threat
Modelling, Software
Security RE Doc)
Cloud business
process security
vulnerabilities
SLAs
security
rules and
risks
Cloud service
security
specific
components &
interfaces
Cloud
service
security
testing
Building Security In (BSI) – Cloud service development with built-in
security
Cloud service
development
RE for BD
Arruda D., and Madhavji, N.H. (2017) Towards a Big Data Requirements Engineering Artefact Model in the Context of Big Data Software Development Projects,
2017 IEEE International Conference on Big Data (BIGDATA)
BD Service Component Model: Data Processing layer
SEF4BD Reference Architecture for Big Data: Secure Service-
Oriented SE for BD
BD Source & Storage
Layer: Data Integration &
Storage: IoT-Cloud-Data
BD Processing Layer: BD
Extraction-Processing-
Analysis-Knowledge
Discovery-Analytics-
Visualizations Services
BD Applications &
Prediction Layer: BD
Service and Knowledge &
Reuse Integration and
Patterns, Machine Learning
Services
BD Application Services
Data Extraction as a Service (DEaaS)
Data Processing as a Service (DPaaS)
Data Analysis as a Service (DAaaS)
Knowledge Discovery as a Service
(KDaaS)
Knowledge Patterns as a Service (KPaaS)
Reuse Patterns as a Service (RPaaS)
Data Streaming as a Service
(DSaaS): Multi Sources: IoT,
Sensors, Mobile, Cloud, Web
Services, etc.
Data Batch Processing as a
Service (DBaaS) – depending
on the nature of application
Data Clustering,
Allocation, Cataloguing &
Pruning as a Service
Multi-Cloud as a Service
(MCaaS)
Data Transformation as a
Service (DTaaS)
Data Analytics as a Service
(DAtaaS)
Data Visualisation as a
Service (DVaaS)
Data Integration,
Data API, Data
Analysis as a
Service (DIAAaaS)
BD
Enterprise
Service
Bus (BD-
ESB)
Pääkkönen, P and Pakkala, D (2015) Reference Architecture and Classification of Technologies, Products and Services for Big Data
Systems, Big Data Research, Elsevier, http://dx.doi.org/10.1016/j.bdr.2015.01.001
Machine & Deep
Learning as a
service (MDLaaS)
Data Processing
Security as a
Service (DPSaaS)
Data Storage
Security as
a Service
(DSSaaS)
Data Application
Security as a
Service (DASaaS)
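The way services in the reference architecture communicate through the BD Enterprise Service Bus can be sketched as a publish/subscribe bus. This is a minimal in-process stand-in, not a real ESB product; the topic names and the three handlers standing in for DEaaS, DPaaS, and DAaaS are hypothetical.

```python
from collections import defaultdict

class ServiceBus:
    """Minimal in-process stand-in for an enterprise service bus (illustrative)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)

def build_pipeline(bus, results):
    """Wire three hypothetical services together through the bus."""

    def extract(raw):                      # stands in for DEaaS
        bus.publish("extracted", [int(x) for x in raw.split(",") if x.strip()])

    def process(values):                   # stands in for DPaaS
        bus.publish("processed", [v for v in values if v >= 0])

    def analyse(values):                   # stands in for DAaaS
        results.append(sum(values) / len(values) if values else None)

    bus.subscribe("raw", extract)
    bus.subscribe("extracted", process)
    bus.subscribe("processed", analyse)
```

Because each service only knows topic names, a service can be replaced or a new subscriber added without changing the others, which is the decoupling an ESB is meant to provide.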
Mapping Services to SOA Design
Task oriented
services
Utility
Business
Coordination
Entity oriented
services
Utility
Business
Coordination
Enterprise
services (B2B)
Utility
Business
Coordination
As an architect, you will need to
categorise services so that you
can place them in the
appropriate architecture layers
(shown on the right)
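The categorisation step can be made explicit with a small service catalogue. The two dimensions below come from the slide (task-oriented, entity-oriented, enterprise; utility, business, coordination); the classification of the three example services is an assumption made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum

class Orientation(Enum):
    TASK = "task-oriented"
    ENTITY = "entity-oriented"
    ENTERPRISE = "enterprise (B2B)"

class Layer(Enum):
    UTILITY = "utility"
    BUSINESS = "business"
    COORDINATION = "coordination"

@dataclass(frozen=True)
class Service:
    name: str
    orientation: Orientation
    layer: Layer

def group_by_layer(services):
    """Return {layer: [service names]} so each service lands in one layer."""
    layers = {layer: [] for layer in Layer}
    for s in services:
        layers[s.layer].append(s.name)
    return layers

# Assumed classification, for illustration only.
CATALOGUE = [
    Service("DEaaS", Orientation.TASK, Layer.UTILITY),
    Service("DAaaS", Orientation.TASK, Layer.BUSINESS),
    Service("BD-ESB", Orientation.ENTERPRISE, Layer.COORDINATION),
]
```

`group_by_layer(CATALOGUE)` then gives the list of services to draw in each architecture layer.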
SOA Architecture (soaML design) for SEF4BD with REF4BD (Reference
Architecture)
Facebook Real-Time Big Data Analytics
Facebook Real-time streaming services
Chen, G. J et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016 San Francisco, CA USA
Many companies have developed their own systems:
examples include Twitter's Storm [28] and Heron [20],
Google's MillWheel [9], and LinkedIn's Samza [4].
Facebook uses its own stream processing systems,
known as Puma, Swift, and Stylus.
Facebook has identified important design decisions:
performance, fault tolerance, scalability, and
correctness.
An example streaming application
with 4 nodes: this application
computes "trending" events.
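A four-node trending pipeline can be sketched as a chain of small stages. The split into a filterer, joiner, scorer, and ranker is an assumption about how the four nodes are arranged, and the event fields and `TOPICS` table are hypothetical. A real streaming system works on unbounded streams with windows; this sketch processes a finite list.

```python
from collections import Counter

# Hypothetical dimension table mapping an event id to its topic.
TOPICS = {"e1": "sport", "e2": "music", "e3": "sport"}

def filterer(events, event_type="post"):
    """Node 1: keep only events of the wanted type."""
    return (e for e in events if e["type"] == event_type)

def joiner(events, topics=TOPICS):
    """Node 2: enrich each event with its topic via a lookup join."""
    for e in events:
        if e["id"] in topics:
            yield {**e, "topic": topics[e["id"]]}

def scorer(events):
    """Node 3: score topics; here the score is simply the event count."""
    return Counter(e["topic"] for e in events)

def ranker(scores, k=1):
    """Node 4: return the top-k topics by score."""
    return [topic for topic, _ in scores.most_common(k)]

def trending(events, k=1):
    """Chain the four nodes into one pipeline."""
    return ranker(scorer(joiner(filterer(events))), k)
```

Each stage depends only on the output of the previous one, so in a distributed deployment each node could be scaled or restarted independently.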
BPMN Framework for Validating the Reference Architecture (REF4BD)
BPMN 2.0 modelling & simulation tools:
BonitaSoft 7.8, https://www.bonitasoft.com/
Visual Paradigm, https://www.visual-paradigm.com/features/bpmn-diagram-and-tools/
Bizagi Studio,
https://www.bizagi.com/uk/products/bpm-suite/studio
Validating the Reference Architecture for BD (REF4BD)
Results, Analysis, and Conclusion
• The results show the number of times a particular business service was used to
process the data, and the time taken.
• In addition, the BPMN 2.0 simulation shows the number of times each resource was
used, such as API, Data Scientist (Human Tasks in BPMN), Data
Repository, Servers, Firewall, Data Storage, etc.
• The results show that implementing Facebook-type big data processing
in REF4BD is more secure and uses resources more efficiently than using
non-standard architectures. The efficiency result shows about 95% use of
automated processing by API and Data Application (Service Components)
services.
• The Facebook streaming application uses more filters,
which add overhead and require extra resources, whereas REF4BD is
more predictable and can achieve correctness, fault-tolerance, and
scalability since it is standardised across all data processing applications and
services.
Summary and Questions
• SOA has emerged from established software design principles and the
find-request-service paradigm, which suits service-oriented applications such as big
data processing and analytics. It is therefore time to consider a systematic
engineering approach to developing and deploying big data services, as
data-driven applications and devices are increasing rapidly.
• In this context, this paper proposed a software engineering framework and an
SOA-based reference architecture for developing big data applications.
The paper concluded with a BPMN simulation of a complex
Facebook big data application with real-time streaming, to
study its characteristics before big data service design, development, and
deployment. The simulation results demonstrated the efficiency and
effectiveness of developing big data applications using the reference
architecture framework for big data.
• To be sustainable, we need an approach which is systematic, business-driven
(supporting business-process and value driven), and based on established
Software Engineering practices.
Gorton, I., Bener, A., and Mockus, A (2016) Software Engineering for Big Data Systems, Special Issue, IEEE Software, March/April 2016
Chen, G. J et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016 San Francisco, CA USA
Navlakha, S and Bar-Joseph, Z (2015) Distributed Information Processing in Biological and Computational Systems, Communications of the ACM, vol. 58, no. 1, January 2015
Big data of complex networks / edited by Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, Andreas Holzinger
Alessandro Fontana, Borys Wrobel (2013) Evolution and development of complex computational systems using the paradigm of metabolic computing in Epigenetic Tracking, Wivace 2013 - Italian Workshop on Artificial Life and
Evolutionary Computation
Sommerville, I (2016) Software Engineering, 10th edition, Pearson
BCS (2004) The Challenges of Complex IT Projects, The report of a working group from The Royal Academy of Engineering and The British Computer Society
Ramachandran, M (2008) Software Components: Guidelines and Applications, Nova Science Publications
Ramachandran, M (2012) Software Security Engineering, Nova Science Publications
Ng, I et al (Eds) (2011) Complex Engineering Service Systems: Concepts and Research, Springer, London
Zanetti, S.M (2013) A COMPLEX SYSTEMS APPROACH TO SOFTWARE ENGINEERING, DSc thesis, ETH ZURICH
Jin, X., et al (2015) Significance and Challenges of Big Data Research, Big Data Research (2015) 59–64 http://dx.doi.org/10.1016/j.bdr.2015.01.006
Caldarelli, G and Vespignani, A (eds) (2007) Large Scale Structure and Dynamics of Complex Networks from information technology to finance and natural science, World Scientific Publishing Co. Pte. Ltd.
Cao, L.B (2017) Data Science: Challenges and Directions, COMMUNICATIONS OF THE ACM, 60 (8), August
Cao, L.B (2015). Metasynthetic Computing and Engineering of Complex Systems. Springer-Verlag, London, U.K.
Cady, F (2017) Data Science Handbook, Wiley
Wolfgang Karl Härdle, Henry Horng Lu, Xiaotong Shen eds. (2017) Handbook of Big Data Analytics
Bühlmann, Peter, et al. eds. (2017) Handbook of big data, CRC/Chapman & Hall
Albert Y. Zomaya, Sherif Sakr eds. (2017) Handbook of Big Data Technologies, Springer
Bernard Marr (2017) Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things, Kogan
Mohammed M. Alani, Hissam Tawfik, Mohammed Saeed, Obinna Anya (2018) Applications of Big Data Analytics: Trends, issues, & Challenges, Springer
Deng, Julia, Savas, Onur (2017) eds. Big Data Analytics in Cybersecurity, CRC
Tajunisha N., Sruthika P. (2016) Handbook On Big Data Analytics, Amazon Digital Services LLC, 2016
References

More Related Content

What's hot

Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
Philippe Julio
 
SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview
Rajesh Menon
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
Perficient, Inc.
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
sambiswal
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
Sourabh Saxena
 
Big Data SE vs. SE for Big Data

  • 1. Big Data SE vs. SE of BD Systems
    Process: Business Process Driven Service Development Lifecycle (BPD-SDL)
    Methods and Design Principles: service components with SoaML
    Reference Architecture for big data (REF4BD)
    Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica, Azure/ML)
    SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service (BAaaS), Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a Service (SEPaaS), Bug Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a Service (BDMaaS)
    Adoption Models
    Evaluation & Applications
    Dr Muthu Ramachandran PhD FBCS Fellow of Advance HE, MIEEE, MACM
    Principal Lecturer
    Software Engineering Technologies and Emerging Practices (SETEP) Research Group Lead
    School of Computing, Creative Technologies, and Engineering (CCTE)
    Leeds Beckett University
    Email: M.Ramachandran@leedsbeckett.ac.uk
  • 2. Research Motivation • In this changing era of integrated IoT, Cloud, and real-time Big Data streams (Social Media, Smart Cities, Smart Living, etc.), services need to be Robust, Agile, Accessible and Available to their clients. For secure and guaranteed delivery of services, large organizations are shifting their service delivery model to an Enterprise Service Bus (ESB). The following research questions are posed: • How do we achieve Data Reuse, Reliability, Resiliency (3Rs) and Security, Accuracy and Availability (SAA)? • How do we engineer a systematic approach to using and reusing software repositories and 50 years of software engineering knowledge and experience data for developing software and software as a service (Big Data Software Engineering)? • How do we bring 50 years of software engineering practice systematically to data science and to the development of big data systems and services (Software Engineering for Big Data)? • What are the design principles for a SOA-driven reference architecture? • Which services comprise a reference architecture for big data systems? • How do we classify the technologies and products/services of big data systems? • How should software development teams integrate Data Analytics (Data Collection, Transformation, Analytics, and Improvement) into their software development process? • What new roles, artifacts, and activities of the BD process come into play? • How do the new roles, artifacts, and activities of the BD process tie into existing agile or DevOps processes?
  • 3. BD Concepts • Big data is characterized by 8V’s: volume (large amounts of data), velocity (continuously processed data in real time), variety (unstructured, semi-structured or structured data in different formats and from multiple and diverse sources), veracity (uncertainty and trustworthiness of data), validity (relevance of data to the problem to solve), volatility (constant change of input data), and value (how data and its analysis adds value). • Big data systems are software applications that process and potentially generate big data. Such applications receive and process data from various diverse (usually distributed) sources, such as sensors, devices, whole networks, social networks, mobile devices or devices in an Internet-of-Things. They process high workloads of data and handle high requests for data. The idea is to use large amounts of data strategically and efficiently to provide additional intelligence.
  • 4. 8Vs of big data http://www.visualcapitalist.com/internet-minute-2018/ • Velocity: BD requires real-time processing at varying intervals and may include stream as well as batch processing. • Volume: BD provides massive historical data over several time periods (years, months, weeks, days, etc.). • Variety: The BD captured may be in a variety of formats (multiple file types and multi-modal data) and may be structured or unstructured. • Veracity: The BD captured may contain unwanted data, which requires extraction, transformation, and cleaning. • Value: BD may contain highly valuable data as well as data that is not useful, and it takes a skilled data scientist to identify what to consider for analytical processing and what to discard.
  • 5. Characteristics of BD Systems Requirements • The 8 Vs as basic BD system characteristics • Data Security (data security across the data lifecycle: stored, used, transferred, live, and real-time streaming data), integrating the data security attributes (confidentiality, integrity and availability) • Flexible DB Architecture Framework (requiring NoSQL, "not only SQL", in addition to SQL-based relational database systems) that is able to link multiple formats. BD technologies include SQL-based (Hadoop & Spark) and NoSQL-based (HBase, Bigtable, Cassandra, MongoDB) • Data Cleaning Processing (ETL process: Extract, Clean, Transform, & Load) • BD Analytics for Decision Making & Business Intelligence • Multiple data formats • Multi-channel data sources (real-time stream and batch processing) • Spark is a cluster-computing framework, which means that it competes more with MapReduce than with the entire Hadoop ecosystem. For example, Spark doesn't have its own distributed filesystem, but can use HDFS. Spark SQL is a Spark module for structured data processing that integrates relational processing with Spark's functional programming API. It provides a programming abstraction called DataFrames, can act as a distributed SQL query engine, and lets data be queried through SQL or the Hive query language. • MongoDB is a document-oriented NoSQL database, whereas Hadoop is a framework for storing and processing Big Data in a distributed environment. MongoDB stores data in a flexible JSON-like document format, so documents map easily to application objects. • Apache Cassandra is a NoSQL database suited to high-speed, online transactional data, while Hadoop is a big data analytics system that focuses on data warehousing and data lake use cases. • MapReduce is a programming paradigm for processing and handling large data sets. • Apache HBase is a column-oriented NoSQL database built on top of Hadoop (HDFS, to be exact). It is an open source implementation of Google's Bigtable. • List of ETL Tools, https://www.etltool.com/list-of-etl-tools/ • BD Analytics is the application of analysis, data, and systematic reasoning to make decisions, learn recurring patterns, and extract knowledge for improving process and business efficiency and cost. Analytics allows for summarising, filtering, modelling, and experimenting. Tools and techniques include A/B testing, statistical modelling, machine learning, deep learning, natural language processing, and multilinear subspace learning. • Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle a very large number of concurrent tasks or jobs. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster.
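The MapReduce paradigm and Hadoop's block splitting described on this slide can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (split_into_blocks, map_phase, shuffle, reduce_phase, word_count) and the tiny block size are assumptions chosen for illustration, and everything runs in memory rather than across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks, loosely mirroring file splitting."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    """Run map, shuffle and reduce over all blocks (sequentially here)."""
    blocks = split_into_blocks(lines, block_size)
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data systems", "big data analytics", "data services"]
    print(word_count(sample))
```

In a real cluster each block would be mapped on the node that stores it and the shuffle would move data over the network; the sketch only shows the shape of the computation.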
  • 6. Several research challenges of software engineering for developing big data systems • Surveying the existing software engineering literature on applying software engineering principles to developing and supporting big data systems • Identifying the fields of application for big data software systems • Investigating the software engineering knowledge areas that have seen research related to big data systems • Revealing the gaps in the knowledge areas that require more focus for big data systems development • Determining the open research challenges in each software engineering knowledge area that need to be met • To be sustainable, we need an approach which is systematic, business-driven (supporting business-process and value driven), and based on established Software Engineering practices
  • 7. 50 Years of SE • 60 software development methodologies • 50 static analysis tools • 40 software design methods • 37 benchmark organizations • 25 size metrics • 20 kinds of project management tools • 22 kinds of testing and dozens of other tool variations • At least 3,000 programming languages exist, although only about 100 are frequently used. New programming languages are announced about every 2 weeks, more than one new tool appears each month, and a new methodology emerges about every 10 months. • Newly emerged service computing paradigms (SOA, Web Services, Microservices, Cloud SE, etc.) and established SE abstractions (objects, classes, components, service components, virtualisation techniques, resource-oriented computing, containers, etc.)
  • 8. Key Areas of Research Challenges • Software Engineering for Big Data provides a systematic process for improving the development of big data systems. The process includes requirements gathering for BD, software architecture for BD, testing and debugging of BD systems (performance, reliability, and security), where logs analysing the 5V characteristics should be included, an SE process for BD, which could include CMMI, and finally managing BD projects. • Big Data Software Engineering is an area of research that focuses on using BD to improve SE practices and software production. Typical activities include analytics for software engineering, mining software repositories, visual analytics for software engineering, and self-adaptive systems that use the data they generate to self-learn.
  • 9. Four Pillars of SE4BD Principles • Data Security with Data Lifecycle (Stored, Used, Transferred, Live, & Real-time Streaming Data), Integrate data security attributes (confidentiality, integrity and availability) and apply Software Security Engineering Principles while Data Modelling for BD: • Misuse and Abuse cases for service requirements • Threat Modelling • Secure SDLC • Service Composition • Autonomic Computing • Knowledge Discovery, Extraction, & Patterns for Reuse • Reuse by Analytics Patterns & Predictive Analytics Patterns • BDSE Metrics with Data Points • Data Processing Architecture for Integrating IoT-Cloud-BD • Support both stream & batch processing data • Design for data complexity • SOSE4BD • Agile for BDSE • Requirements Engineering for BD with BPMN Simulation • Design for Data Service Reuse • Design for Data Security (Achieving BSI) • SOSE4BD Framework: Design methods based on SoaML and service components, Process, Applications, and Evaluation • Pillars: Service-Oriented SE for BD Applications; Integrated Services (IoT-Cloud-BD); Data Security (Building Security In (BSI)); Reuse-Oriented Knowledge Discovery & Patterns
  • 10. Software Engineering Framework for Big Data Systems (SE4BD) Framework: methods, process, reference architecture (REF4BD) applications, adoption and evaluation
  • 11. SE4BD Framework Process: Business Process Driven Service Development Lifecycle (BPD-SDL) Methods and Design Principles: service components with SoaML Reference Architecture for big data (REF4BD) Tools (SAS, Visual Paradigm, BonitaSoft, Bizagi Studio, Tableau, Mathematica, Azure/ML) SOSE4BD as a Service (SOSEaaS), BDaaS, Big Data Adoption Framework as a Service (BAaaS), Software Engineering Analytics as a Service (SEAaaS), SE Prediction Model as a Service (SEPaaS), Bug Prediction as a Service with Azure/ML (MLaaS), BD Metrics as a Service (BDMaaS) Adoption Models Evaluation & Applications SE4BD is a process-centric approach which provides a process, methods and design principles based on service components with SoaML, a reference architecture, tools, applications, adoption models, and evaluation
  • 12. Benefits of SE Approach to BD Systems SE4BD Improved productivity & reuse (Reuse of 50 years of Software development best practices, knowledge and experiences) RE for BDS; Design Methods for BDS, Reference Architecture for BDS; Cloud Based Implementation and Automating BDS Services, Test & Deployment of new BDS services Learning from mistakes (Repositories of Software Engineering Data, Menzies and Zimmermann (2013)) Software Reuse to Data Reuse: Knowledge extraction, Reusable Patterns, BD Adoption framework, Validating Data with Simulation Tools, SPI Approaches to Business Intelligence & Improvement Software Engineering of BD Analytics with SOA Visual Analytics (data on bugs, vulnerabilities, software quality, usability, reusability, faults & failures, process, project management, etc.) Software Predictive Modelling & Analytics with Cloud Based Machine Learning Tools (Azure/ML) Improved Business Intelligence of BDS Secure new BDS as a Service Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
  • 13. Benefits of Big Data SE BDSE Improved productivity & reuse (Reuse of 50 years of Software development best practices, knowledge and experiences) Learning from mistakes (Repositories of Software Engineering Data, Menzies and Zimmermann (2013)) Software Engineering Analytics Visual Analytics (data on bugs, vulnerabilities, software quality, usability, reusability, faults & failures, process, project management, etc.) Software Predictive Modelling & Analytics with Cloud Based Machine Learning Tools (Azure/ML) Improved Product Line, productivity, Quality, QoS, Process & Reuse Secure Software & Software as a Service Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
  • 14. Data Science for Software Engineering
  • 15. SE Analytics & Big Data SE • Big data analytics, visualisation, and prediction have been useful and very popular areas of research and application in recent years. However, in applying big data practices to Software Engineering Analytics, some key questions need to be addressed: • How do we apply big data practices to software engineering analytics? • How do we collect and access SE experience data? • How do we apply them to decision making for business as well as for SE practices: process, methods, and technology? • How do we apply them to Software Process Improvement and Software Practices Improvement (SPI)? • How do we apply them to software business process improvement?
  • 16. SE for BD Analytics Organisation, Best Practices Data, Business Data, Software Usage Experience Data, Process Experience Data, SE Practices Data Big Data (Prevailing Experience) Analytics & Visualisation Predictions Decision Making & Smart Marketing Improvement of Business & Software Process, methods, Practices, Team Spirit, Test Practices, Products, Quality
  • 17. SE4BD Lifecycle SEF4CC BD Adoption Framework Big Data Source RE Big Data Transformation RE Big Data Consumer RE Big Data Capability RE Big Data Security & Privacy RE Big Data Visualisation and Analytics RE Business Intelligence RE (Business Improvements) Use BPM modelling and simulation to validate new data and business improvement services
  • 18. Actionable SE Analytics Tools with SE Data Repositories • Code coverage and visualisation tools • Project Management & Monitoring Tools • Other management-support tools, such as PROM and Hackystat, can monitor a software project and generate statistical reports on it. However, even after years of development, there has been no major upgrade that helps predict management decisions (Zimmermann, 2013). Most of these tools focused more on data collection than on critical analysis. • Later tools recognised managers' problems and focused on data presentation rather than just data collection. Tools like Microsoft’s Team Foundation Server and IBM’ keep developers updated with modifications, bugs and build results (Zimmermann, 2013). Other project management tools such as Automated Project Office (APO) (Jones, 2017) also help managers arrive at probable decisions for the organization before or during the project. • Software Repositories: As of late 2012, our Web searches show that Mozilla Firefox had 800,000 bug reports, and platforms such as Sourceforge.net and GitHub hosted 324,000 and 11.2 million projects, respectively. • The PROMISE repository of software engineering data has grown to more than 100 projects and is just one of more than a dozen open source repositories that are readily available to industrial practitioners and researchers • Jones, C (2018) Software Methodologies: A Quantitative Guide [Electronic Resource], Boca Raton: CRC Press/Taylor & Francis • Yang, Y. et al (2018) Actionable Analytics for Software Engineering, Guest Editors' Introduction to the Special Issue on Actionable Analytics for SE, IEEE Software, Jan/Feb 2018 • Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013 • Applications: Bug Data Prediction Models, ML for SPI, and Agile Methods
  • 19. Repositories of software engineering data Menzies, T and Zimmermann, T (2013) Software Analytics: So What?, IEEE Software, vol. 30, no. 4, 2013
  • 20. Software Engineering Approaches for Data Science Applications
  • 22. Data Science Application 1: Bug Data Prediction Models using cloud based machine learning https://app.box.com/s/b1g4jsp4k7f9cof90an6dg1qju0vv9uh
  • 23. Application 2: Cloud Based Machine Learning Tool for Agile Method Decision Making This tool is useful when deciding which Agile method to choose based on project size and constraints. https://app.box.com/s/7q8sjo0zf36qpsahoqvv5ai28forv0vn
  • 24. Application 3: Cloud Based Machine Learning for Software Process Improvement https://app.box.com/s/6gd4zgimtz11014eavu3wv2c8cc1do6h
  • 26. Process Mining and Business Intelligence with Machine Learning Discover business process Modelling & Simulation Data Analytics Observe performance and other key data Data Preparation & Cleaning with Extract, Transform, and Load Data (ETL process) Cloud Based Machine Learning (Azure/ML) Predictive Modelling Introduce and improve business changes
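The data preparation and cleaning step (ETL) named on this slide can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the tooling used in the talk: the column names (activity, duration_s), the table name events, and the cleaning rule (drop rows whose duration is not a whole number) are all hypothetical.

```python
import csv
import io
import sqlite3


def etl(raw_csv, conn):
    """Extract rows from CSV text, clean and transform them, load them into SQLite."""
    reader = csv.DictReader(io.StringIO(raw_csv))               # extract
    rows = []
    for record in reader:
        duration = (record.get("duration_s") or "").strip()
        if not duration.isdigit():                              # clean: skip bad rows
            continue
        activity = record["activity"].strip().lower()           # transform: normalise
        rows.append((activity, int(duration)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (activity TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)   # load
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    raw = "activity,duration_s\nSubmit ,12\ncheck,\nApprove,n/a\nCHECK,30\n"
    connection = sqlite3.connect(":memory:")
    print(etl(raw, connection), "rows loaded")
```

The cleaned table could then feed whichever predictive modelling service is used downstream.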
  • 27. Benefits of Process Mining with Machine Learning • Live data from the business services (event logs and performance data) to study the impact patterns • Useful to study business and service patterns • Useful to study business and service improvements • Useful to improve change management process • Reuse of services and data • Improved QoS • Predictive modelling for efficiency and QoS
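One common first step when studying service patterns from event logs is to count which activity directly follows which within each case (a directly-follows relation). The sketch below assumes an event log given as (case_id, timestamp, activity) tuples; the function name directly_follows and the loan-style activity names are hypothetical, and this is not necessarily the discovery technique used in the talk.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b in a case.

    event_log: iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for events in traces.values():
        # Order each case by timestamp, then pair consecutive activities.
        ordered = [activity for _, activity in sorted(events)]
        for a, b in zip(ordered, ordered[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    log = [
        (1, 1, "submit"), (1, 2, "check"), (1, 3, "approve"),
        (2, 1, "submit"), (2, 2, "check"), (2, 3, "reject"),
    ]
    for (a, b), n in directly_follows(log).items():
        print(f"{a} -> {b}: {n}")
```

The resulting counts can be drawn as a graph to see the dominant paths before any predictive model is trained.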
  • 28. Predictive Modelling for Loan Business Processes
  • 29. Predictive Modelling for Service Desk Application Occurrence of Interactions (blue) and Incidents (orange) in time with marked Changes (black) for Configuration Item “SBA000607” related to Service Component “WBS000263” Suchy, J and Suchy, M (2012) Predictive Model for supporting ITIL Business Change Management Processes, BI 2012
  • 30. Software Engineering Framework for Service and Cloud Computing (SEF-SCC) for Developing Data Science Applications SEF-SCC Framework Poster, https://app.box.com/s/u5fcktx687fy6qv2nhgzp93e5n9i7r8p
  • 31. SEF-SCC: Service-Security-Reuse – A Three Integrated Service Engineering Framework Service Development • Accuracy, Correctness & QoS Design Principles • Service RE with BPMN and Simulation • Service Design with Service Components (SoaML) • Service Development (any platform) • Service Testing & Deployment & Continuous Delivery Software Security Engineering • Building Security In (BSI), Resiliency, Fault-Tolerance Design Principles • Service Security RE with Misuse & Abuse Use cases for all identified services • Threat Modelling • Design for Security • Building Security In & Resiliency, Fault-tolerance • Software security testing Service Reuse Engineering • Design for Reuse & Design with Reuse, Composable, Scalable Design Principles • Reuse RE (Commonality & Variability Analysis of secured requirements on selected BPMN & Secured use cases) • Design for reuse approaches • Reuse Development (Implementing composable services) • Testing for reuse, composition & integration testing strategies
  • 32. Software Engineering Lifecycle for Service and Cloud Computing (SEL4SCC) Service Requirements • Initial process models: Actors/roles/Workflows • Detailed workflows • Service Task modelling • UI prototyping • Process Simulation: • Configure resources needed for tasks • Load profiles in sec/min/days/no. of instances • Start the Process Simulation as a Service (PSSaaS) SOA Requirements with use case modelling, story cards (Agile), Story Boards, CRC Cards, Feature-Oriented modelling SOA Design with Service Component Models SOA Implementation with SOAP/RESTful SOA Test & Deliver
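The process simulation steps listed here (configure resources, set a load profile, run) can be illustrated with a small deterministic queue simulation. This is only a sketch of the idea, not what a BPMN simulator does internally; the function name simulate, the fixed service time, and the pool of identical resources are simplifying assumptions.

```python
import heapq


def simulate(arrival_times, service_time, capacity):
    """Simulate process instances queuing for a pool of identical resources.

    arrival_times: non-empty list of instance arrival times (the load profile).
    service_time: time one resource needs to handle one instance.
    capacity: number of identical resource units configured for the task.
    Returns (finish_times, utilisation), where utilisation is total busy time
    divided by capacity * makespan.
    """
    free_at = [0] * capacity            # time at which each resource unit is free
    heapq.heapify(free_at)
    finish_times = []
    for arrival in sorted(arrival_times):
        # The earliest-free resource takes the next instance.
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service_time
        heapq.heappush(free_at, finish)
        finish_times.append(finish)
    makespan = max(finish_times)
    busy = service_time * len(finish_times)
    return finish_times, busy / (capacity * makespan)


if __name__ == "__main__":
    for units in (1, 2):
        finishes, utilisation = simulate([0, 1, 2, 3], service_time=2, capacity=units)
        print(units, finishes, round(utilisation, 2))
```

Varying capacity against the same load profile shows the trade-off a simulation run is meant to expose: fewer resources give higher utilisation but later completion.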
  • 33. Cloud service security development process with Building Security In (BSI) – Our Systematic Approach to developing secure services Requirements Engineering for cloud applications (BPMN, Use Case, RE Doc, etc) Conduct BPM Identify SLAs Design service components Test & Deploy Cloud service security requirements (misuse & Abuse Cases, Threat Modelling, Software Security RE Doc) Cloud business process security vulnerabilities SLAs security rules and risks Cloud service security specific components & interfaces Cloud service security testing Build-In Security (BSI) – Cloud service development with build-in security Cloud service development
  • 34. RE for BD Arruda D., and Madhavji, N.H. (2017) Towards a Big Data Requirements Engineering Artefact Model in the Context of Big Data Software Development Projects, 2017 IEEE International Conference on Big Data (BIGDATA)
  • 36. SEF4BD Reference Architecture for Big Data: Secure Service-Oriented SE for BD BD Source & Storage Layer: Data Integration & Storage: IoT-Cloud-Data BD Processing Layer: BD Extraction-Processing-Analysis-Knowledge Discovery-Analytics-Visualizations Services BD Applications & Prediction Layer: BD Service and Knowledge & Reuse Integration and Patterns, Machine Learning Services BD Application Services Data Extraction as a Service (DEaaS) Data Processing as a Service (DPaaS) Data Analysis as a Service (DAaaS) Knowledge Discovery as a Service (KDaaS) Knowledge Patterns as a Service (KPaaS) Reuse Patterns as a Service (RPaaS) Data Streaming as a Service (DSaaS): Multi Sources: IoT, Sensors, Mobile, Cloud, Web Services, etc. Data Batch Processing as a Service (DBaaS) – depending on the nature of application Data Clustering, Allocation, Cataloguing & Pruning as a Service Multi-Cloud as a Service (MCaaS) Data Transformation as a Service (DTaaS) Data Analytics as a Service (DAtaaS) Data Visualisation as a Service (DVaaS) Data Integration, Data API, Data Analysis as a Service (DIAAaaS) BD Enterprise Service Bus (BD-ESB) Pääkkönen, P and Pakkala, D (2015) Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems, Big Data Research, Elsevier, http://dx.doi.org/10.1016/j.bdr.2015.01.001 Machine & Deep Learning as a service (MDLaaS) Data Processing Security as a Service (DPSaaS) Data Storage Security as a Service (DSSaaS) Data Application Security as a Service (DASaaS)
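The idea of routing data through named services over a service bus can be sketched as an in-process registry. This is a toy under stated assumptions, not the REF4BD implementation: the ServiceBus class, the handler functions, and the comma-separated payload are hypothetical; only the service names DEaaS, DTaaS and DAaaS come from the slide.

```python
class ServiceBus:
    """A toy in-process stand-in for an enterprise service bus."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Make a handler callable under a service name."""
        self._services[name] = handler

    def route(self, payload, service_names):
        """Pass the payload through the named services in order."""
        for name in service_names:
            if name not in self._services:
                raise KeyError(f"service not registered: {name}")
            payload = self._services[name](payload)
        return payload


def extract(raw):
    """DEaaS stand-in: split raw text into fields."""
    return raw.split(",")


def transform(values):
    """DTaaS stand-in: keep only whole numbers and convert them to int."""
    return [int(v) for v in values if v.strip().lstrip("-").isdigit()]


def analyse(numbers):
    """DAaaS stand-in: summarise the cleaned numbers."""
    return {"count": len(numbers), "mean": sum(numbers) / len(numbers)}


def build_bus():
    bus = ServiceBus()
    bus.register("DEaaS", extract)
    bus.register("DTaaS", transform)
    bus.register("DAaaS", analyse)
    return bus


if __name__ == "__main__":
    print(build_bus().route("4,6,x,8", ["DEaaS", "DTaaS", "DAaaS"]))
```

Because callers name services rather than call functions directly, a service can be replaced or secured behind the bus without changing its consumers, which is the property the layered architecture relies on.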
  • 37. Mapping Services to SOA Design Task oriented services Utility Business Coordination Entity oriented services Utility Business Coordination Enterprise services (B2B) Utility Business Coordination As an architect, you will need to categorise services so that you can place them in the appropriate architecture layers on the right
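The categorisation on this slide (task, entity, or enterprise services, each of which may be utility, business, or coordination) can be captured in a small data structure. The sketch is illustrative only: the class and function names are hypothetical, and the category assigned to each example service is an assumption, not a classification given in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    TASK = "task-oriented"
    ENTITY = "entity-oriented"
    ENTERPRISE = "enterprise (B2B)"


class Role(Enum):
    UTILITY = "utility"
    BUSINESS = "business"
    COORDINATION = "coordination"


@dataclass(frozen=True)
class Service:
    name: str
    orientation: Orientation
    role: Role


def group_by_category(services):
    """Group service names by their (orientation, role) pair."""
    grouped = {}
    for service in services:
        grouped.setdefault((service.orientation, service.role), []).append(service.name)
    return grouped


if __name__ == "__main__":
    # Example assignments are assumptions made for illustration.
    catalogue = [
        Service("DEaaS", Orientation.TASK, Role.UTILITY),
        Service("DVaaS", Orientation.TASK, Role.BUSINESS),
        Service("BD-ESB", Orientation.ENTERPRISE, Role.COORDINATION),
    ]
    for (orientation, role), names in group_by_category(catalogue).items():
        print(orientation.value, "/", role.value, "->", names)
```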
  • 38. SOA Architecture (soaML design) for SEF4BD with REF4BD (Reference Architecture)
  • 39. Facebook Real-Time Big Data Analytics
  • 40. Facebook Real-time streaming services Chen, G. J et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016 San Francisco, CA USA Many companies have developed their own systems: examples include Twitter's Storm and Heron, Google's Millwheel, and LinkedIn's Samza. Facebook uses its own stream processing systems, known as Puma, Swift, and Stylus. Facebook has identified important design decisions: performance, fault tolerance, scalability, and correctness. An example streaming application with 4 nodes: this application computes "trending" events.
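A four-node streaming application that computes trending events can be sketched as a chain of Python generators. This is a toy model, not Facebook's Puma, Swift, or Stylus: the stage names (filterer, joiner, scorer, ranker), the event fields, and the count-based sliding window are assumptions made for illustration.

```python
from collections import Counter, deque


def filterer(events, wanted_type):
    """Node 1: keep only events of the wanted type."""
    for event in events:
        if event["type"] == wanted_type:
            yield event


def joiner(events, topic_lookup):
    """Node 2: enrich each event with a topic from a lookup table."""
    for event in events:
        topic = topic_lookup.get(event["dimension_id"], "unknown")
        yield {**event, "topic": topic}


def scorer(events, window_size):
    """Node 3: count topics over a sliding window of the most recent events."""
    window = deque(maxlen=window_size)
    for event in events:
        window.append(event["topic"])
        yield Counter(window)


def ranker(scores, top_k):
    """Node 4: emit the top-k topics for each window snapshot."""
    for counts in scores:
        yield counts.most_common(top_k)


def trending(events, topic_lookup, wanted_type="post", window_size=3, top_k=1):
    """Wire the four nodes together and return the latest ranking."""
    filtered = filterer(events, wanted_type)
    joined = joiner(filtered, topic_lookup)
    stream = ranker(scorer(joined, window_size), top_k)
    latest = []
    for snapshot in stream:
        latest = snapshot
    return latest


if __name__ == "__main__":
    sample_events = [
        {"type": "post", "dimension_id": 1},
        {"type": "like", "dimension_id": 2},
        {"type": "post", "dimension_id": 2},
        {"type": "post", "dimension_id": 2},
        {"type": "post", "dimension_id": 2},
    ]
    print(trending(sample_events, {1: "sports", 2: "music"}))
```

Each generator pulls from the previous one, so events flow through the pipeline one at a time, which is the basic property a stream processor provides at much larger scale and with fault tolerance.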
  • 41. BPMN Framework for Validating the Reference Architecture (REF4BD) BPMN 2.0 modelling & simulation tools: BonitaSoft 7.8, https://www.bonitasoft.com/ Visual Paradigm, https://www.visual-paradigm.com/features/bpmn-diagram-and-tools/ Bizagi Studio, https://www.bizagi.com/uk/products/bpm-suite/studio
  • 42. Validating the Reference Architecture for BD (REF4BD)
  • 44. Results, Analysis, and Conclusion • The results show the number of times each business service was used to process the data, and the time taken. • In addition, the BPMN 2.0 simulation shows the number of times each resource was used, such as API, Data Scientist (Human Tasks in BPMN), Data Repository, Servers, Firewall, Data Storage, etc. • The results suggest that implementing Facebook-type big data processing in REF4BD is more secure and uses resources more efficiently than using non-standard architectures. The efficiency result shows about 95% use of automated processing by API and Data Application (Service Components) services. • The Facebook streaming application uses more filters, which add overhead and require extra resources, whereas REF4BD is more predictable and can achieve correctness, fault-tolerance, and scalability since it is standardised across all data processing applications and services.
  • 45. Summary and Questions • SOA has emerged from the established software design principles of the find-request-service paradigm and is suitable for service-oriented applications such as big data processing and analytics. It is therefore time to consider a systematic, engineering approach to developing and deploying big data services, as data-driven applications and devices are increasing rapidly. • In this context, this paper proposed a software engineering framework and an SOA-based reference architecture for big data application development. The paper concluded with a simulation of a complex Facebook big data application with real-time streaming, using BPMN simulation to study its characteristics before big data service design, development, and deployment. The simulation results demonstrated the efficiency and effectiveness of developing big data applications using the reference architecture framework for big data. • To be sustainable, we need an approach which is systematic, business-driven (supporting business-process and value driven), and based on established Software Engineering practices
  • 46. References • Gorton, I., Bener, A., and Mockus, A (2016) Software Engineering for Big Data Systems, Special Issue, IEEE Software, March/April 2016 • Chen, G. J et al. (2016) Real-time Data Processing at Facebook, ACM SIGMOD 2016, San Francisco, CA, USA • Navlakha, S and Bar-Joseph, Z (2015) Distributed Information Processing in Biological and Computational Systems, Communications of the ACM, vol. 58, no. 1, January 2015 • Dehmer, M., Emmert-Streib, F., Pickl, S., and Holzinger, A (eds) Big Data of Complex Networks • Fontana, A and Wrobel, B (2013) Evolution and development of complex computational systems using the paradigm of metabolic computing in Epigenetic Tracking, Wivace 2013 - Italian Workshop on Artificial Life and Evolutionary Computation • Sommerville, I (2016) Software Engineering, 10th edition, Pearson • BCS (2004) The Challenges of Complex IT Projects, The report of a working group from The Royal Academy of Engineering and The British Computer Society • Ramachandran, M (2008) Software Components: Guidelines and Applications, Nova Science Publications • Ramachandran, M (2012) Software Security Engineering, Nova Science Publications • Ng, I et al (eds) (2011) Complex Engineering Service Systems: Concepts and Research, Springer, London • Zanetti, S.M (2013) A Complex Systems Approach to Software Engineering, DSc thesis, ETH Zurich • Jin, X., et al (2015) Significance and Challenges of Big Data Research, Big Data Research (2015) 59–64, http://dx.doi.org/10.1016/j.bdr.2015.01.006 • Caldarelli, G and Vespignani, A (eds) (2007) Large Scale Structure and Dynamics of Complex Networks: from information technology to finance and natural science, World Scientific Publishing Co. Pte. Ltd. • Cao, L.B (2017) Data Science: Challenges and Directions, Communications of the ACM, 60 (8), August • Cao, L.B (2015) Metasynthetic Computing and Engineering of Complex Systems, Springer-Verlag, London, U.K. • Cady, F (2017) Data Science Handbook, Wiley • Härdle, W.K., Lu, H.H., and Shen, X (eds) (2017) Handbook of Big Data Analytics • Bühlmann, P et al. (eds) (2017) Handbook of Big Data, CRC/Chapman & Hall • Zomaya, A.Y and Sakr, S (eds) (2017) Handbook of Big Data Technologies, Springer • Marr, B (2017) Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things, Kogan Page • Alani, M.M., Tawfik, H., Saeed, M., and Anya, O (2018) Applications of Big Data Analytics: Trends, Issues, & Challenges, Springer • Deng, J and Savas, O (eds) (2017) Big Data Analytics in Cybersecurity, CRC • Tajunisha, N and Sruthika, P (2016) Handbook on Big Data Analytics, Amazon Digital Services LLC

Editor's Notes

  1. Iotbds 2019
  2. Iotbds 2019
  3. Iotbds 2019