2. • What Is Big Data?
• Big Data Sources
• Importance of Big Data
• Big Data Challenges
• Big Data Characteristics
• Big Data Architecture
• Big Data Technology
• What Is Cloud Computing?
• Cloud Models
• What Is the Fundamental Service Model?
• What Is CLAaaS (Cloud Analysis as a Service)?
• What Is Data Analytics?
• What Are Scientific Workflow Systems?
3. Big data refers to data sets
whose size is beyond the
ability of typical database
software tools to capture,
store, manage, and analyze.
McKinsey Global Institute (MGI) definition
5. Soares classifies typical sources for ‘big data’
into 5 categories:
1. Web and social media
2. Machine-to-machine
3. Big transaction data
4. Biometrics
5. Human-generated data
7. - Government: In 2012, the Obama administration announced the Big Data Research and Development Initiative.
- Private sector: Walmart handles more than 1 million customer transactions every hour.
- Science: The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
13. Cloud computing involves computing
over a network, where a program or
application may run on many
connected computers at the same
time.
It specifically refers to a computing
hardware machine, or a group of
computing hardware machines,
commonly referred to as a server and
connected through a communication
network.
15. A cloud is called a "public cloud" when the
services are rendered over a network that is
open for public use.
16. Private cloud is cloud infrastructure operated
solely for a single organization, whether
managed internally or by a third-party and
hosted internally or externally
17. Hybrid cloud is a composition of two or more
clouds (private, community or public) that
remain distinct entities but are bound
together, offering the benefits of multiple
deployment models
19. 1- Scalable hardware.
2- A platform with the required specific software tools.
3- A workflow management system to facilitate the
execution of big data analytics.
20. To discover knowledge and make decisions,
you need to perform data analysis across
different data and applications.
21. 1. Big data analytics is challenging because it depends on
hardware and software resources.
2. Analytic software is lacking for different applications.
3. It is hard to estimate the resources needed for an analysis
or workflow.
4. Managing data in the cloud.
5. Executing analytic workflows.
What is the solution?
The paper proposes Cloud Analysis
as a Service (CLAaaS), which provides
an analytics platform in the cloud.
22. The paper presents a taxonomy for analytic
workflow management systems that shows
the important features of these systems.
23. What is data analytics?
Data analytics provides decision support in
finance by enabling complex computations that
generate knowledge, insights, and experimental
proofs.
These complex computations are carried out through
data analysis.
24. Data analytics
The data that needs to be analyzed has recently
become very large, so researchers design analytic
systems to address the problems of big data.
Analytic jobs require several revisions before they
can be used as final jobs.
25. What are the applications of big data
analytics?
Finance
Traffic control
26. What is a scientific workflow system?
Scientific workflow systems are tools used in many
applications; they enable the execution of
complex analyses on distributed resources.
They are the analytic workflow management systems
that exist today: each executes a series of
computational or data manipulation steps, or a
workflow, in a scientific application.
27. Analytic workflow:
Consists of a sequence of data and integration tasks
or exploratory analytic jobs,
such as the definition and execution of machine
learning models.
A task in a workflow can also be another workflow.
A workflow management system is necessary for the
efficient definition and execution of analytic
workflows.
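The idea that a task in a workflow can itself be a workflow can be sketched in a few lines of Python. This is only an illustration: the `Task` and `Workflow` classes and the step names are hypothetical and are not taken from any of the systems discussed in these slides.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Union


@dataclass
class Task:
    """A single step that transforms its input (hypothetical helper)."""
    name: str
    func: Callable[[Any], Any]

    def run(self, data: Any) -> Any:
        return self.func(data)


@dataclass
class Workflow:
    """A sequence of steps; a step may be a Task or another Workflow."""
    name: str
    steps: List[Union[Task, "Workflow"]] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        # Pass the output of each step to the next one.
        for step in self.steps:
            data = step.run(data)
        return data


# Sub-workflow: simple data preparation tasks.
prepare = Workflow("prepare", [
    Task("drop_missing", lambda rows: [r for r in rows if r is not None]),
    Task("to_float", lambda rows: [float(r) for r in rows]),
])

# Top-level workflow that uses the sub-workflow as one of its tasks.
analysis = Workflow("analysis", [
    prepare,
    Task("mean", lambda xs: sum(xs) / len(xs)),
])

result = analysis.run([1, None, 2, 3])
print(result)  # 2.0
```

Because `Task` and `Workflow` expose the same `run` method, the top-level workflow does not need to know whether a step is a single task or a nested workflow.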
28. Software tools for analytic jobs differ
depending on:
1- The data type.
2- The data size.
3- The domain.
4- The business goals.
The proximity of the analytic tools to the data sources is
important, especially for big data, to avoid data transfer time
and network cost.
29. Existing systems differ in:
1- Their focus on a data domain.
2- Workflow types.
3- Representations.
4- Execution.
5- Support for collaboration and visualization.
For example:
1- Taverna and XBaya support distributed execution of
workflows using third-party web and data services.
2- For the analysis of large data, WINGS, DAGMan and
Kepler use grid technology for the distributed mapping of
workflows.
30. The most successful workflow systems include Taverna and XBaya.
The arrows encode the data dependencies from one activity in the
workflow to another.
Different systems use different approaches to encapsulating an
activity.
In Taverna and XBaya, activities are web services.
31. • As illustrated in the figure, an activity can have multiple
inputs and produce multiple outputs, and the job of the
workflow engine is to fire an activity as soon as all of its
inputs are available.
• Directed acyclic graph representation of a workflow.
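The firing rule described above (run an activity as soon as all of its inputs are available) can be sketched as a small dataflow loop over a directed acyclic graph. This is a simplified illustration, not the engine of Taverna or XBaya: the `run_dataflow` function and the activity names are hypothetical, and each activity produces a single output for brevity.

```python
from typing import Any, Callable, Dict, List, Tuple

# Each activity maps a name to (function, names of its inputs).
# An input is either an initial data item or the output of another activity.
Activity = Tuple[Callable[..., Any], List[str]]


def run_dataflow(
    activities: Dict[str, Activity], initial: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Fire each activity once, as soon as all of its inputs exist."""
    available = dict(initial)   # data items produced so far
    pending = dict(activities)  # activities that have not fired yet
    order: List[str] = []       # firing order, kept for inspection
    while pending:
        ready = [
            name for name, (_, deps) in pending.items()
            if all(d in available for d in deps)
        ]
        if not ready:
            # Nothing can fire: the graph has a cycle or a missing input.
            raise ValueError("blocked activities: " + ", ".join(sorted(pending)))
        for name in ready:
            func, deps = pending.pop(name)
            available[name] = func(*(available[d] for d in deps))
            order.append(name)
    return available, order


activities = {
    "total": (lambda a, b: sum(a) + sum(b), ["clean_a", "clean_b"]),
    "clean_a": (lambda xs: [x for x in xs if x >= 0], ["raw_a"]),
    "clean_b": (lambda xs: [x for x in xs if x >= 0], ["raw_b"]),
}
outputs, order = run_dataflow(activities, {"raw_a": [1, -2, 3], "raw_b": [4, 5, -6]})
print(order)             # ['clean_a', 'clean_b', 'total']
print(outputs["total"])  # 13
```

The two cleaning activities become ready in the same round because they do not depend on each other; an engine could run them in parallel.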
32. XBaya control structures for conditionals
and map-reduce style iteration, layered
on top of a data flow graph.
33. Taxonomy: a classification of analytic workflow
systems, used to identify the features of CLAaaS.
The workflow system taxonomy has seven features:
1- Structure
2- Security
3- Workflow Design
4- Representation
5- Visualization
6- Collaboration
7- Execution
Workflow systems are task based: each task
represents a data processing or analysis step
performed by software or by another workflow.
34. Structure
- The information flow can be:
1- Control
2- Data or
3- Hybrid of the two.
Simple workflows have sequential, parallel or
selection-type structures.
Complex workflows include sub-workflows or
iterative tasks.
[Figure: taxonomy diagram. Labels: Structure, Content, Control, Flow, Component structure, Layout, Data, Hybrid, Complex, Simple, Parallel, Sequential, Selection, Loop, Workflow, Service, Application, Data, Access control.]
35. Security
For sensitive data, encryption is used so that the analysis
works on encrypted data.
Existing systems provide only some form of access
control.
Sensitive data is anonymized before it is
analyzed.
Data anonymization converts clear-text data into a
non-readable form.
[Figure: Security taxonomy. Labels: Data, Access control.]
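One common way to turn clear-text identifiers into a non-readable form before analysis is keyed hashing, which is strictly a form of pseudonymization. The sketch below uses Python's standard `hmac` and `hashlib` modules; it is an assumption about how such a step could look, not the method used by the paper or by any of the workflow systems named here. The key, field names and record are made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would load it from a secure store.
SECRET_KEY = b"replace-with-a-real-secret"


def mask(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a clear-text value with a keyed SHA-256 digest (hex string)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(record: dict, sensitive: set) -> dict:
    """Mask only the sensitive fields; leave the other fields untouched."""
    return {
        field: mask(str(value)) if field in sensitive else value
        for field, value in record.items()
    }


record = {"name": "Alice", "account": "12345", "amount": 250.0}
safe = anonymize(record, {"name", "account"})
print(safe["amount"])     # 250.0, still usable for analysis
print(len(safe["name"]))  # 64 hex characters instead of the clear text
```

The same input always maps to the same digest, so records can still be grouped or joined on the masked field without exposing the original value.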
36. Workflow design
Many existing workflow systems, such as WINGS,
Kepler, Triana, VisTrails and XBaya, provide graphical
workflow tools.
Graphical workflows are converted to other
representations for storage and execution.
Taverna provides a hierarchical workflow view in which a
high-level view can be expanded to show component
details.
[Figure: workflow design taxonomy: Method (Graphical, Description, Hybrid), Assistance, Verification.]
37. In WINGS, users specify constraints in the Resource
Description Framework (RDF), which makes it a hybrid
design method.
The constraints allow verification of the
workflows.
38. Workflows can be represented graphically using:
1- Object-based Unified Modeling Language (UML).
2- Graph-based DAX (an Extensible Markup Language representation
of a Directed Acyclic Graph).
3- Event-based BPMN (Business Process Model and Notation).
These are easier to construct for small workflows using GUI tools.
High-level scripting languages such as Ruby and Python are
used to automatically generate the complex low-level
structures in workflows.
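As a rough illustration of generating a low-level workflow description from a script, the sketch below builds an XML description of a small fan-out and fan-in graph with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`workflow`, `job`, `dependency`, `parent`, `child`) are invented for this example and do not follow the actual DAX schema.

```python
import xml.etree.ElementTree as ET


def build_dag_xml(jobs, edges):
    """Return an XML string listing jobs and parent-to-child dependencies."""
    root = ET.Element("workflow")  # illustrative element names, not real DAX
    for job_id in jobs:
        ET.SubElement(root, "job", id=job_id)
    for parent, child in edges:
        ET.SubElement(root, "dependency", parent=parent, child=child)
    return ET.tostring(root, encoding="unicode")


# A loop generates the repetitive structure: one split job fans out to
# n analysis jobs, which all feed a single merge job.
n = 3
jobs = ["split"] + [f"analyze_{i}" for i in range(n)] + ["merge"]
edges = [("split", f"analyze_{i}") for i in range(n)]
edges += [(f"analyze_{i}", "merge") for i in range(n)]

xml_text = build_dag_xml(jobs, edges)
print(xml_text)
```

Changing `n` regenerates the whole description, which is the practical advantage of scripting over drawing a large workflow by hand in a GUI tool.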
40. Effective visualization can add great value to analytics
and can vary based on the resulting data types.
Image data needs good resolution, unlike
charts or lists.
Visualization models can be predefined by the user
or chosen intelligently by the system
based on the data types, as in Kepler and VisTrails.
41. Collaboration is very important for the correct
interpretation of results and the specification of
effective visualization models.
Commercial analytic software provides
collaboration functionality, whereas open source workflow
systems have little or no support for online
collaboration.
Results, data and workflows are shared and published
online, for example in MyExperiment and BioMart.
42. Interdependent tasks where the output from one serves
as the input for another have to be executed sequentially.
Independent tasks can take advantage of distributed
parallel execution to: maximize resource utilization and
minimize the execution time.
Workflow execution using cloud resources is currently
being explored due to the scalability required for big
data.
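The difference between independent and interdependent tasks can be sketched with Python's standard `concurrent.futures` module. The `analyze` function and the data partitions are hypothetical; a thread pool stands in here for the distributed or cloud resources mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(partition):
    """Independent task: summarize one data partition (hypothetical step)."""
    return sum(partition)


partitions = [[1, 2, 3], [4, 5], [6]]

# The partition tasks do not depend on each other, so they can run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial = list(pool.map(analyze, partitions))

# The final step needs every partial result, so it runs afterwards, sequentially.
total = sum(partial)
print(partial)  # [6, 9, 6]
print(total)    # 21
```

`pool.map` returns results in input order, so the sequential step sees the same values however the parallel tasks were scheduled.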