Dr Bassam Hammo.
Presented by: Isra Zaitoun 8131797.
and Diala Alwedyan 8131807.
• What Is Big Data?
• Big Data Sources
• Big Data Importance
• Big Data Challenges
• Big Data Characteristics
• Big Data Architecture
• Big Data Technology
• What is cloud computing?
• Cloud Models
• What is the fundamental service model?
• What is CLAaaS (Cloud Analysis as a Service)?
• What is data analytics?
• What does a scientific workflow system mean?
Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
(McKinsey Global Institute (MGI) definition)
• Soares classifies typical sources for ‘big data’ into 5 categories:
1. Web and social media
2. Machine-to-machine
3. Big transaction data
4. Biometrics
5. Human-generated data
- Government
  In 2012, the Obama administration announced the Big Data Research and Development Initiative.
- Private sector
  Walmart handles more than 1 million customer transactions every hour.
- Science
  The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
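As a rough sanity check, the volumes quoted above can be turned into rates. This is only illustrative arithmetic: the function and variable names are assumptions, and the inputs are the figures quoted on the slide.

```python
SECONDS_PER_HOUR = 3600

def per_second(count_per_hour):
    """Convert an hourly count into an average per-second rate."""
    return count_per_hour / SECONDS_PER_HOUR

def terabytes_per_day(total_terabytes, days):
    """Convert a total volume over several days into a daily average."""
    return total_terabytes / days

# Figures taken from the slide: 1 million transactions per hour, 140 TB per 5 days.
walmart_rate = per_second(1_000_000)
lsst_rate = terabytes_per_day(140, 5)

print(f"Walmart: about {walmart_rate:.0f} transactions per second")
print(f"LSST: {lsst_rate:.0f} TB per day")
```

Both numbers are averages; real traffic would be burstier than this suggests.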
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
Veracity
• Data messiness
• Heterogeneity and incompleteness
• Scale
• Timeliness
• Hadoop
• MapReduce
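The MapReduce idea can be sketched in a few lines. This is a minimal in-memory illustration of the map, shuffle and reduce phases using word counting; it does not use the Hadoop API, and all function names here are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases one after another on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big cloud"])
print(counts)
```

In a real cluster the map and reduce calls would run on different machines; here they run in one process only to show the data flow.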
Cloud computing involves computing over a network, where a program or application may run on many connected computers at the same time.
It specifically refers to a computing hardware machine or group of computing hardware machines, commonly referred to as a server, connected through a communication network.
Public Model
Private Model
Hybrid Model
A cloud is called a "public cloud" when the
services are rendered over a network that is
open for public use.
Private cloud is cloud infrastructure operated
solely for a single organization, whether
managed internally or by a third-party and
hosted internally or externally
Hybrid cloud is a composition of two or more
clouds (private, community or public) that
remain distinct entities but are bound
together, offering the benefits of multiple
deployment models
1- Scalable hardware.
2- A platform with the required specific software tools.
3- A workflow management system to facilitate the execution of big data analytics.
• To discover knowledge and make decisions, you need to analyze data across different data sets and applications.
1. Big data analytics is challenging because it depends on hardware and software resources.
2. Analytic software is lacking for different applications.
3. It is hard to estimate the resources needed for an analysis or workflow.
4. Managing data in the cloud.
5. Executing analytic workflows.
What is the solution?
The paper proposes Cloud Analysis as a Service (CLAaaS), which provides an analytics platform in the cloud.
• The paper presents a taxonomy for analytic workflow management systems that shows the important features of these systems.
What is data analytics?
• Data analytics provides decision support in finance by enabling complex computations to generate knowledge, insights, and experimental proofs.
• These complex computations are carried out through data analysis.
Data analytics
• The data that needs to be analyzed has recently become big, so researchers design analytic systems to address the problems of big data.
• Analytic jobs require several revisions before they can be used as final jobs.
What are the applications of big data analytics?
• Finance
• Traffic control
What is a scientific workflow system?
• A scientific workflow system is a tool for many applications that enables the execution of complex analyses on distributed resources.
• It is a kind of analytic workflow management system that exists today. It executes a series of computational or data manipulation steps, or a workflow, in a scientific application.
• Analytic workflow: consists of a sequence of data and integration tasks or exploratory analytic jobs, such as the definition and execution of machine learning models.
• A task in a workflow can also be another workflow.
• A workflow management system is necessary for efficient definition and execution of analytic workflows.
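The point that a task can itself be a workflow can be sketched as a small composite structure. This is an illustrative sketch only: the `Task` and `Workflow` classes and the sample steps are assumptions for the example, not the design of any system named in these slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Task:
    """A single step: `run` takes the current data and returns new data."""
    name: str
    run: Callable[[object], object]

@dataclass
class Workflow:
    """A sequence of steps; each step is a Task or another Workflow."""
    name: str
    steps: List[Union["Task", "Workflow"]] = field(default_factory=list)

    def run(self, data):
        # Tasks and nested workflows are called the same way.
        for step in self.steps:
            data = step.run(data)
        return data

# Hypothetical sub-workflow that cleans a list of text rows.
cleaning = Workflow("cleaning", [
    Task("strip", lambda rows: [row.strip() for row in rows]),
    Task("drop_empty", lambda rows: [row for row in rows if row]),
])

# The outer workflow uses the sub-workflow as one of its tasks.
analysis = Workflow("analysis", [cleaning, Task("count", len)])

result = analysis.run([" a ", "", "b"])
print(result)
```

Because both classes expose `run`, the outer workflow does not need to know whether a step is simple or nested.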
• Software tools for analytic jobs differ depending on:
1- The data type.
2- The data size.
3- The domain.
4- The business goals.
• Keeping the analytic tools close to the data sources is important, especially for big data, to avoid data transfer time and network cost.
• Existing systems differ in:
1- Their focus on a data domain.
2- Workflow types.
3- Representations.
4- Execution.
5- Support for collaboration and visualization.
• Taverna and XBaya support distributed execution of workflows using third-party web and data services.
• For the analysis of large data, WINGS, DAGMan and Kepler use grid technology for distributed mapping of workflows.
• The most successful workflow systems are Taverna and XBaya.
• The arrows encode the data dependencies from one activity in the workflow to another.
• Different systems use different approaches to encapsulating an activity.
• In Taverna and XBaya these are web services.
• As illustrated in the figure, an activity can have multiple inputs and produce multiple outputs, and the job of the workflow engine is to start an activity as soon as all of its inputs are available.
Figure: Directed acyclic graph representation of a workflow (each node is an activity).
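The engine behaviour described above, starting an activity once all of its inputs are available, can be sketched as follows. This is a minimal single-process sketch, not the engine of Taverna or XBaya; `run_workflow` and the sample activities are assumed names for the example.

```python
from collections import deque

def run_workflow(activities, needs):
    """Run a DAG of activities.

    activities: name -> function taking the outputs of its upstream activities
    needs:      name -> list of upstream activity names (the incoming arrows)
    """
    outputs = {}
    waiting = {name: set(deps) for name, deps in needs.items()}
    ready = deque(name for name, deps in waiting.items() if not deps)
    order = []
    while ready:
        name = ready.popleft()
        inputs = [outputs[dep] for dep in needs[name]]
        outputs[name] = activities[name](*inputs)
        order.append(name)
        # An activity becomes ready when its last missing input arrives.
        for other, deps in waiting.items():
            if name in deps:
                deps.remove(name)
                if not deps:
                    ready.append(other)
    if len(order) != len(activities):
        raise ValueError("workflow has a cycle or a missing dependency")
    return outputs, order

activities = {
    "load": lambda: [3, 1, 2],
    "sort": lambda xs: sorted(xs),
    "total": lambda xs: sum(xs),
    "report": lambda s, t: {"sorted": s, "total": t},
}
needs = {"load": [], "sort": ["load"], "total": ["load"], "report": ["sort", "total"]}

outputs, order = run_workflow(activities, needs)
print(order, outputs["report"])
```

Here `report` has two inputs and only runs after both `sort` and `total` have produced their outputs.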
• XBaya provides control structures for conditionals and MapReduce-style iteration, layered on top of a data flow graph.
• Taxonomy: a classification of the analytic workflow systems, used to identify the features of CLAaaS.
• Workflow systems are task based; each task represents data processing or analysis carried out by software or by another workflow.
Workflow System
The taxonomy has seven features:
1- Structure
2- Security
3- Workflow Design
4- Representation
5- Visualization
6- Collaboration
7- Execution
Structure
• The information flow can be:
1- Control
2- Data
3- A hybrid of the two
• Simple workflows have sequential, parallel or selection structures.
• Complex workflows include sub-workflows or iterative tasks.
Figure (Structure taxonomy), labels: Content, Control, Flow, Component structure, Layout, Data, Hybrid, Complex, Simple, Parallel, Sequential, Selection, Loop, Workflow, Service, Application, Data, Access control.
Security
• For sensitive data, encryption is used so that the analysis can run on encrypted data.
• The existing systems only have some sort of access control.
• Sensitive data is anonymized before it is analyzed.
• Data anonymization converts clear-text data into non-readable text.
Figure (Security taxonomy), labels: Data, Access control.
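One way to turn clear text into non-readable text before analysis is keyed hashing. The sketch below is an assumption for illustration, not the method of the paper or of any system named here, and strictly speaking it is pseudonymization: the same input always maps to the same token, so records can still be grouped. The field names and the key are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(value, key):
    """Replace a clear-text value with a keyed SHA-256 token (hex string)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize_records(records, sensitive_fields, key):
    """Return copies of the records with the sensitive fields replaced by tokens."""
    return [
        {
            field: pseudonymize(str(value), key) if field in sensitive_fields else value
            for field, value in record.items()
        }
        for record in records
    ]

# Hypothetical key and records, for illustration only.
secret_key = b"example-key-keep-out-of-source-control"
records = [
    {"name": "Alice", "amount": 120},
    {"name": "Alice", "amount": 80},
]

safe = anonymize_records(records, {"name"}, secret_key)
print(safe[0]["name"][:12], safe[0]["amount"])
```

The non-sensitive `amount` field is untouched, so totals per customer can still be computed on the tokens without exposing names.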
Workflow design
• Many existing workflow systems, such as WINGS, Kepler, Triana, VisTrails and XBaya, provide graphical workflow tools.
• Graphical workflows are converted to other representations for storage and execution.
• Taverna provides a hierarchical workflow view in which the high-level view can be expanded to see component details.
Figure (Workflow design taxonomy), labels: Method, Assistance, Verification, Graphical, Description, Hybrid.
• In WINGS, users specify constraints in the Resource Description Framework (RDF), making it a hybrid design method.
• The constraints allow verification of the workflows.
Workflows can be represented graphically using the following notations, which are easier to construct for small workflows using GUI tools:
1- Object-based UML (Unified Modeling Language).
2- Graph-based DAX (an Extensible Markup Language representation of a Directed Acyclic Graph).
3- Event-based BPMN (Business Process Model and Notation).
• High-level scripting languages such as Ruby and Python are used to automatically generate the low-level complex structures in workflows.
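Generating a workflow description from a script can be sketched with Python's standard XML library. The element and attribute names below (`workflow`, `job`, `dependency`) and the commands are hypothetical and do not follow the real DAX schema; the sketch only shows how a loop can produce a structure that would be tedious to draw by hand.

```python
import xml.etree.ElementTree as ET

def build_dag_xml(jobs, edges):
    """Build an XML description of a DAG.

    jobs:  job id -> command string
    edges: list of (parent id, child id) pairs
    """
    root = ET.Element("workflow")  # hypothetical element names, not real DAX
    for job_id, command in jobs.items():
        ET.SubElement(root, "job", id=job_id, command=command)
    for parent, child in edges:
        ET.SubElement(root, "dependency", parent=parent, child=child)
    return ET.tostring(root, encoding="unicode")

# Fan-out over 4 data chunks followed by one merge job (hypothetical commands).
chunks = 4
jobs = {f"part{i}": f"analyze --chunk {i}" for i in range(chunks)}
jobs["merge"] = "merge-results"
edges = [(f"part{i}", "merge") for i in range(chunks)]

xml_text = build_dag_xml(jobs, edges)
print(xml_text)
```

Changing `chunks` from 4 to 400 changes one number in the script, whereas a graphical editor would need hundreds of nodes placed by hand.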
• Effective visualization can add significant value to analytics, and it can vary based on the resulting data types.
• Image data should have good resolution, unlike charts or lists.
• Visualization models can be predefined by the user or defined intelligently by the system based on the data types, as in Kepler and VisTrails.
• Collaboration is very important for correct interpretation of the results and for specifying effective visualization models.
• Commercial analytic software provides collaboration functionality, whereas open source workflow systems have little or no support for online collaboration.
• Results, data and workflows are shared and published online, for example in MyExperiment and BioMart.
• Interdependent tasks, where the output from one serves as the input for another, have to be executed sequentially.
• Independent tasks can take advantage of distributed parallel execution to maximize resource utilization and minimize the execution time.
• Workflow execution using cloud resources is currently being explored because of the scalability required for big data.
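The difference between sequential and parallel execution can be sketched with a thread pool: tasks whose inputs are all ready form one level and run in parallel, while dependent tasks wait for the next level. This is a local sketch under assumed names (`run_in_levels`, the sample tasks); a cloud deployment would distribute the work across machines instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_levels(tasks, needs, max_workers=4):
    """Run tasks level by level.

    tasks: name -> function taking the results of its upstream tasks
    needs: name -> list of upstream task names
    """
    done = {}
    remaining = dict(needs)
    levels = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # Tasks in one level are independent of each other.
            level = sorted(
                name for name, deps in remaining.items()
                if all(dep in done for dep in deps)
            )
            if not level:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {
                name: pool.submit(tasks[name], *[done[dep] for dep in remaining[name]])
                for name in level
            }
            for name, future in futures.items():
                done[name] = future.result()
                del remaining[name]
            levels.append(level)
    return done, levels

tasks = {
    "a": lambda: 2,
    "b": lambda: 3,
    "c": lambda x, y: x * y,
}
needs = {"a": [], "b": [], "c": ["a", "b"]}

results, levels = run_in_levels(tasks, needs)
print(levels, results["c"])
```

Tasks `a` and `b` are independent and share a level, while `c` depends on both and therefore runs afterwards.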
• Markus Maier, "Towards a Big Data Reference Architecture," Master thesis, 13th October 2013.
• Nrusimham Ammu, Mohd Irfanuddin, "Big Data Challenges", International Journal of Advanced Trends in Computer Science and Engineering, Vol. 2, No. 1, Pages 613-615 (2013). Special Issue of ICACSE 2013, held on 7-8 January 2013 in Lords Institute of Engineering and Technology, Hyderabad.
• https://www.youtube.com/watch?v=D4ZQxBPtyHg&hd=1
• http://en.wikipedia.org/wiki/Cloud_telephony
• http://en.wikipedia.org/wiki/Big_data
• http://d2i.indiana.edu/sites/default/files/1-s2.0-s1877050912001755-main.pdf
• https://www.youtube.com/watch?v=Ft6yz0SObIA
• https://www.youtube.com/watch?v=arVoQxjIxUU
• PROGRAMMING E-SCIENCE GATEWAYS – ResearchGate
• http://computer.howstuffworks.com/cloud-computing/cloud-computing.htm
• http://www.slideshare.net/leelashine/hadoop-29386155?qid=cb3ea67b-c33e-4e2b-8f79-fdca7b6a1b63&v=default&b=&from_search=5
• http://en.wikipedia.org/wiki/Weka_(machine_learning)
• http://en.wikipedia.org/wiki/Data_mining
• http://en.wikipedia.org/wiki/Taxonomy
• http://siliconangle.com/blog/2013/04/03/big-data-traffic-jam-smarter-lights-happy-drivers/

