Big Data: Its Characteristics And Architecture Capabilities
Presentation Transcript

  • Big Data: Its Characteristics And Architecture Capabilities By Ashraf Uddin South Asian University (http://ashrafsau.blogspot.in/)
  • What is Big Data? Big data refers to large datasets that are challenging to store, search, share, visualize, and analyze. “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
  • The Model of Generating/Consuming Data Has Changed. Old model: a few companies generate data and all others consume it. New model: all of us generate data, and all of us consume it.
  • Do we really need Big Data? For consumers: better understanding of own behavior; integration of activities; influence (involvement and recognition). For companies: real behavior (what do people do, and what do they value?); faster interaction; better targeted offers; customer understanding.
  • Characteristics of Big Data: 1. Volume (Scale) 2. Velocity (Speed) 3. Variety (Complexity)
  • Volume
  • Velocity • Data is generated fast and needs to be processed fast • Online data analytics • Late decisions lead to missed opportunities
  • Variety • Various formats, types, and structures • Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc. • Static data vs. streaming data • A single application can generate or collect many types of data • To extract knowledge, all these types of data need to be linked together
  • Generation of Big Data: scientific instruments (collecting all sorts of data); social media and networks (all of us are generating data); sensor technology and networks (measuring all kinds of data).
  • Why Big Data is Different? For example, an airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time. Compare that with conventional high performance computing where New York Stock Exchange collects 1 terabyte of structured trading data per day. Conventional corporate structured data sized in terabytes and petabytes. Big Data is sized in peta-, exa-, and soon perhaps, zetta-bytes!
  • Why Big Data is Different? The unique characteristics of Big Data is the manner in which value is discovered. In conventional BI, the simple summing of a known value reveals a result In Big Data, the value is discovered through a refining modeling process: make a hypothesis create statistical, visual, or semantic models validate, then make a new hypothesis.
  • Use cases for Big Data Analytics
  • A Big Data Use Case: Personalized Insurance Premium an insurance company wants to offer to those who are unlikely to make a claim, thereby optimizing their profits. One way to approach this problem is to collect more detailed data about an individual's driving habits and then assess their risk. to collect data on driving habits utilizing sensors in their customers' cars to capture driving data, such as routes driven, miles driven, time of day, and braking abruptness.
  • A Big Data Use Case: Personalized Insurance Premium This data is used to assess driver risk; they compare individual driving patterns with other statistical information, such as average miles driven in same state, and peak hours of drivers on the road. Driver risk plus actuarial information is then correlated with policy and profile information to offer a competitive and more profitable rate for the company The result A personalized insurance plan. These unique capabilities, delivered from big data analytics, are revolutionizing the insurance industry.
  • A Big Data Use Case: Personalized Insurance Premium To accomplish this task: a great amount of continuous data must be collected, stored, and correlated. Hadoop is an excellent choice for acquisition and reduction of the automobile sensor data. Master data and certain reference data including customer profile information are likely to be stored in the existing DBMS systems a NoSQL database can be used to capture and store reference data that are more dynamic, diverse in formats, and change frequently.
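
The risk-assessment step of this use case can be sketched in a few lines. Everything here is an assumption made for illustration: the `Trip` fields, the weights in `risk_score`, and the way the premium is scaled are hypothetical and do not come from the presentation or from any insurer's actual model.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    miles: float
    night: bool        # driven in a hypothetical high-risk time window
    hard_brakes: int   # number of abrupt braking events recorded


def risk_score(trips, state_avg_miles):
    """Unitless score; about 1.0 for average mileage, no night driving, no hard braking."""
    total_miles = sum(t.miles for t in trips)
    if total_miles == 0:
        return 1.0
    mileage_ratio = total_miles / state_avg_miles
    night_share = sum(t.miles for t in trips if t.night) / total_miles
    brakes_per_100_miles = 100 * sum(t.hard_brakes for t in trips) / total_miles
    # Assumed weights, chosen only to make the example concrete.
    return (0.5 * mileage_ratio
            + 0.3 * (1 + night_share)
            + 0.2 * (1 + brakes_per_100_miles))


def personalized_premium(base_premium, score):
    """Scale a base premium by the driver's risk score."""
    return round(base_premium * score, 2)


if __name__ == "__main__":
    trips = [Trip(40, False, 0), Trip(10, True, 1)]
    score = risk_score(trips, state_avg_miles=100)
    print(score, personalized_premium(500.0, score))
```

In a real pipeline the per-trip records would come from the sensor data reduced in Hadoop, and the state average from reference data held elsewhere; here both are passed in directly.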
  • Data Realm Characteristics
  • Big Data Architecture Capabilities: Storage and Management Capability; Database Capability; Processing Capability; Data Integration Capability; Statistical Analysis Capability.
  • Storage and Management Capability. Hadoop Distributed File System (HDFS): highly scalable storage and automatic data replication across three nodes for fault tolerance. Cloudera Manager: gives a cluster-wide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster.
  • Database Capability Oracle NoSQL  Dynamic and flexible schema design  High performance key value pair database. Apache HBase  Strictly consistent reads and writes  Allows random, real time read/write access Apache Cassandra  Fault tolerance capability is designed for every node  Data model offers column indexes with the performance of log-structured updates, materialized views, and built-in caching Apache Hive  Tools to enable easy data extract/transform/load (ETL)  Query execution via MapReduce
  • Processing Capability MapReduce  Break problem up into smaller sub-problems  Able to distribute data workloads across thousands of nodes Apache Hadoop  Leading MapReduce implementation  Highly scalable parallel batch processing  Writes multiple copies across cluster for fault tolerance
  • Data Integration Capability Exports MapReduce results Hadoop, and other targets to RDBMS, Connects Hadoop to relational databases for SQL processing Optimized processing import/export with parallel data
  • Statistical Analysis Capability. R: programming language for statistical analysis. Oracle R Enterprise: allows reuse of pre-existing R scripts with no modification.
  • Big Data Architecture: Traditional Information Architecture Capability; Big Data Information Architecture Capability.
  • Conclusion. Today's economic environment demands that business be driven by useful, accurate, and timely information. The world of Big Data is a solution to this problem. There are always business and IT tradeoffs in getting to data and information in the most cost-effective way.