Information processing architectures

Raji Gogulapati, Sep 2014

Information System Management
• Information Search
• Information Acquisition
• Information Processing
• Information Retention
• Information Maintenance

Information Processing
• Online Transaction Processing (OLTP)
• Online Analytical Processing (OLAP)
• Complex Event Processing (CEP)
• Massively Parallel Processing (MPP)
• Legacy
• Random

Infrastructure Essentials for Information Processing

Infrastructure Models of Databases
Shared Nothing
• OLAP
• BI, DW, Big Data
Shared Disk
• Traditional RDBMS
• OLTP
Shared Everything
• Traditional RDBMS
• OLTP

Database Architectures
(Figure: Shared Everything and Shared Disk layouts; in Shared Disk, several processes share one disk.)
Relational database management systems for OLTP information

Shared Nothing, Massively Parallel Architecture Layout
(Figure: a master coordinating four nodes, each with its own process and its own disk.)
For data warehousing, business intelligence and Big Data workloads
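A minimal Python sketch of the shared-nothing layout, assuming hypothetical sales rows and a simple partitioning rule (zlib.crc32 of the key modulo the node count): each node stores and aggregates only its own rows, and the master merges the partial results. The Node and Master classes are illustrative names, not part of any product.

```python
import zlib
from collections import defaultdict

# Hypothetical input rows: (region, amount).
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8), ("south", 4)]


class Node:
    """One shared-nothing node: private storage, never reads another node's data."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # stands in for this node's own disk

    def local_totals(self):
        # Aggregate only the rows this node owns.
        totals = defaultdict(int)
        for key, amount in self.rows:
            totals[key] += amount
        return dict(totals)


class Master:
    """Routes rows to their owning node and merges the nodes' partial results."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        # Stable hash so the same key always lands on the same node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def load(self, rows):
        for key, amount in rows:
            self.owner(key).rows.append((key, amount))

    def totals(self):
        merged = {}
        for node in self.nodes:
            # Each key lives on exactly one node, so partial results never overlap.
            merged.update(node.local_totals())
        return merged


if __name__ == "__main__":
    master = Master(num_nodes=4)
    master.load(ROWS)
    print(master.totals())
```

Because the partitioning key and the grouping key are the same here, no data moves between nodes during aggregation; grouping on a different column would need a redistribution step between nodes.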

Trade-offs
• Assigning tasks at the proper time in the determined order
• Batch and online scheduling algorithms: priority based, first come first served, round robin
• Load balancing across nodes
• Serializing data transfer
• Data transfer and computation delays
• Data overflow and underflow
Reference: Chapter 3, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
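The scheduling policies named in the trade-offs can be compared with a small Python sketch. It assumes three hypothetical tasks that all arrive at time 0 on a single node, and reports each task's completion time under first come first served, non-preemptive priority, and round robin.

```python
from collections import deque

# Hypothetical tasks, all arriving at time 0 on one node:
# (name, run_time, priority), where a lower number means higher priority.
TASKS = [("load", 4, 2), ("index", 2, 1), ("report", 3, 3)]


def fcfs(tasks):
    """First come, first served: run each task to completion in arrival order."""
    clock, finished = 0, {}
    for name, run_time, _priority in tasks:
        clock += run_time
        finished[name] = clock
    return finished


def priority_first(tasks):
    """Non-preemptive priority: same as FCFS after sorting by priority."""
    return fcfs(sorted(tasks, key=lambda task: task[2]))


def round_robin(tasks, quantum):
    """Round robin: run at most `quantum` units, then requeue the unfinished task."""
    queue = deque((name, run_time) for name, run_time, _priority in tasks)
    clock, finished = 0, {}
    while queue:
        name, remaining = queue.popleft()
        step = min(quantum, remaining)
        clock += step
        if remaining > step:
            queue.append((name, remaining - step))
        else:
            finished[name] = clock
    return finished


if __name__ == "__main__":
    print("FCFS:       ", fcfs(TASKS))
    print("Priority:   ", priority_first(TASKS))
    print("Round robin:", round_robin(TASKS, quantum=2))
```

With these inputs the short "index" task finishes at time 6 under FCFS, at time 2 under priority scheduling, and at time 4 under round robin, which in turn delays "load" from 4 to 8; which trade-off is acceptable depends on whether the workload is batch or online.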

Map Reduce Approach For Big Data Processing
• Dynamic job scheduling
• Scalable
• Distributed memory system
• Fault tolerant
Step 1 - Split big data among multiple parallel map tasks
Step 2 - Merge and reduce the data by grouping on keys
Key value pairs
Reference: Chapter 2, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
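A sketch of how a master might combine dynamic job scheduling with fault tolerance: each input split is assigned to a worker, and if that worker fails, the split is rescheduled on the next worker. The worker names, the always-failing worker "w2" and the token-counting map function are hypothetical; real frameworks add heartbeats and other machinery that this sketch omits.

```python
class WorkerFailure(Exception):
    """Raised when a (hypothetical) worker crashes while running a task."""


SPLITS = ["A B C", "D D C", "A D B"]


def run_map_tasks(splits, workers, run_task, max_attempts=3):
    """Master loop: assign each split to a worker; on failure, retry on the next one."""
    results, log = {}, []
    for i, split in enumerate(splits):
        for attempt in range(max_attempts):
            worker = workers[(i + attempt) % len(workers)]
            try:
                results[i] = run_task(worker, split)
                log.append((i, worker, "ok"))
                break
            except WorkerFailure:
                log.append((i, worker, "failed"))
        else:
            # Every attempt failed: give up on the whole job.
            raise RuntimeError(f"split {i} failed {max_attempts} times")
    return results, log


def flaky_map(worker, split):
    # Hypothetical: worker "w2" has crashed and fails every task it receives.
    if worker == "w2":
        raise WorkerFailure(worker)
    return len(split.split())


if __name__ == "__main__":
    results, log = run_map_tasks(SPLITS, ["w1", "w2", "w3"], flaky_map)
    print(results)
    print(log)
```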

Map Reduce Concept - Key Value Pairs
Input, split into three parts: A B C | D D C | A D B
Map: A - 1, B - 1, C - 1 | D - 1, D - 1, C - 1 | A - 1, D - 1, B - 1
Shuffle/Sort: A - 1, A - 1 | B - 1, B - 1 | C - 1, C - 1 | D - 1, D - 1, D - 1
Reduce: A - 2 | B - 2 | C - 2 | D - 3
Output: A - 2, B - 2, C - 2, D - 3
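The key-value flow of the example (input, map, shuffle/sort, reduce, output) can be reproduced in a short Python sketch. Thread-based map tasks only stand in for map tasks on separate machines, and the function names are illustrative, not the Hadoop API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# The three input splits from the example.
SPLITS = ["A B C", "D D C", "A D B"]


def map_phase(split):
    """Map: emit a (key, 1) pair for every token in one split."""
    return [(token, 1) for token in split.split()]


def shuffle_sort(mapped_outputs):
    """Shuffle/Sort: gather every value emitted for the same key, keys in order."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def map_reduce(splits):
    # Map tasks run concurrently, one per split (threads stand in for cluster nodes).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle_sort(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(SPLITS))  # prints {'A': 2, 'B': 2, 'C': 2, 'D': 3}
```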

Information Processing – Focus and Changes
Map Reduce Framework and Hadoop Distributed File System
Parallelism
• To perform analytics in parallel
• Map and Reduce functions run in parallel
Fault Tolerance
• Shared nothing
• Compute nodes
Scalability
• Scale CPU and memory; robust data management techniques optimize data retrieval and storage.
Data Locality
• Assign the data processing workload to the server where the data is stored, as per Map Reduce.
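Data locality can be sketched as a task-assignment rule: run each map task on a node that already holds a replica of its input block, and fall back to a remote node only when no local slot is free. The block IDs, node names, placement map and one-slot-per-node setting below are all hypothetical.

```python
# Hypothetical block placement, as a distributed file system might report it:
# block id -> nodes holding a replica of that block.
BLOCK_LOCATIONS = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node3", "node1"],
    "blk_4": ["node1", "node3"],
}
NODES = ["node1", "node2", "node3", "node4"]


def assign_map_tasks(block_locations, nodes, slots_per_node):
    """Assign one map task per block, preferring a node that already stores the block."""
    free = {node: slots_per_node for node in nodes}
    assignment = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if free.get(node, 0) > 0]
        candidates = local or [node for node in nodes if free[node] > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for {block}")
        # Among the candidates, pick the one with the most free slots (first on ties).
        node = max(candidates, key=lambda n: free[n])
        free[node] -= 1
        # Record whether the task reads its data locally or over the network.
        assignment[block] = (node, node in replicas)
    return assignment


if __name__ == "__main__":
    for block, (node, is_local) in assign_map_tasks(BLOCK_LOCATIONS, NODES, 1).items():
        print(block, node, "local" if is_local else "remote")
```

With one slot per node, blk_4 must run remotely on node4 because both of its replica holders are busy; with two slots per node every task runs where its data is stored.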

ACID, BASE, CAP
Relational database management systems follow ACID rules – Atomicity, Consistency, Isolation, Durability
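Atomicity, the "A" in ACID, can be seen with Python's built-in sqlite3 module: a transfer that credits one account and then fails to debit the other is rolled back as a whole. The account table, its CHECK constraint and the transfer function are hypothetical examples; the connection's context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # A debit that would go negative violates the CHECK constraint.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    conn = make_bank()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "bob", "alice", 500), balances(conn))  # credit to alice is undone
```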
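What to expect from search – BASE
Search returns innumerable pages of data, yet:
• Only one page is basically available (BA)
• The rest of the data is in a soft state (S)
• The rest of the data becomes eventually consistent (E)
CAP – Consistency, Availability, Partition tolerance. According to database theory (the CAP theorem), a distributed database cannot guarantee all three at once, so distributed NoSQL big databases satisfy only two and relax expectations on the third; because network partitions cannot be ruled out, they typically keep partition tolerance and relax either consistency or availability.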
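A toy Python sketch of the BASE idea: three hypothetical replicas accept writes independently, so reads may differ for a while (soft state), and an anti-entropy pass using a last-writer-wins rule brings them to the same value (eventual consistency). Last-writer-wins with unique integer timestamps is one simple reconciliation choice assumed here, not the scheme of any particular NoSQL store.

```python
class Replica:
    """One hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, timestamp, value):
        # Last writer wins: keep the entry with the newest timestamp.
        # Assumes timestamps are unique across writes (ties are not resolved here).
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every entry from every replica to every other replica."""
    for source in replicas:
        for target in replicas:
            if target is not source:
                for key, (timestamp, value) in list(source.data.items()):
                    target.apply(key, timestamp, value)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    a.apply("page", 1, "v1")  # write accepted by replica a
    b.apply("page", 2, "v2")  # later write accepted by replica b
    print([r.read("page") for r in (a, b, c)])  # soft state: replicas disagree
    anti_entropy([a, b, c])
    print([r.read("page") for r in (a, b, c)])  # converged on the newest write
```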

Distributed Information management
C. J. Date's 12 Rules for Distributed Databases
• Local autonomy
• No reliance on a central site for any particular service
• Continuous operation
• Location independence
• Fragmentation independence
• Replication independence
• Distributed query processing
• Distributed transaction management
• Hardware independence
• Operating system independence
• Network independence
• DBMS independence
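Location independence, fragmentation independence and distributed query processing can be illustrated together: the caller queries one logical table, and a query layer decides which site's fragment to read and merges the results. The sites, rows and fragmentation schema below are hypothetical.

```python
# Hypothetical horizontal fragments of one logical CUSTOMER table, held at two sites.
SITES = {
    "site_ny": [{"id": 1, "name": "Ann", "region": "east"},
                {"id": 3, "name": "Cy", "region": "east"}],
    "site_sf": [{"id": 2, "name": "Bo", "region": "west"}],
}
# Hypothetical fragmentation schema: which site stores which region's rows.
SITE_FOR_REGION = {"east": "site_ny", "west": "site_sf"}


def select_customers(predicate, region=None):
    """Query the logical table; the caller never names a site or a fragment."""
    if region is not None:
        sites = [SITE_FOR_REGION[region]]  # only one fragment can hold matching rows
    else:
        sites = list(SITES)  # otherwise the query is sent to every site
    rows = [row for site in sites for row in SITES[site] if predicate(row)]
    # Merge partial results into one answer, as if from a single table.
    return sorted(rows, key=lambda row: row["id"])


if __name__ == "__main__":
    print([r["name"] for r in select_customers(lambda r: True)])
    print(select_customers(lambda r: r["name"].startswith("C"), region="east"))
```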

Multiple Models For Data Architectures
• Legacy, traditional RDBMS
• Object oriented
• Distributed
• Client server
• Data warehouses
• Parallel and massively parallel
• Temporal
• Partitioning
• Active databases - intelligence
• Spatial
• Multimedia

Client Server Databases, Middleware - Drivers (1990s)
• Remote Database Access (RDA)
• Distributed Relational Database Architecture (DRDA)
• Integrated Database Application Programming Interface (IDAPI)
• Data Access Language (DAL)
• Open Database Connectivity (ODBC)

Client Server basic model in the ’80s
(Figure: a client PC sends a request through an interface to the server applications, and data returns through the interface.)
Adapted from figure 3.2, mid-’80s client/server environment, chapter 3, client server databases and middleware
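The request/data exchange of the basic client server model can be sketched with Python's socket.socketpair(), which returns two connected sockets inside one process. The request strings and SERVER_DATA are hypothetical, and each short message is assumed to arrive in a single recv call; a real protocol would frame its messages and run the server on another machine.

```python
import socket

# Hypothetical data held by the server application.
SERVER_DATA = {"GET customer_count": "3", "GET customers": "Ann,Bo,Cy"}


def serve_one(server_sock):
    """Server application: read one request from the interface, reply with data."""
    request = server_sock.recv(1024).decode()
    reply = SERVER_DATA.get(request, "ERROR unknown request")
    server_sock.sendall(reply.encode())


def client_request(client_sock, server_sock, request):
    """Client PC: send a request, let the server answer, then read the data back."""
    client_sock.sendall(request.encode())
    serve_one(server_sock)  # in a real deployment this runs on the server machine
    return client_sock.recv(1024).decode()


if __name__ == "__main__":
    client_sock, server_sock = socket.socketpair()
    with client_sock, server_sock:
        print(client_request(client_sock, server_sock, "GET customers"))
```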

Data Warehouse – Applications
• Subject oriented
• Integrated
• Time variant
• Non-volatile

Data warehousing models for analytical applications – pre-web
• Star
• Snowflake
• Constellation
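A minimal star schema can be built with Python's built-in sqlite3 module: one fact table of measures surrounded by dimension tables, queried by joining and grouping on dimension attributes. The table names and rows are hypothetical. A snowflake schema would further normalize a dimension (for example, moving category into its own table), and a constellation would let several fact tables share the same dimensions.

```python
import sqlite3


def build_star():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables describe the context of each fact.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount INTEGER
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
        INSERT INTO dim_date VALUES (10, 2013), (11, 2014);
        INSERT INTO fact_sales VALUES (1, 10, 5), (1, 11, 7), (2, 11, 4), (2, 11, 6);
    """)
    return conn


def sales_by_category_and_year(conn):
    """A typical OLAP-style query: join the fact to its dimensions, then aggregate."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_category_and_year(build_star()):
        print(row)
```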

Data warehousing models for analytical applications – complex web data
• Use XML to model data warehouses
• Combine OLAP tools with data mining
• Rule-based multidimensional model

Next generation data warehouse
• Analytics
• Semantic interfaces/rules engines, Hadoop/NoSQL, RDBMS
• Data layer: OLTP, legacy data, web data

Source: http://www.sybase.com/files/White_Papers/TDWI_BPR_NextGenDWPlatforms_Q409.pdf

Business Intelligence – Models
DSS 2.0 architecture
Source: www.beyenetwork.com, http://www.b-eye-network.com/view/8385.

Multi-tier distributed enterprise applications – Y2K period
Tiers: client tier → presentation (web) tier → business logic tier → information system (database) tier
Frameworks such as J2EE and .NET
Deployment: client, server, application server, database server
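The tier separation can be sketched as three Python classes, each calling only the tier directly beneath it: the presentation tier formats output, the business logic tier applies rules, and the information system tier stores data. The order data and the 10% discount rule are hypothetical; in a J2EE or .NET deployment each tier would typically run in its own server process.

```python
class DatabaseTier:
    """Information system tier: stores data; knows nothing about rules or HTML."""

    def __init__(self):
        # Hypothetical order data.
        self._orders = {101: {"customer": "Ann", "qty": 3, "unit_price": 20}}

    def get_order(self, order_id):
        return self._orders.get(order_id)


class BusinessLogicTier:
    """Business logic tier (application server): applies rules, calls only the database tier."""

    def __init__(self, database):
        self.database = database

    def order_total(self, order_id):
        order = self.database.get_order(order_id)
        if order is None:
            raise KeyError(f"unknown order {order_id}")
        total = order["qty"] * order["unit_price"]
        # Hypothetical business rule: 10% discount on totals of 50 or more.
        return total - total // 10 if total >= 50 else total


class PresentationTier:
    """Presentation (web) tier: formats results for the client; calls only business logic."""

    def __init__(self, business):
        self.business = business

    def order_page(self, order_id):
        try:
            return f"<p>Order {order_id} total: {self.business.order_total(order_id)}</p>"
        except KeyError:
            return f"<p>Order {order_id} not found</p>"


if __name__ == "__main__":
    web = PresentationTier(BusinessLogicTier(DatabaseTier()))
    print(web.order_page(101))
```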

Mobile data progress
1G (analog) → 2G (digital, GSM) → 2.5G (GPRS, EDGE) → 3G (WCDMA) → 4G
Adapted from gsma.com, and Mena, Jesus. "Chapter 3 - Mobile Data". Data Mining Mobile Devices. Auerbach Publications, © 2013.

Ongoing discussions and debates
• Legacy migrations
• Cloud environment – suitability

Social, Mobile, Cloud environments for enterprise applications

Cloud Infrastructures for processing information
This topic is reserved for separate, more comprehensive coverage.
In the context of Big Data, Bandey, D. (2012), Doctor of Law, says: "When a Corporation mines the Big Data within its IT infrastructure a number of laws will automatically be in play. However, if that Corporation wants to analyze the same Big data in the cloud – a new tier of legal obligations and restrictions arise. Some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure."
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 2 - Big Data Computing and the Reference Architecture". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Topics for cloud and information processing
Several terms and topics belong to this area:
• Cloud database systems
• Cloud storage
• Data as a service
• Database as a service
• Data models
Five essential characteristics of cloud computing are used to evaluate whether a database fits a cloud environment: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 9 - Cloud Database Systems: NoSQL, NewSQL, and Hybrid". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.

Big Data Case Studies
• Conversions – traditional mainframe to Hadoop, NoSQL databases
• Recommendation engine
• Video streaming analytics
• Real-time traffic monitoring
• Social behavior log processing

References:
Dow, K. E., Hackbarth, G., & Wong, J. (2013). Data architectures for an organizational memory information system. Journal of the American Society for Information Science & Technology, 64(7), 1345-1356. doi:10.1002/asi.22848
Chessell, M., & Smith, H. C. (2013). Patterns of Information Management.
Hu, W.-C., & Kaabouch, N. (Eds.). (2014). Big Data Management, Technologies, and Applications. IGI Global.
http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/
Krishnan, K. (2013). Data Warehousing in the Age of Big Data.
Simon, A. R. Strategic Database Technology: Management for the Year 2000.
