Information Processing Architectures
Raji Gogulapati, Sep 2014
Information Search
Information Acquisition
Information Processing
Information Retention
Information Maintenance
Information System Management
Information Processing
Online Transaction Processing (OLTP)
Online Analytical Processing (OLAP)
Complex Event Processing (CEP)
Massively Parallel Processing (MPP)
Legacy
Random
Infrastructure Essentials for Information Processing
Infrastructure Models of Databases
Shared Nothing
• OLAP
• BI, DW, Big Data
Shared Disk
• Traditional RDBMS
• OLTP
Shared Everything
• Traditional RDBMS
• OLTP
Database Architectures
[Diagram: Shared Everything and Shared Disk layouts, showing processes attached to disks]
Relational database management systems for OLTP information
[Diagram: a master coordinating four nodes, each with its own process and its own disk]
Shared Nothing, Massively Parallel Architecture Layout
For Data Warehousing, Business Intelligence, and Big Data loads of information
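The shared nothing layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` and `Master` classes, the hash-based routing, and the sample rows are all hypothetical and are not taken from the slides or from any particular product.

```python
import zlib


class Node:
    """One shared-nothing node: its own process state and its own private storage."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # stands in for this node's local disk

    def store(self, row):
        self.rows.append(row)

    def local_sum(self, column):
        # Each node aggregates only the rows it owns.
        return sum(row[column] for row in self.rows)


class Master:
    """Hypothetical coordinator that routes rows and combines partial results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, key):
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def insert(self, key, row):
        self.node_for(key).store(row)

    def total(self, column):
        # Scatter: every node computes a partial sum. Gather: the master adds them up.
        return sum(node.local_sum(column) for node in self.nodes)


nodes = [Node(f"node{i}") for i in range(4)]
master = Master(nodes)
sales = [("order-1", 10), ("order-2", 25), ("order-3", 5), ("order-4", 60)]
for key, amount in sales:
    master.insert(key, {"key": key, "amount": amount})

grand_total = master.total("amount")
print(grand_total)  # 100
```

No node reads another node's rows; only the partial sums travel to the master, which is the property that lets this layout scale out for warehouse-style aggregation.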
Trade-offs
Assigning tasks at the proper time and in the determined order
Batch and online scheduling algorithms: priority based, first come first served, round robin
Load balancing across nodes
Serializing data transfer
Data transfer and computation delays
Data overflow and underflow
Reference: Chapter 3, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
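The three scheduling policies named above can be compared with a small sketch. The task tuples, the "lower number means higher priority" convention, and the quantum of one work unit are assumptions made for illustration; they do not come from the referenced chapter.

```python
from collections import deque


def first_come_first_served(tasks):
    """tasks: list of (name, priority, work_units) in arrival order."""
    return [name for name, _priority, _work in tasks]


def priority_based(tasks):
    # Assumption: a lower number means a higher priority.
    # sorted() is stable, so equal priorities keep their arrival order.
    return [name for name, _priority, _work in sorted(tasks, key=lambda t: t[1])]


def round_robin(tasks, quantum=1):
    """Return the sequence of time slices, one task name per slice."""
    queue = deque((name, work) for name, _priority, work in tasks)
    slices = []
    while queue:
        name, remaining = queue.popleft()
        slices.append(name)
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # unfinished work goes to the back
    return slices


# Hypothetical tasks: (name, priority, work_units)
tasks = [("load", 2, 3), ("index", 1, 1), ("report", 3, 2)]

print(first_come_first_served(tasks))  # ['load', 'index', 'report']
print(priority_based(tasks))           # ['index', 'load', 'report']
print(round_robin(tasks))              # ['load', 'index', 'report', 'load', 'report', 'load']
```

Round robin interleaves the tasks so that the short `index` task finishes early, while first come first served makes it wait behind `load`; that difference is one of the trade-offs the list refers to.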
Map Reduce Approach For Big Data Processing
Dynamic job scheduling
Scalable
Distributed memory system
Fault tolerant
Key Value Pairs
Step 1 - Split the big data among multiple parallel map tasks
Step 2 - Merge and reduce the data by grouping
Chapter 2, Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
Map Reduce Concept - Key Value Pairs
Input:
  A B C
  D D C
  A D B
Map:
  A - 1, B - 1, C - 1
  D - 1, D - 1, C - 1
  A - 1, D - 1, B - 1
Shuffle/Sort:
  A - 1, A - 1
  B - 1, B - 1
  C - 1, C - 1
  D - 1, D - 1, D - 1
Reduce: A - 2, B - 2, C - 2, D - 3
Output: A - 2, B - 2, C - 2, D - 3
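The key value flow above can be reproduced with a single-process sketch. It only imitates the map, shuffle/sort, and reduce stages in plain Python; the function names are hypothetical and nothing here uses the Hadoop API or runs in parallel.

```python
from collections import defaultdict


def map_phase(record):
    # Emit a (key, 1) pair for every token in one input record.
    return [(token, 1) for token in record.split()]


def shuffle_sort(pairs):
    # Group all values by key, then order the groups by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Collapse the list of ones for a key into a single count.
    return key, sum(values)


records = ["A B C", "D D C", "A D B"]  # the three input lines from the slide

mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle_sort(mapped)
output = dict(reduce_phase(key, values) for key, values in grouped.items())

print(output)  # {'A': 2, 'B': 2, 'C': 2, 'D': 3}
```

In a real framework each record would go to a separate map task and each key group to a reduce task; here the same steps simply run one after another.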
Information Processing – Focus and Changes
Map Reduce Framework and Hadoop Distributed File System
Parallelism
• To perform analytics in parallel
• Map and Reduce functions run in parallel
Fault Tolerance
• Shared nothing
• Compute nodes
Scalability
• Scale CPU and memory. Robust data management techniques optimize data retrieval and storage.
Data Locality
• Assign the data processing workload to the server where the data is stored, as per Map Reduce.
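The data locality idea, sending work to a server that already stores the data, can be sketched as a small assignment routine. The block names, server names, replica map, and least-loaded tie-breaking rule are all hypothetical; a real scheduler is considerably more involved.

```python
def assign_tasks(block_locations, server_load):
    """Pick, for each block, the least-loaded server that already holds a copy.

    block_locations: dict mapping block name -> list of servers storing it
    server_load: dict mapping server name -> number of tasks already assigned
    """
    assignment = {}
    load = dict(server_load)  # work on a copy so the caller's dict is untouched
    for block, servers in block_locations.items():
        chosen = min(servers, key=lambda server: load[server])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical cluster: every block is stored on two of three servers.
block_locations = {
    "block-1": ["s1", "s2"],
    "block-2": ["s2", "s3"],
    "block-3": ["s1", "s3"],
}
server_load = {"s1": 0, "s2": 0, "s3": 0}

assignment = assign_tasks(block_locations, server_load)
print(assignment)  # {'block-1': 's1', 'block-2': 's2', 'block-3': 's3'}
```

Every task lands on a server that holds its block, so no block has to cross the network before processing starts.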
A Few Basics
ACID, BASE, CAP
Relational database management systems follow the ACID rules: Atomicity, Consistency, Isolation, Durability.
What to expect from Search: BASE
Search returns innumerable pages of data.
Only one page is basically available (BA).
The rest of the data is in a soft state (S).
The rest of the data becomes eventually consistent (E).
According to database theory (the CAP theorem), distributed NoSQL databases can satisfy only two of the three CAP properties and have to relax expectations on the third.
CAP: Consistency, Availability, Partition Tolerance
Distributed Information Management
C. J. Date's 12 Rules for Distributed Databases
Local autonomy
No reliance on a central site for any particular service
Continuous operation
Location independence
Fragmentation independence
Replication independence
Distributed query processing
Distributed transaction management
Hardware independence
Operating system independence
Network independence
DBMS independence
Multiple Models For Data Architectures
Legacy, traditional RDBMS
Object oriented
Distributed
Client Server
Data Warehouses
Parallel and Massively Parallel
Temporal
Partitioning
Active Databases - Intelligence
Spatial
Multimedia
Client Server Databases, Middleware - Drivers 
1990s
Remote Database Access (RDA)
Distributed Relational Database Architecture (DRDA)
Integrated Database Application Programming Interface (IDAPI) 
Data Access Language (DAL) 
Open Database Connectivity (ODBC)
Client Server Basic Model in the '80s
[Diagram: a client PC sends a request through an interface to the server applications, and data is returned through an interface]
Adapted from Figure 3.2, mid-'80s client/server environment, Chapter 3, Client Server Databases and Middleware
Data Warehouse – Applications 
Non volatile 
Time variant 
Integrated 
Subject oriented
Data warehousing Models for analytical applications – pre-web 
Star 
Snowflake 
Constellation
Data warehousing Models for analytical applications – complex web data 
Use XML to model data warehouses 
Combining OLAP tools with data mining
Rule based multi dimensional model
Next generation data warehouse 
Analytics 
Semantic interfaces/Rules engines, Hadoop/NoSQL, RDBMS
Data layer 
OLTP, legacy data, web data
Source: http://www.sybase.com/files/White_Papers/TDWI_BPR_NextGenDWPlatforms_Q409.pdf
Business Intelligence – Models 
DSS 2.0 architecture 
Source: www.beyenetwork.com, http://www.b-eye-network.com/view/8385.
Multi tier distributed enterprise applications – Y2k period 
Client tier
Presentation (Web) tier
Business logic tier
Information system tier
Database
Frameworks such as J2EE, .NET
Client server, Application server, Database server
Mobile data progress 
1G 2G 2.5G 3G 4G
Analog Digital
GSM GPRS EDGE WCDMA
Adapted from gsma.com; Mena, Jesus. "Chapter 3 - Mobile Data". Data Mining Mobile Devices. Auerbach Publications, © 2013.
Ongoing discussions and debates
Legacy migrations
Cloud environment – suitability
Social, Mobile, Cloud environments for enterprise applications
Cloud Infrastructures for processing information 
This topic is reserved for more comprehensive coverage separately.
In the context of Big Data, Bandey, D. (2012), Doctor of Law, says:
"When a corporation mines the Big Data within its IT infrastructure a number of laws will automatically be in play. However, if that corporation wants to analyze the same Big Data in the cloud - a new tier of legal obligations and restrictions arise. Some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure."
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 2 - Big Data Computing and the Reference Architecture". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.
Topics for cloud and information processing 
Several terms and topics in this area:
Cloud database systems
Cloud storage
Data as a service
Database as a service
Data models
Cloud computing demands five crucial characteristics when evaluating whether a database fits a cloud environment: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
Raj, Pethuru, and Ganesh Chandra Deka (eds). "Chapter 9 - Cloud Database Systems: NoSQL, NewSQL, and Hybrid". Handbook of Research on Cloud Infrastructures for Big Data Analytics. IGI Global. © 2014.
Big Data Case Studies 
Conversions – Traditional mainframe to Hadoop, NoSQL DB
Recommendation engine
Video streaming analytics
Real Time Traffic monitoring 
Social behaviors log processing
References:
Dow, K. E., Hackbarth, G., & Wong, J. (2013). Data architectures for an organizational memory information system. Journal of the American Society for Information Science & Technology, 64(7), 1345-1356. doi:10.1002/asi.22848
Chessell, Mandy, and Harald C. Smith. Patterns of Information Management. © 2013.
Hu, Wen-Chen, and Naima Kaabouch (eds). Big Data Management, Technologies, and Applications. IGI Global. © 2014.
http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/
Krishnan, Krish. Data Warehousing in the Age of Big Data. © 2013.
Simon, Alan R. Strategic Database Technology: Management for the Year 2000.
