BIG DATA ANALYTICS
REFERENCE ARCHITECTURES AND
CASE STUDIES
BY SERHIY HAZIYEV AND OLHA HRYTSAY
Agenda
2
Big Data
Challenges
Big Data
Reference
Architectures
Case
Studies
10 tips for
Designing
Big Data
Solutions
Big Data Challenges
3
UNSTRUCTURED
STRUCTURED
HIGH
MEDIUM
LOW
Archives Docs Business
Apps
Media Social
Networks
Public
Web...
Big Data Analytics
4
Traditional Analytics (BI) Big Data Analytics
Focus on
Data Sets
Supports
• Descriptive analytics
• D...
Big Data Analytics Use Cases
5
Data
Discovery
Business
Reporting
Real Time
Intelligence
Data Quality
Self Service
Business...
Big Data Analytics Reference Architectures
6
Architecture Drivers: Reference Architectures:
▪ Volume
▪ Sources
▪ Throughpu...
Relational Reference Architecture
7
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
OLAP Cubes
...
8
Extended Relational
Reference Architecture
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
OL...
Non-Relational Reference Architecture
9
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
Map Red...
Extended Relational vs. Non-Relational Architecture
10
Architecture Drivers
Extended
Relational
Non-Relational
Large data ...
Extended Relational vs. Non-Relational Architecture
11
Architecture Drivers
Extended
Relational
Non-Relational
Large data ...
Relational vs. Non-Relational Architecture
12
Relational Non-Relational
• Rational
• Predictable
• Traditional
• Agile
• F...
Data
Discovery
Business
Reporting
Real Time
Intelligence
Big Data Analytics Use Cases
13
Business Users
Intelligent Agents...
Data Discovery: Non-Relational Architecture
14
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Advanced
Analytics
...
Data
Discovery
Business
Reporting
Real Time
Intelligence
Big Data Analytics Use Cases
15
Intelligent AgentsConsumers
Data ...
Business Reporting: Hybrid Architecture
16
Web Services
Mobile
Devices
Native
Desktop
Web
Browsers
Map Reduce
SQL Query &
...
Data
Discovery
Business
Reporting
Real Time
Intelligence
Big Data Analytics Use Cases
17
Data Scientists Business Users
In...
Lambda Architecture
18
Source:
19
Business Goals:
 Provide development environment
for building custom mobile applications
 Charge customers for the pl...
Architectural Decisions
20
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min...
Solution Architecture
21
Technologies:
• Amazon Redshift
• Amazon SQS
• Amazon S3
• Elastic Beanstalk
• Jaspersoft BI Prof...
22
Business Goals:
 Build in-house Analytics Platform for ROI measurement
and performance analysis of every product and f...
//
Extended
Relational
Non-
Relational
Volume/Scalability +/- +
Throughput + +
Self-Service + +/-
Extensibility - +
Archit...
Solution Architecture
24
Technologies:
• Amazon S3
• Flume
• Hadoop/HDFS, MapReduce
• HBase
• Oozie
• Hive
Node 1
Node 2
N...
10 Tips for Designing Big Data Solutions
25
 Understand data users and sources
 Discover architecture drivers
 Select p...
26
▪ Leading global Product and
Application Development partner
founded in 1993
▪ 3,300+ employees across North
America, U...
Thank You!
27
SoftServe US Office
One Congress Plaza,
111 Congress Avenue, Suite 2700 Austin, TX
78701
Tel: 512.516.8880
C...
Upcoming SlideShare
Loading in...5
×

SEI Carnegie-Mellon SATURN 2014 Conference: Big Data Analytics Reference Architectures - by Serhiy Haziyev and Olha Hrytsay

400

Published on

If data were equated to temperature in physical analogy, according to the second law of thermodynamics its accumulation would lead to the heat death of the universe. In fact, 90% of the world’s data has been generated during the last two years. Fortunately, business intelligence (BI) architectures and technologies are evolving rapidly to keep up with the exponential rise of data and project complexity. An almost 30-year era of dominance for the relational database management system (RDBMS) ended, and there is no one-size-fits-all solution that solves all of today’s Big Data challenges. BI architecture drivers must change to satisfy new requirements in format, volume, latency, hosting, analysis, reporting, and visualization. In this session, we expose a number of reference architectures that address these challenges and speed up the design and implementation process, making it more predictable and economical:

- Traditional architecture based on an RDMBS data warehouse but modernized with column-based storage to handle a high load and capacity
- NoSQL-based architectures that address Big Data batch and stream-based processing and use popular NoSQL and complex event-processing solutions
- Hybrid architecture that combines traditional and NoSQL approaches to achieve completeness that would not be possible with either alone

The architectures are accompanied by real-life projects and case studies that the presenters have performed for multiple companies, including Fortune 100 and start-ups.

Published in: Software, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
400
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • Big Data – data that is too large complex and dynamic for any conventional data tools to capture, store, manage and analyze.
  • More details can be found here: link to our case study
  • SEI Carnegie-Mellon SATURN 2014 Conference: Big Data Analytics Reference Architectures - by Serhiy Haziyev and Olha Hrytsay

    1. 1. BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BY SERHIY HAZIYEV AND OLHA HRYTSAY
    2. 2. Agenda 2 Big Data Challenges Big Data Reference Architectures Case Studies 10 tips for Designing Big Data Solutions
    3. 3. Big Data Challenges 3 UNSTRUCTURED STRUCTURED HIGH MEDIUM LOW Archives Docs Business Apps Media Social Networks Public Web Data Storages Machine Log Data Sensor Data Data Storages RDBMS, NoSQL, Hadoop, file systems etc. Machine Log Data Application logs, event logs, server data, CDRs, clickstream data etc. Sensor Data Smart electric meters, medical devices, car sensors, road cameras etc. Archives Scanned documents, statements, medical records, e-mails etc.. Docs XLS, PDF, CSV, HTML, JSON etc. Business Apps CRM, ERP systems, HR, project management etc. Social Networks Twitter, Facebook, Google+, LinkedIn etc. Public Web Wikipedia, news, weather, public finance etc Media Images, video, audio etc. Velocity Variety VolumeComplexity
    4. 4. Big Data Analytics 4 Traditional Analytics (BI) Big Data Analytics Focus on Data Sets Supports • Descriptive analytics • Diagnosis analytics • Limited data sets • Cleansed data • Simple models • Large scale data sets • More types of data • Raw data • Complex data models • Predictive analytics • Data Science Causation: what happened, and why? Correlation: new insight More accurate answers vs
    5. 5. Big Data Analytics Use Cases 5 Data Discovery Business Reporting Real Time Intelligence Data Quality Self Service Business Users Intelligent AgentsConsumers Low Latency Reliability Volume Performance Data Scientists/ Analysts
    6. 6. Big Data Analytics Reference Architectures 6 Architecture Drivers: Reference Architectures: ▪ Volume ▪ Sources ▪ Throughput ▪ Latency ▪ Extensibility ▪ Data Quality ▪ Reliability ▪ Security ▪ Self-Service ▪ Cost ▪ Extended Relational ▪ Non-Relational ▪ Hybrid
    7. 7. Relational Reference Architecture 7 Web Services Mobile Devices Native Desktop Web Browsers Advanced Analytics OLAP Cubes Query & Reporting Operational Data Stores Data Marts Data Warehouses Replication API/ODBC Messaging ETL Unstructured Semi- Structured Data Sources Integration Data Storages Analytics Presentation Structured
    8. 8. 8 Extended Relational Reference Architecture Web Services Mobile Devices Native Desktop Web Browsers Advanced Analytics OLAP Cubes Query & Reporting Operational Data Stores Data Marts Data Warehouses Replication API/ODBC Messaging ETL Unstructured Semi- Structured Data Sources Integration Data Storages Analytics Presentation Structured Key components affected with Big Data challenges
    9. 9. Non-Relational Reference Architecture 9 Web Services Mobile Devices Native Desktop Web Browsers Advanced Analytics Map Reduce Query & Reporting Search Engines Distributed File Systems NoSQL Databases API Messaging ETL Unstructured Semi- Structured Data Sources Integration Data Storages Analytics Presentation Structured Key components introduced with non-relational movement
    10. 10. Extended Relational vs. Non-Relational Architecture 10 Architecture Drivers Extended Relational Non-Relational Large data volume Self-service (ad-hoc reporting) Unstructured data processing High data model extensibility High data quality and consistency Extensive security Reliability and fault-tolerance Low latency (near-real time) Low cost Skills availability
    11. 11. Extended Relational vs. Non-Relational Architecture 11 Architecture Drivers Extended Relational Non-Relational Large data volume Self-service (ad-hoc reporting) Unstructured data processing High data model extensibility High data quality and consistency Extensive security Reliability and fault-tolerance Low latency (near-real time) Low cost Skills availability
    12. 12. Relational vs. Non-Relational Architecture 12 Relational Non-Relational • Rational • Predictable • Traditional • Agile • Flexible • Modern
    13. 13. Data Discovery Business Reporting Real Time Intelligence Big Data Analytics Use Cases 13 Business Users Intelligent AgentsConsumers Performance Volume Data Scientists
    14. 14. Data Discovery: Non-Relational Architecture 14 Web Services Mobile Devices Native Desktop Web Browsers Advanced Analytics Map Reduce Query & Reporting Search Engines Distributed File Systems NoSQL Databases API Messaging ETL Unstructured Semi- Structured Data Sources Integration Data Storages Analytics Presentation Structured
    15. 15. Data Discovery Business Reporting Real Time Intelligence Big Data Analytics Use Cases 15 Intelligent AgentsConsumers Data Scientists Data Quality Self Service Business Users
    16. 16. Business Reporting: Hybrid Architecture 16 Web Services Mobile Devices Native Desktop Web Browsers Map Reduce SQL Query & Reporting Distributed File Systems API Messaging ETL Unstructured Semi- Structured Data Sources Integration Data Storages Analytics Presentation Structured Relational DWH/DM Advanced Analytics Search Engines Extended Relational components Non-relational components
    17. 17. Data Discovery Business Reporting Real Time Intelligence Big Data Analytics Use Cases 17 Data Scientists Business Users Intelligent AgentsConsumers Low Latency Reliability
    18. 18. Lambda Architecture 18 Source:
    19. 19. 19 Business Goals:  Provide development environment for building custom mobile applications  Charge customers for the platform they use with pay-as-you-go model Business Area: Cloud based platform for building, deploying, hosting and managing mobile applications Case Study #1: Usage & Billing Analysis
    20. 20. Architectural Decisions 20 ▪ Volume (> 10 TB) ▪ Sources (Semi-structured - JSON) ▪ Throughput (> 10K/sec) ▪ Latency (2 min) ▪ Extensibility (Custom metrics) ▪ Data Quality (Consistency) ▪ Reliability (24/7) ▪ Security (Multitenancy) ▪ Self-Service (Ad-Hoc reports) ▪ Cost (The less the better ) ▪ Constraints (Public Cloud) Architecture Drivers: Trade-off: // Extended Relational Non-Relational Extensibility - + Data Quality + - Self-Service + -  Extended Relational Architecture  Extensibility via Pre-allocated Fields pattern
    21. 21. Solution Architecture 21 Technologies: • Amazon Redshift • Amazon SQS • Amazon S3 • Elastic Beanstalk • Jaspersoft BI Professional • Python
    22. 22. 22 Business Goals:  Build in-house Analytics Platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform;  Provide the ability to understand how end-users are interacting with service content, products, and features on sites;  Do clickstream analysis;  Perform A/B Testing Business Area: Retail. A platform for e-commerce and collecting feedbacks from customers Case Study #2: Clickstream for retail website
    23. 23. // Extended Relational Non- Relational Volume/Scalability +/- + Throughput + + Self-Service + +/- Extensibility - + Architectural Decisions 23 ▪ Volume (45 TB) ▪ Sources (Semi-structured - JSON) ▪ Throughput (> 20K/sec) ▪ Latency (1 hour) ▪ Extensibility (Custom tags) ▪ Data Quality (Not critical) ▪ Reliability (24/7) ▪ Security (Multitenancy) ▪ Self-Service (Canned reports, Data science) ▪ Cost (The less the better ) ▪ Constraints (Public Cloud) Architecture Drivers: Trade-off:  Non-Relational Architecture  Reporting via Materialized View pattern
    24. 24. Solution Architecture 24 Technologies: • Amazon S3 • Flume • Hadoop/HDFS, MapReduce • HBase • Oozie • Hive Node 1 Node 2 Node N
    25. 25. 10 Tips for Designing Big Data Solutions 25  Understand data users and sources  Discover architecture drivers  Select proper reference architecture  Do trade-off analysis, address cons  Map reference architecture to technology stack  Prototype, re-evaluate architecture  Estimate implementation efforts  Set up devops practices from the very beginning  Advance in solution development through “small wins”  Be ready for changes, big data technologies are evolving rapidly
    26. 26. 26 ▪ Leading global Product and Application Development partner founded in 1993 ▪ 3,300+ employees across North America, Ukraine and Western Europe ▪ Thousands of successful outsourcing projects! SaaS/Cloud Solutions . Mobility Solutions . UX/UI BI/Analytics/Big Data . Software Architecture . Security Clients include:
    27. 27. Thank You! 27 SoftServe US Office One Congress Plaza, 111 Congress Avenue, Suite 2700 Austin, TX 78701 Tel: 512.516.8880 Contacts Serhiy Haziyev: shaziyev@softserveinc.com Olha Hrytsay: ohrytsay@softserveinc.com

    ×