Big Data Architectural Series:
Creating a Next-Generation Big Data Architecture

  1. 1. Big Data Architectural Series: Creating a Next-Generation Big Data Architecture facebook.com/perficient twitter.com/Perficient linkedin.com/company/perficient
  2. 2. Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty and create a more agile enterprise to better respond to new business opportunities. About Perficient
  3. 3. • Founded in 1997 • Public, NASDAQ: PRFT • 2013 revenue $373 million • Major market locations: • Allentown, Atlanta, Boston, Charlotte, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Minneapolis, New York City, Northern California, Oxford (UK), Philadelphia, Southern California, St. Louis, Toronto, Washington, D.C. • Global delivery centers in China and India • >2,200 colleagues • Dedicated solution practices • ~90% repeat business rate • Alliance partnerships with major technology vendors • Multiple vendor/industry technology and growth awards Perficient Profile
  4. 4. BUSINESS SOLUTIONS Business Intelligence Business Process Management Customer Experience and CRM Enterprise Performance Management Enterprise Resource Planning Experience Design (XD) Management Consulting TECHNOLOGY SOLUTIONS Business Integration/SOA Cloud Services Commerce Content Management Custom Application Development Education Information Management Mobile Platforms Platform Integration Portal & Social Our Solutions Expertise
  5. 5. Our Speaker Bill Busch Sr. Solutions Architect, Enterprise Information Solutions, Perficient • Leads Perficient's enterprise data practice • Specializes in business-enabling BI solutions that enable the agile enterprise • Responsible for executive data strategy, roadmap development, and the delivery of high-impact solutions that enable organizations to leverage enterprise data • Bill has over 15 years of experience in executive leadership, business intelligence, data warehousing, data governance, master data management, information/data architecture and analytics
  6. 6. Perficient’s Big Data Architectural Series Business Case Next Generation Architecture Future Topics • Data Integration • Stream Processing • NoSQL • SQL on Hadoop • Data Quality • Governance • Use Cases & Case Studies Today’s Webinar
  7. 7. Today’s Objectives 5 Architectural Roles For Hadoop Hadoop Ecosystem Potential vs. Reality Realizing A Hadoop Centric Architecture
  9. 9. “Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Convergence of structured, unstructured, and dark data Big Data is the evolution of data creating similar data management issues that IT has struggled to address for the last 20+ years. Three Views of Big Data
  11. 11. Common Big Data Business Use Cases Improve Strategic Decision Making Customer Experience Analysis Operational Optimization Risk and Fraud Reduction Data Monetization Security Event Detection and Analysis IT Cost Management
  12. 12. Expanding Data Ecosystem • Customer Intelligence • Operations • Risk & Fraud • Data Monetization • Strategic Development • Security Intelligence • IT Optimization Structured Data (5-20% of Total) Point-of-Sale Text Messages Contracts & Regulatory Preferences & Emotions Security Access Weather Machine Data Automobile Mobile Communications Geospatial Social Data Ecosystem
  13. 13. Enterprise Data Architecture Next Generation
  14. 14. The Promise Data Architecture Simplification Data Integration Data Hub Analytics Stream Processing Data Warehouse Operational Data Hadoop Cluster
  15. 15. The Reality Maturity Limits the Use Cases • Realize the potential of Hadoop • Multi-tenancy is in its infancy • Hadoop 2.0 and YARN • Most third-party applications are just moving to YARN • Hive (and other SQL on Hadoop solutions) maturing • Robust enterprise functionality is evolving • Security • High Availability
  16. 16. Different Types of “Open Source Hadoop” Apache Projects Only Proprietary Value Add & Re-Development Apache Projects + Proprietary Add-ons Packaged and Online Solutions • IBM BigInsights • Oracle Big Data Appliance • HDInsight • Many others! Choosing A Hadoop Distribution • Company Philosophy • Current Relationships • Acceptable Risk • Specialized Functionality
  17. 17. Quick Primer on YARN What is YARN? • Yet Another Resource Negotiator • Sometimes referred to as MapReduce 2.0 • Data operating system • Fault-Tolerance Why is this important? • Enables multi-tenancy on Hadoop • Moves processing to the data *Image Provided by Hortonworks
  18. 18. Today’s Objectives 5 Architectural Roles For Hadoop Hadoop Ecosystem Potential vs. Reality Realizing A Hadoop Centric Architecture
  19. 19. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  20. 20. Enterprise Data Architecture Next Generation
  21. 21. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  22. 22. Analytical Processing 1. Source 2. Wrangle Data 3. Model & Tune 4. Operationalize • Data Ingestion • Metadata Management • Data Access • Data Preparation Tools • Data Discovery & Visualization • Data Wrangling Tools • Business Glossary & Search • Data Access • Data Discovery & Visualization • Analytical Tools • Analytical Sandbox • Business Created Reporting • Model Execution & Management • Knowledge Management (Portal) Analytical Process Architectural Capabilities
  24. 24. Data Access • There are many methods for accessing Big Data • Direct HDFS • NoSQL / Connector • Hive / SQL on Hadoop • Align tool to access methods and file types • Data Preparation • Analytics Source Files/Data Tidy Data Data Preparation Tool Analytics Tool Analytical Result Read Access Write Access Key Hadoop Cluster
  25. 25. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  26. 26. Data Warehouse Roles • Two models for splitting processing • Hot – Cold • Data Warehouse Layer • Push high user loads to traditional data warehouses • Fully investigate DW- Hadoop connector functionality • Leverage opportunity to use in-memory database solutions Data Warehouse Layer Approach Hadoop Cluster Traditional DW/DM Hot – Cold Data Warehouse Cold Data Hadoop Cluster Traditional DW/DM Hot Data
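The hot/cold split on the slide above can be illustrated with a small routing sketch. This is only an illustration under stated assumptions: the 90-day hot window, the store labels `traditional_dw` and `hadoop_cluster`, and both function names are hypothetical and do not come from the deck.

```python
from datetime import date, timedelta

# Assumption: data newer than this many days counts as "hot".
HOT_WINDOW_DAYS = 90


def target_store(record_date, today, hot_window_days=HOT_WINDOW_DAYS):
    """Return the (hypothetical) store label for a record of the given date."""
    age_days = (today - record_date).days
    return "traditional_dw" if age_days <= hot_window_days else "hadoop_cluster"


def split_query_range(start, end, today, hot_window_days=HOT_WINDOW_DAYS):
    """Split an inclusive date range into the part each store should serve."""
    cutoff = today - timedelta(days=hot_window_days)  # first hot day
    plan = {}
    if start < cutoff:
        # Cold portion: everything strictly before the cutoff.
        plan["hadoop_cluster"] = (start, min(end, cutoff - timedelta(days=1)))
    if end >= cutoff:
        # Hot portion: the cutoff day and everything after it.
        plan["traditional_dw"] = (max(start, cutoff), end)
    return plan
```

A query that spans both periods yields two sub-ranges, one per store, which is where the DW-Hadoop connector functionality mentioned on the slide would come into play.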
  27. 27. Data Warehouse Organize Your Data • Types of data stored on cluster • Analytical sandboxes • Team • Individual • Quotas • Potential to replace information lifecycle management solutions • No right answer – clearly define usage Consolidated Data Streaming Queues Deltas (Incremental) Common Data (Dimensions, Master Data) Improved / Modeled Data Published, Analytical and Aggregates Sandbox Zone Raw Data Processed Data Hadoop Cluster Archived Data
  28. 28. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  29. 29. Stream and Event Processing • Dedicated vs. Shared Model • Persistence of messages, logs, etc. • Long-term storage • Queuing • Pre-load (HDFS) vs. Post-load processing • Micro-Batch vs. One-at-a-Time • Programming language support • Processing guarantee • At most once • At least once • Exactly once Let business requirements drive need for streaming solutions. It is acceptable to use more than one solution as long as the roles / purposes of each are clearly defined.
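The processing guarantees listed on the slide above can be made concrete with a toy sketch. It assumes at-least-once delivery, where a lost acknowledgement triggers a redelivery, combined with a sink that deduplicates on a message id so that the stored result looks exactly-once. The names `IdempotentSink` and `deliver_at_least_once` are hypothetical and not tied to any particular streaming product.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Stores each payload once, keyed by a unique message id."""
    seen_ids: set = field(default_factory=set)
    stored: list = field(default_factory=list)

    def write(self, msg_id, payload):
        if msg_id in self.seen_ids:
            return False  # duplicate redelivery, ignored
        self.seen_ids.add(msg_id)
        self.stored.append(payload)
        return True


def deliver_at_least_once(messages, sink, lost_acks=()):
    """Deliver every message; redeliver once when its ack is 'lost'."""
    deliveries = 0
    for msg_id, payload in messages:
        sink.write(msg_id, payload)
        deliveries += 1
        if msg_id in lost_acks:
            # The sender never saw the ack, so it sends the message again.
            sink.write(msg_id, payload)
            deliveries += 1
    return deliveries
```

With a plain append-only sink the redelivered message would be stored twice (at least once); dropping the retry instead would risk losing it (at most once).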
  30. 30. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  31. 31. The Data Integration Challenge Key Point: Hadoop and Hadoop-related technologies can address these challenges. However, they must be architected and governed properly Volume, variety, and velocity create unique challenges for data integration 10,000+ unique entities (or file groups) may have to be managed Batch windows are still the same or shrinking The Challenge
  32. 32. Data Factory & Integration: Approaches to Big Data Integration. Hadoop Distributed Tools: • Leverages tools included in the Hadoop distribution and programming languages • Sqoop, Flume, Spark, Java, MapReduce are examples • Tools can be implemented in many different modes • Hand-coded/scripted • Runtime Configured • Generated. Hybrid (Both Hadoop and Data Integration Package): • Based on use case, leverages both Hadoop and COTS tools to move and transform data. Data Integration Packages: • Leverage commercial data integration packages to move and transform data • IBM InfoSphere BigInsights, Informatica are examples • Key questions: where is processing taking place, and does the tool use the YARN resource manager?
  33. 33. Define Pipelines and Stages Sqoop Cloud Sources RDBMS File Hub FTP Packaged Tool Object DBMS ETL Tool Log Data FTP Stream/ Message Bus Kafka Sqoop Storm Extract HDFS Load & Formatting Scraping & Normalization MCF Storm Cleansing, Aggregation Transformation Package ETL Tool Storm Data Distribution Data Access & Distribution RDBMS/DW/IMDB Hive HBase File Extracts NoSQL Stream Output Custom Sqoop Custom Custom Message Bus ETL Tool ETL Tool
  34. 34. Big Data Integration Framework Typical Services Key Guidance: • In lieu of using an ETL product, consider building a Big Data Integration framework • Apache Falcon provides pipeline management • Focus is on making all components run-time configurable with metadata • Can offer significant cost savings over the long run Load Utility Metadata Collection Metadata Pipeline Config Files Metadata Config Files Pipeline Utilities Parser (Delimiter) Data Standardization HIVE Publishing MF Coding Converters File Joiner & Transport Logging Checksum Retention Replication Late Arriving Data Exception Handling Pipeline Master (ex. Falcon) DB Copy Archival Audit Sqoop Flume HDFS Shell
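One way to picture "run-time configurable with metadata" from the slide above is a pipeline whose stages are chosen by a configuration record rather than hard-coded. The sketch below is a minimal, assumption-laden illustration: the stage names, the config layout, and the `customer_feed` example are invented here and are not Apache Falcon's format or the presenter's framework.

```python
import hashlib


def parse_delimited(records, delimiter=","):
    """Parser (delimiter) utility: split each line into fields."""
    return [line.split(delimiter) for line in records]


def standardize(records):
    """Data standardization utility: trim and upper-case every field."""
    return [[value.strip().upper() for value in row] for row in records]


# Registry of reusable utilities, looked up by name at run time.
STAGES = {"parse": parse_delimited, "standardize": standardize}

# Hypothetical pipeline metadata; in practice this might live in a config file.
PIPELINE_CONFIG = {
    "name": "customer_feed",
    "stages": [
        {"stage": "parse", "params": {"delimiter": "|"}},
        {"stage": "standardize", "params": {}},
    ],
}


def checksum(lines):
    """Checksum utility: SHA-256 over the lines, for audit or transport checks."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


def run_pipeline(config, records):
    """Apply each configured stage in order; new feeds need only new metadata."""
    data = records
    for step in config["stages"]:
        data = STAGES[step["stage"]](data, **step["params"])
    return data
```

Onboarding another feed then means adding a config entry and, at most, one new utility to the registry, which is the kind of reuse the slide's cost-savings point relies on.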
  35. 35. Hadoop Analytics Data Warehouse Stream Processing Data Factory Transactional Data Store Five Common Architectural Roles Hadoop Big Data Use Cases
  36. 36. SQL on Hadoop • SQL on Hadoop is changing • Historically focused on read functionality for analytics • New breed of SQL on Hadoop • BI and operational reporting • Transaction Processing *Image Provided by Splice Machine
  37. 37. Transactions In Hive
  38. 38. Today’s Objectives 5 Architectural Roles For Hadoop Hadoop Ecosystem Potential vs. Reality Realizing A Hadoop Centric Architecture
  39. 39. Common Big Data Business Use Cases Improve Strategic Decision Making Customer Experience Analysis Operational Optimization Risk and Fraud Reduction Data Monetization Security Event Detection and Analysis IT Cost Management
  40. 40. Architectural Scenarios. Architecture roles: Analytics, Data Warehouse, Stream Processing, Data Factory, Transactional Data Store*. Business use cases: Strategic Decision Making: P, s; Customer Experience: P, s, P, s; Operational Optimization: P, s, s, s; Risk and Fraud Reduction: P, s, P; Data Monetization: s, s, P; Security Event Detection and Analysis: P, s, s, s; IT Cost Management: P, s, P, P. * Capability is just emerging within the Hadoop ecosystem. Consider this use case for isolated business cases and early adopters. P = Primary use case, s = Secondary use case
  41. 41. Integrating Hadoop into the Enterprise Determine Business Use Cases Understand Current Tools & Architecture Align Business Use Case Priorities Build Roadmap Specify Solution Architecture Update & Maintain Roadmap Implement Roadmap
  42. 42. Final Thoughts Do • Match the business use case to the big data role • Clearly define a roadmap • Establish clear architectural standards to drive • Consistency • Re-use of resources • Do your homework when defining a solution architecture Don’t • Select an initial use case that relies on immature Hadoop functionality • Leverage tools that move data off the cluster for processing and then store the data back on the cluster • Assume all Hadoop technologies integrate well together
  43. 43. As a reminder, please submit your questions in the chat box. We will get to as many as possible.
  44. 44. Daily unique content about content management, user experience, portals and other enterprise information technology solutions across a variety of industries. Perficient.com/SocialMedia Facebook.com/Perficient Twitter.com/Perficient
  45. 45. Thank you for your participation today. Please fill out the survey at the close of this session.
