SoftServe BI/BigData Workshop in Utah

The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.

Published in: Technology, Business

  • Transcript

    • 1. Common BI/Big Data Challenges and Solutions By Andriy Zabavskyy & Serhiy Haziyev January, 2013
    • 2. SoftServe BI/Big Data Lunch and Learn Workshop in Utah, January 30, 2013. The Common BI/Big Data Challenges and Solutions presented by seasoned SoftServe experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture). This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session. About SoftServe Inc.: SoftServe, founded in 1993, is a leading global outsourced product and application development company dedicated to empowering businesses worldwide by providing end-to-end capabilities from product concept to completion. Utilizing Product Development Services 2.0 (PDS 2.0), we deliver proactive solutions in the areas of SaaS/Cloud, Mobility, BI/Analytics and UI/UX for industries including Healthcare, Retail, Manufacturing, Logistics, and Infrastructure & Storage. SoftServe is a rapidly growing global company with 3,000 professionals and offices in North America, Western Europe, Russia and Ukraine.
    • 3. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 4. Typical BI Solution
        – Data Sources: OLTP (CRM, ERP, Finance), flat files, legacy systems, Big Data
        – Data Integration: ETL/ELT
        – Data Warehouse: data warehouse, OLAP cubes
        – Data Mining: predictive and prescriptive analytics
        – Data Visualization and Analysis: reports, dashboards, spreadsheets, BI tools
        – Users: analysts
    • 5. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 6. Dashboard & Scorecard Problem: ▪ Single view from multiple sources ▪ Track performance against company targets Solution: ▪ Dashboard ▪ KPI and Scorecards (Diagram labels: Client, Internet, Server Tier)
    • 7. Dashboard & Scorecard: Implementation (software vendor offerings compared by development and integration efforts)
        – Boxed solutions from big players (e.g. SAS, SAP, IBI): customization
        – Dashboard frameworks (e.g. Tableau, QlikView, JasperSoft): custom defined KPI
        – Dashboard libs (JIDE libs): custom defined KPI and custom built dashboard framework
    • 8. Dashboard & Scorecard: Highlights • Adopting or customizing a ready-made line-of-business solution can be a painful, long and costly process • Not all dashboard solutions support multitenancy out of the box
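Tracking performance against company targets, as described in the Dashboard & Scorecard slides, can be illustrated with a minimal scorecard sketch. The `Kpi` class, the traffic-light thresholds and the sample figures are illustrative assumptions, not something taken from the workshop or from any dashboard product.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """A hypothetical KPI with an actual value and a company target."""
    name: str
    actual: float
    target: float
    higher_is_better: bool = True


def attainment(kpi: Kpi) -> float:
    """Return performance as a fraction of target (1.0 means on target).

    Assumes actual and target are non-zero.
    """
    if kpi.higher_is_better:
        return kpi.actual / kpi.target
    # For cost-like KPIs, staying under target is good, so invert the ratio.
    return kpi.target / kpi.actual


def status(kpi: Kpi, warn_at: float = 0.9) -> str:
    """Map attainment to a traffic-light status (thresholds are assumptions)."""
    ratio = attainment(kpi)
    if ratio >= 1.0:
        return "green"
    if ratio >= warn_at:
        return "yellow"
    return "red"


def scorecard(kpis):
    """Build scorecard rows: (name, attainment rounded to 2 places, status)."""
    return [(k.name, round(attainment(k), 2), status(k)) for k in kpis]


if __name__ == "__main__":
    for row in scorecard([
        Kpi("Revenue", 950_000, 1_000_000),
        Kpi("Churn rate", 0.04, 0.05, higher_is_better=False),
        Kpi("NPS", 30, 50),
    ]):
        print(row)
```

In a multi-source dashboard, the `actual` values would come from the integrated data layer rather than from literals.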
    • 9. Self Service BI Problem: ▪ Give BI users the ability to explore and analyze data in a highly customizable manner Solution: ▪ Expose a data model to users ▪ Provide a toolset with data exploration and analysis capabilities (Diagram labels: BI Users, Toolset, Data Model, OLAP, In-Memory, RDBMS/NoSQL)
    • 10. Self Service BI: Implementation • OLAP engines with proper OLAP viewers • BI tools with in-memory engines and semantic/domain layers • Report authoring tools: – Microsoft Report Builder – JasperServer Report Designer
    • 11. Self Service BI: Traditional vs Agile BI Trade-off (chart comparing Traditional and Agile BI on these features: Time to Value, Self Service, Collaboration, Interactivity and UX, Customization, Data Quality, Pixel-perfect, Low cost solutions)
    • 12. Self Service BI: Highlights • Data consumers need to be trained to use SSBI tools properly • The desktop tools of many SSBI vendors are often more mature than their web tools • In-memory capabilities are limited by RAM size
    • 13. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 14. Data Integration Patterns (diagram positioning ETL, ELT, Replication, EAI and EII between Scheduled and Real-time, and between Message/Record based and Large data sets) Source: Microsoft EDW Architecture, Guidance and Deployment Best Practices
    • 15. ELT Problem: • Efficiently processing very large volumes of data within ever-shortening processing windows Solution: • Perform transformation steps on the target platform • Set-based processing (Diagram: Extract from sources, Load into the staging layer, Transform into the data warehouse and semantic layer)
    • 16. ELT: Highlights • Some data integration platforms have clearly separated ETL and ELT components • Weigh custom scripts native to the target platform against the built-in DI component
    • 17. ETL vs. ELT
        – ETL flow: data pipelines are used; transformations are applied to the data one record at a time; intermediate results are stored in memory
        – ELT flow: data is loaded into the destination server; set-based processing; transformations and lookups are done within the SQL
        – ETL advantages: complex transformations; keeping intermediate results in memory is faster than persisting them to disk
        – ELT advantages: the power of the relational database system can be used for very large data sets; updates are more efficient with set-based processing
        – ETL disadvantages: large data sets could overwhelm the memory
        – ELT disadvantages: load on the RDBMS; more disk activity
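The contrast between record-at-a-time ETL and set-based ELT can be sketched with Python's built-in `sqlite3` module standing in for the target platform. The table names (`staging_orders`, `dw_orders`), the columns and the sample rows are hypothetical; a real warehouse would use its own SQL dialect and bulk loaders.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a staging table and a target table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging_orders (order_id INTEGER, amount_cents INTEGER, country TEXT);
        CREATE TABLE dw_orders (order_id INTEGER, amount REAL, country TEXT);
        INSERT INTO staging_orders VALUES (1, 1250, 'us'), (2, 990, 'ua'), (3, 4000, 'us');
        """
    )
    return conn


def etl_row_by_row(conn):
    """ETL style: pull rows out, transform each in application memory, then load."""
    rows = conn.execute(
        "SELECT order_id, amount_cents, country FROM staging_orders"
    ).fetchall()
    transformed = [(oid, cents / 100.0, country.upper()) for oid, cents, country in rows]
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", transformed)


def elt_set_based(conn):
    """ELT style: a single set-based SQL statement runs inside the target database."""
    conn.execute(
        """
        INSERT INTO dw_orders (order_id, amount, country)
        SELECT order_id, amount_cents / 100.0, UPPER(country)
        FROM staging_orders
        """
    )


def load(strategy):
    """Run one strategy against a fresh database and return the loaded rows."""
    conn = make_db()
    strategy(conn)
    return conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()


if __name__ == "__main__":
    print(load(etl_row_by_row))
    print(load(elt_set_based))
```

Both strategies produce the same rows; the difference is where the transformation work happens, which is the trade-off the slide lists.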
    • 18. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 19. Kimball’s Multidimensional EDW Problem: • Integrate and consolidate data from heterogeneous sources • Keep data history Solution: • Use a multidimensional model to store data • Iterate by business lines • Integrate by conformed dimensions (Diagram labels: Data Sources, Data Warehouse)
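Integration by conformed dimensions can be illustrated with a small sketch: two fact tables from different business lines share one product dimension, so their measures can be compared in a single drill-across query. The schema, table names and figures are hypothetical, and `sqlite3` is only a stand-in for a real warehouse.

```python
import sqlite3

# Two business lines (sales and returns) share the conformed dim_product dimension.
SCHEMA = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, qty INTEGER);
CREATE TABLE fact_returns (product_key INTEGER REFERENCES dim_product, qty INTEGER);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO fact_returns VALUES (1, 2), (2, 1), (2, 1);
"""

# Aggregate each fact table separately, then join the results on the shared key.
DRILL_ACROSS = """
SELECT p.name, s.sold, r.returned
FROM dim_product AS p
JOIN (SELECT product_key, SUM(qty) AS sold
      FROM fact_sales GROUP BY product_key) AS s
  ON s.product_key = p.product_key
JOIN (SELECT product_key, SUM(qty) AS returned
      FROM fact_returns GROUP BY product_key) AS r
  ON r.product_key = p.product_key
ORDER BY p.name
"""


def sales_vs_returns():
    """Return (product, units sold, units returned) rows via the shared dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn.execute(DRILL_ACROSS).fetchall()


if __name__ == "__main__":
    print(sales_vs_returns())
```

Because both fact tables use the same `product_key`, a new business line can be added in a later iteration without reworking the existing ones.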
    • 20. Kimball vs. Inmon (Diagram labels: Sources, Data Integration and Data Warehousing, 3NF, Inmon Approach, Kimball Approach, Visualization)
    • 21. Kimball vs. Inmon
        – Overall approach: Inmon top-down; Kimball bottom-up
        – Data orientation: Inmon subject- or data-driven; Kimball process-oriented
        – Data modeling: Inmon traditional; Kimball multidimensional
        – Primary audience: Inmon IT professionals; Kimball end users
    • 22. DWH: Implementation • Traditional RDBMS • Analytical column-based RDBMS
    • 23. DWH: Highlights Implications of column-based storage: – Additional columns vs. junk dimensions – Update scenarios should be avoided where possible – The partitioning scheme should be designed carefully to support maintenance activities
    • 24. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 25. Big Data (Diagram: Big Data axis)
    • 26. Big Data: Hybrid Approach Problem: • Under big data circumstances, provide: – Flexible online analytics – Access to the most detailed raw data Solution: • Analytical RDBMS for online analytics • NoSQL DB as the source for the RDBMS and as the store for the most detailed raw data (Diagram labels: Source, NoSQL, RDBMS/DW, Operational and Historical Analytics)
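A minimal sketch of the hybrid approach, under stated assumptions: a list of JSON strings stands in for the NoSQL store holding the most detailed raw data, and `sqlite3` stands in for the analytical RDBMS that serves online queries. The event fields and the `page_views_daily` table are hypothetical.

```python
import json
import sqlite3
from collections import Counter

# Stand-in for raw documents kept in a NoSQL store (hypothetical clickstream events).
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ts": "2013-01-30T10:00:00"}',
    '{"user": "b", "page": "/home", "ts": "2013-01-30T10:05:00"}',
    '{"user": "a", "page": "/pricing", "ts": "2013-01-30T10:07:00"}',
    '{"user": "c", "page": "/home", "ts": "2013-01-31T09:00:00"}',
]


def aggregate(raw_lines):
    """Reduce raw events to page views per (day, page)."""
    counts = Counter()
    for line in raw_lines:
        event = json.loads(line)
        day = event["ts"][:10]  # keep only the YYYY-MM-DD part
        counts[(day, event["page"])] += 1
    return counts


def load_summary(counts):
    """Load the aggregate into a relational table used for online analytics."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views_daily (day TEXT, page TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO page_views_daily VALUES (?, ?, ?)",
        [(day, page, views) for (day, page), views in counts.items()],
    )
    return conn


if __name__ == "__main__":
    conn = load_summary(aggregate(RAW_EVENTS))
    query = "SELECT day, page, views FROM page_views_daily ORDER BY day, page"
    for row in conn.execute(query):
        print(row)
```

The raw events stay available for detailed drill-down, while analysts query the much smaller relational summary.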
    • 27. Big Data: Implementation Sample of Hybrid Approach in HP Operational Analytics Architecture
    • 28. Big Data isn’t only Hadoop (storage comparison; columns in order: Tape Library, HDFS, Disk Array, Amazon S3, Amazon Glacier; rows with fewer than five values are listed in transcript order)
        – Storage type labels: Retention Storage, Operation Storage
        – Throughput (600 GB load time): 140-500 MB/s (0.3-1.2 h); 10-30 MB/s (5.5-16 h); 50-700 MB/s (0.25-4 h); 2-40 MB/s (83 h)
        – Max capacity: 30-900 PB; 21+ PB; 16 PB; ~Unlimited
        – Max file size: ~Unlimited; ~Unlimited; 4-16 TB (OS limited); 5 TB; 40 TB
        – Accessibility: SAN; Java API, HTTP, NFS (MapR); NFS, CIFS, SAN; REST, SOAP
        – Scalability: adding cartridges; adding nodes; adding disks; pay-as-you-go
        – Reliability: redundancy; redundancy (MapR); redundancy; 99.99%
        – Encryption: Yes; Yes*; Yes*; Yes
        – HIPAA compliance: by datacenter; by datacenter; by datacenter; by Amazon; ?
        – Random access and parallel processing (values in transcript order, row boundaries unclear): No, Yes, Yes, Yes, No, No, Yes, Yes**, No
        – 100 TB cost: $40-60K; $100-200K; $80-400K; $132-216K/year; $12-96K/year
        – 1 PB cost: $90-140K; $1-2M; $0.5-4M; $1.1-1.6M/year; $120-360K/year
        – 15 PB cost: $0.7-1.2M; $15-30M; ~$18M; $9.9-15M/year; $1.8-3.5M/year
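The load times quoted for 600 GB in the storage comparison follow from dividing the data size by the sustained throughput. A small sketch of that arithmetic, assuming 1 GB = 1000 MB and a constant transfer rate (the function name is illustrative):

```python
def load_time_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate, taking 1 GB = 1000 MB."""
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Throughput figures taken from the slide; results are in hours.
    for rate in (140, 500, 30, 2):
        print(rate, "MB/s:", round(load_time_hours(600, rate), 2), "h")
```

At 140 MB/s the result is about 1.2 hours and at 2 MB/s about 83 hours, matching the slide's figures.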
    • 29. Big Data: Highlights • Clickstream analysis is a classic use case • Scheduled reports are well suited to Hadoop-based reporting • The majority of Self Service BI tools need a relational representation of the data
    • 30. Agenda Data Visualization Data Mining Big Data Data Integration Data Warehousing
    • 31. Prediction of Customer Loyalty Problem: • Predict customer loyalty and profitability Solution: • Logistic regression algorithm • Support vector machines (Diagram labels: Historical Data, DM Tool, Algorithm, Prediction)
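A minimal sketch of logistic regression applied to loyalty prediction, written in plain Python with stochastic gradient descent. The two features (purchases per month, complaints), the six training rows and the hyperparameters are invented for illustration; a real project would use a statistical package and far more historical data.

```python
import math

# Hypothetical historical data: (purchases per month, complaints) -> loyal (1) or not (0).
FEATURES = [(5, 0), (4, 1), (6, 0), (1, 3), (0, 2), (1, 4)]
LABELS = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict_proba(model, x):
    """Return the predicted probability that customer x is loyal."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


if __name__ == "__main__":
    model = train(FEATURES, LABELS)
    print(predict_proba(model, (5, 0)))  # frequent buyer, no complaints
    print(predict_proba(model, (0, 3)))  # rare buyer, several complaints
```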
    • 32. Recommendation System Problem: • Recommend the most suitable goods to customers Solution: • K-means clustering algorithm • Collaborative filtering (Diagram labels: Historical Data, DM Tool, Algorithm, Recommendation)
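Collaborative filtering can be sketched as follows: score the items a customer has not rated by the ratings of other customers, weighted by how similar those customers are. This is a simple user-based variant with cosine similarity; the ratings and names are made up, and production systems typically use more robust methods.

```python
import math

# Hypothetical historical ratings: customer -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"tea": 5, "coffee": 1, "mug": 4},
    "bob": {"tea": 4, "coffee": 1, "mug": 5, "kettle": 5},
    "cat": {"tea": 1, "coffee": 5, "grinder": 5},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("ann", RATINGS))
```

Here `ann` resembles `bob` far more than `cat`, so the item only `bob` has rated ranks first for her.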
    • 33. DM Models: Implementation • Custom algorithm implementation • Statistical packages like R • Ready data mining model implementations
    • 34. DM Models: Highlights • The approach should be: Problem -> Data Strategy -> Data analysis, and not vice versa • DM algorithms should be carefully selected • DM algorithms are highly dependent on the business domain they are created for
    • 35. SoftServe BI Maturity Model
        – Wisdom: improving the business; decision making (executives); data mining, forecasting
        – Knowledge: gaining business insight; analytical reports (analysts); dashboards, KPIs, scorecards, slice & dice, data warehouse, OLAP
        – Information: measuring and monitoring; consolidated reports (managers); charts, parametrized reports, dedicated reporting database
        – Data: running the business; personal operational reports (workers, customers); simple reports, OLTP or files
    • 36. SoftServe BI/BigData Expertise Big Data and NoSQL Data Integration Data Warehouse BI Platforms
    • 37. More Info about SoftServe BI Offerings  http://www.softserveinc.com/en-us/services/software-architecture/  http://www.softserveinc.com/en-us/services/bi-analytics/