Property of Relational Solutions, Inc. By Janet Dorenkott June, 2013,
JANET DORENKOTT, BIO 
• Over 20 years of experience in information technology. 
• Founded Relational Solutions in 1996 and co-owns with Rob York. 
• Focused on data warehousing, data integration & business intelligence solutions 
• Specialize in the complex issues associated with integrating point of sale and syndicated data 
for the CPG industry & developed applications including POSmart and BlueSky, designed for 
handling data complexities unique to CPG companies. 
• Member of Retailwire’s Braintrust 
• Founder of the Demand Signal Repository Institute on LinkedIn. 
• Participated in the implementation of over 200 data warehouse and BI projects for companies 
that include Chrysler, Chase, Timken, Xerox, Glaxo, Smuckers, P&G and many others. 
GOALS FOR TODAY 
• TO DEFINE BIG DATA 
• EXPLAIN HOW BIG DATA CAN IMPROVE BUSINESS 
• EXPLAIN HOW TO USE IT 
• SHOW THE IMPORTANCE OF LEVERAGING SOCIAL MEDIA
“Top 10 Companies on the Move”
BlueSky Integration Studio
“Best at integrating POS with Internal data”
Cleveland Weatherhead 100 Fastest Growing Businesses
Oracle Developer of the Year
Data Warehouse & BI Consulting
1996-98 1999-01 2002-04 2005-06 2007-08 2009-10 2011-12 2013
“Data Warehouse of the Year!”
BlueSky
“Coolest New Technologies”
DataStage ETL Best Implementors Award
Informatica’s Partner of the Year
Selects BIS to 
integrate POS & 
TradeEdge 
Selects 
POSmart to 
embed in DSR 
“Best Software” Finalist
BIG DATA… IT’S IN OUR BLOOD! 
BUSINESS INTELLIGENCE 
• Leverages data to provide users with “Fact Based Decision” capability. 
• Derived from an enterprise data warehouse for management decisions 
• Reports are also derived from “stove pipe” solutions, ERP applications and homemade 
integration processes. 
• Operational reports are not the same as Analytical reports. 
TRANSACTIONAL VS. ANALYTICAL REPORTING 
TRANSACTIONAL SYSTEM 
• DATABASE STRUCTURE DESIGNED FOR 
DATA ENTRY, UPDATE, AND PROCESSING. 
• OPERATIONAL REPORTS. 
• REPORTING USERS CAN IMPACT 
PROCESSING - QUICKLY BECOMES A SLOW 
ENVIRONMENT 
• PURCHASED APPLICATIONS CONTAIN 
STANDARD REPORTS 
• INCONSISTENT DUE TO “TWINKLING” 
• NO ACCESS TO SOME INFO 
• REPORTS CAN TAKE DAYS OR BE 
IMPOSSIBLE TO GET 
• NORMALIZED MODEL FOR FAST INPUT 
DATA WAREHOUSE 
• DATA MODEL DESIGNED FOR ANALYTICAL 
REPORTING AND AD-HOC QUERIES, BOTH 
FROM A CREATION AND A PERFORMANCE 
STANDPOINT 
• FREQUENTLY CONTAINS DETAIL DATA AND 
PRE-AGGREGATED SUMMARIES FOR FAST 
REPORTING 
• TOOLS ALLOW END USERS TO INQUIRE, 
DRILL FROM SUMMARY TO DETAIL 
• REPORTING USERS DO NOT IMPACT THE 
TRANSACTIONAL SYSTEM 
• OFTEN COMBINES DATA FROM MULTIPLE 
TRANSACTIONAL SYSTEMS 
• CONSISTENT – BUSINESS RULES 
• TYPICALLY DENORMALIZED 
[Diagram: a transactional system (e.g. SAP, JDE, Oracle Apps, JDA, homegrown) sends periodic data feeds to multiple data marts.]
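A minimal sketch of the data warehouse side of this comparison, using Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical and are not taken from any Relational Solutions product. It shows a denormalized detail table, a pre-aggregated summary built from it, and a drill from summary to detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized detail table, as a data mart might hold it (hypothetical schema).
cur.execute("""
    CREATE TABLE sales_detail (
        sale_date TEXT, region TEXT, product TEXT, units INTEGER, revenue REAL
    )
""")
rows = [
    ("2013-06-01", "East", "Jam", 10, 35.0),
    ("2013-06-01", "West", "Jam", 4, 14.0),
    ("2013-06-02", "East", "Jelly", 6, 18.0),
    ("2013-06-02", "East", "Jam", 5, 17.5),
]
cur.executemany("INSERT INTO sales_detail VALUES (?, ?, ?, ?, ?)", rows)

# Pre-aggregated summary, built once per periodic feed so that
# summary reports do not have to rescan the detail rows.
cur.execute("""
    CREATE TABLE sales_summary AS
    SELECT region, SUM(units) AS units, SUM(revenue) AS revenue
    FROM sales_detail
    GROUP BY region
""")


def summary_by_region():
    """Fast report: read the pre-aggregated summary."""
    return cur.execute(
        "SELECT region, units, revenue FROM sales_summary ORDER BY region"
    ).fetchall()


def drill_down(region):
    """Drill from a summary row to the detail rows behind it."""
    return cur.execute(
        "SELECT sale_date, product, units, revenue FROM sales_detail "
        "WHERE region = ? ORDER BY sale_date, product",
        (region,),
    ).fetchall()


print(summary_by_region())
print(drill_down("East"))
```

Because the reports read only the data mart tables, reporting users in this sketch never touch the transactional system that produced the feed.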
BIG DATA STARTED WITH ERP AND DATA WAREHOUSING 
• DATA MART: FOCUSED COLLECTION OF SIMILAR DATA FOR REPORTING PURPOSES
• DATA WAREHOUSE: INTEGRATION OF MULTIPLE DATA MARTS INTO AN ENTERPRISE SOLUTION
[Diagram: Sales, Finance, Forecasting, International Sales, Vendor Information and Marketing data marts, linked by Common Reference Values.]
THE BIG DATA EXPLOSION! 
Accounting 
Shipments 
Order 
Processing 
Manufacturing 
Transactional/ERP 
Analytical 
Big Data 
Currency Conversion 
Weather Trends 
SMS/MMS 
Photos 
Syndicated Data 
Web & Outside Data Sources 
EDW 
CRM 
Loyalty 
Segmentation 
Panel Data 
Wholesaler, Distributor 
& Broker Data 
Promotion Results 
Web Logs 
EDI 
Retailer POS Web Logs 
3rd Party Data 
Click Stream 
Audio 
Textual Content 
Video 
Reputation 
Management 
Social Media 
Chatter 
Blogs 
Location Info 
3-D Content 
Schematics 
Geo-Spatial 
Speech to 
Text 
Demographics 
Emerging Market 
WHAT’S THE DIFFERENCE? 
Un-Structured 
• Social Media 
• Chatter, Text 
Analytics, Blogs, 
Tweets, Comments, 
Likes, Followers, 
Social Authority, 
Clicks, Tags, etc. 
• Digital, Video 
• Audio 
• Geo-Spatial 
Multi-Structured 
/Hybrid 
• Emerging Market Data 
• Loyalty 
• E-Commerce 
• Other Third Party Data 
• Weather 
• Currency Conversion 
• Demographic 
• Panel 
• POS, POL, IR, EDI, RFID, NFC, QR, 
IRI, Rsi, Nielsen, Other 
Syndicated, IMS, MSA, etc. 
Structured 
ERP & DW 
• Mainframe 
• SQL Server 
• Oracle 
• DB2 
• Sybase 
• Access, Excel, txt, etc. 
• Teradata 
• Netezza, other MPP 
• SAP, JDE, JDA, other ERP 
VOLUME! 
IT’S NOT JUST SIZE, 
VARIETY! 
EDI, RFID, SAP, DB2, Oracle, TXT, SQL, AS2, CRM, TPO, JDE, QR, ACCESS, Mobile, EXCEL, NPD, IMS, TPM, E-Commerce 
IT’S NOT JUST VOLUME & VARIETY! 
VELOCITY MATTERS! 
• Daily 
• Weekly 
• Monthly 
• Quarterly 
• Annually 
• Every Hour 
• Every Minute 
• Every Second 
• Every Nano-Second! 
• Constantly Changing 
• Constantly Growing! 
IT’S NOT JUST VOLUME & VARIETY & VELOCITY. 
COMPLEXITY! 
• Aligning Hierarchies 
• Integrating Internal Master Data with Retailer Master Data 
• Applying Various Calendars 
• Regional Territories 
• Geographic alignment 
• Currency Conversion 
• Emerging Market 
• Loyalty 
• Market Basket 
• Cleansing Issues 
• Re-cast Data 
• Slowly Changing Dimensions (how you want to handle 
history, new stores, etc). 
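One common way to handle the history question raised by slowly changing dimensions is to keep a dated row per version of each record, often called a "Type 2" approach. The sketch below assumes that approach, which the slide does not prescribe. The `StoreRow` class, its fields and the sample store are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StoreRow:
    store_id: str
    region: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[StoreRow], store_id: str,
                 new_region: str, effective: date) -> None:
    """Close the current row (if any) and append a new version."""
    current = next(
        (r for r in history if r.store_id == store_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.region == new_region:
            return  # nothing changed, keep the existing row
        current.valid_to = effective
    history.append(StoreRow(store_id, new_region, effective))


def region_as_of(history: List[StoreRow], store_id: str,
                 when: date) -> Optional[str]:
    """Return the region a store belonged to on a given date."""
    for r in history:
        if (r.store_id == store_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.region
    return None


history: List[StoreRow] = []
apply_change(history, "S1", "Midwest", date(2012, 1, 1))    # new store
apply_change(history, "S1", "Northeast", date(2013, 3, 1))  # territory realigned

print(region_as_of(history, "S1", date(2012, 6, 1)))
print(region_as_of(history, "S1", date(2013, 6, 1)))
```

With this layout, older sales keep rolling up to the territory that was in force when they happened. Overwriting the region instead would recast all history to the new alignment, which is the other choice the slide alludes to.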
WHAT IS HADOOP? 
• HADOOP IS AN OPEN SOURCE SOFTWARE FRAMEWORK WITH 2 KEY COMPONENTS: 
1. DISTRIBUTED FILE SYSTEM (HDFS) – FOR HIGH BANDWIDTH, CLUSTER BASED STORAGE 
2. DATA PROCESSING FRAMEWORK – USES “MAPREDUCE” TO DISTRIBUTE/MAP LARGE DATA SETS ACROSS 
MULTIPLE SERVERS. EACH SERVER CREATES A SUMMARY OF THE DATA THAT HAS BEEN ALLOCATED TO IT. FROM 
THERE, DATA IS “REDUCED” OR “AGGREGATED.” SIMPLY PUT, IT IS MAPPED, THEN REDUCED. 
“HADOOP LETS YOU DEAL WITH VOLUME, VELOCITY AND VARIETY OF DATA. IT TRANSFORMS COMMODITY 
HARDWARE AND PROVIDES AUTOMATIC FAILOVER.” 
OWEN O’MALLEY, ARCHITECT FOR MAPREDUCE & SECURITY. 
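The "each server summarizes its share, then the summaries are aggregated" idea can be sketched in plain Python. This is only an illustration under stated assumptions: threads stand in for servers, the data chunks are hypothetical, and none of this is the Hadoop API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def summarize(chunk):
    """One 'server' totals units per product for the chunk allocated to it."""
    totals = Counter()
    for product, units in chunk:
        totals[product] += units
    return totals


def aggregate(partials):
    """Combine the per-server summaries into a single result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return combined


# Hypothetical data, already distributed into three chunks.
chunks = [
    [("Jam", 3), ("Jelly", 2)],
    [("Jam", 5)],
    [("Jelly", 1), ("Honey", 4)],
]

# Threads stand in for separate servers working in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(summarize, chunks))

totals = aggregate(partials)
print(dict(totals))
```

Each worker only ever sees its own chunk, which is what lets the same pattern spread across many commodity machines.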
WHAT IS MAPREDUCE? 
• A PARALLEL PROGRAMMING FRAMEWORK 
• MADE POPULAR BY GOOGLE 
• GENERATE SEARCH INDEXES 
• WEB SCORING ALGORITHMS 
• C++, JAVA, PYTHON, ETC. 
• HARNESS 1000S OF CPUS 
• MAPREDUCE PROVIDES 
• AUTOMATIC PARALLELIZATION 
• FAULT TOLERANCE 
• MONITORING & STATUS UPDATES 
“MAPREDUCE ALLOWS PROGRAMMERS 
WITHOUT ANY EXPERIENCE WITH PARALLEL 
AND DISTRIBUTED SYSTEMS TO EASILY 
UTILIZE THE RESOURCES OF A LARGE 
DISTRIBUTED SYSTEM.” 
- JEFFREY DEAN AND SANJAY GHEMAWAT, 
GOOGLE, INC., 2004 
[Diagram labels: Map Function, Scheduler, Results; stages: map, shuffle, reduce]
MAPREDUCE IS SIMPLE: WORD COUNT 
UNSTRUCTURED DATA INPUT: 
Boat Yacht Lake 
House House Lake 
Boat House Yacht 
Fish Fish Fish 
SPLITTING: 
Split 1: Boat Yacht Lake 
Split 2: House House Lake 
Split 3: Boat House Yacht 
Split 4: Fish Fish Fish 
MAPPING: 
Split 1: Boat, 1 / Yacht, 1 / Lake, 1 
Split 2: House, 1 / House, 1 / Lake, 1 
Split 3: Boat, 1 / House, 1 / Yacht, 1 
Split 4: Fish, 1 / Fish, 1 / Fish, 1 
SHUFFLING: 
Boat, 1 / Boat, 1 
Yacht, 1 / Yacht, 1 
Lake, 1 / Lake, 1 
House, 1 / House, 1 / House, 1 
Fish, 1 / Fish, 1 / Fish, 1 
REDUCING: 
Boat, 2 
Yacht, 2 
Lake, 2 
House, 3 
Fish, 3 
RESULT: 
Boat, 2 
Yacht, 2 
Lake, 2 
House, 3 
Fish, 3 
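The word count walk-through can be reproduced in a few lines of Python. This is a single-process sketch of the map, shuffle and reduce stages, not Hadoop code, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(split):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffling: group the emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reducing: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


# Splitting: the input from the slide, one split per line.
splits = [
    "Boat Yacht Lake",
    "House House Lake",
    "Boat House Yacht",
    "Fish Fish Fish",
]

mapped = [pair for split in splits for pair in map_phase(split)]
result = reduce_phase(shuffle_phase(mapped))
print(result)
# {'Boat': 2, 'Yacht': 2, 'Lake': 2, 'House': 3, 'Fish': 3}
```

In a real cluster each split would be mapped on a different server and the shuffle would move data across the network. Here everything runs in one process, so only the logic is shown.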
COMMON TERMINOLOGY 
• PIG – HIGH LEVEL DATA FLOW LANGUAGE (PIG LATIN) WHOSE SCRIPTS ARE CONVERTED TO MAPREDUCE JOBS 
• HIVE – CONVERTS SQL-LIKE QUERIES (HIVEQL) TO MAPREDUCE JOBS 
• HBASE – SCALABLE, DISTRIBUTED DATABASE. PROVIDES A SIMPLE INTERFACE TO 
DATA (E.G. FACEBOOK MESSAGES UTILIZES THIS) 
• ZOOKEEPER – PROVIDES COORDINATION FOR SERVERS 
• HCATALOG – TABLE METADATA PULLED OUT OF HIVE SO OTHER TOOLS CAN SHARE IT 
• MAHOUT – MACHINE LEARNING LIBRARY 
• SQOOP – TOOL THAT RUNS MAPREDUCE JOBS TO PULL DATA FROM, OR PUSH DATA TO, 
SQL DATABASES SUCH AS ORACLE 
• CASCADING – JAVA API WHOSE DATA WORKFLOWS TRANSLATE DOWN INTO MAPREDUCE 
• OOZIE – WORKFLOW COORDINATION TO SCHEDULE AND RUN MAPREDUCE JOBS 
• FUSE-DFS – MOUNTS HDFS SO IT CAN BE ACCESSED LIKE A LINUX FILE SYSTEM 
HOW CAN BIG DATA BE USED? 
• BIG DATA CAN BE USED TO MICRO-SEGMENT 
CUSTOMERS, ANALYZE SENTIMENT, PREDICT 
BEHAVIOR, PERSONALIZE OFFERS, CROSS-SELL 
AND UPSELL ACROSS CHANNELS, MANAGE 
REPUTATION, AND INCREASE SALES AND PROFITS. 
• COMPANIES NEED TO “WALK BEFORE THEY RUN.” 
• THE “BUILD IT & THEY WILL COME” PHILOSOPHY 
RARELY WORKS. IDENTIFY A BUSINESS NEED. 
SOCIAL MEDIA REQUIRES YOU TO 
LISTEN 
ENGAGE 
INFORM 
OFFER 
LEVERAGING THE DATA MEANS YOU NEED TO 
ACCESS 
ANALYZE 
ACT 
IS SOCIAL MEDIA REALLY WORTH 
LEVERAGING? 
ACCORDING TO THE PEW RESEARCH CENTER: 
• 100 MILLION ACTIVE TWITTER USERS 
• 50 MILLION LOG ON TO TWITTER EVERY DAY 
• 55% ARE MOBILE USERS 
------------------------------------------- 
• AVERAGE TWEETS SENT PER DAY: 
• IN JANUARY 2010 – 50 MILLION TWEETS PER DAY 
• IN FEBRUARY 2011 – 140 MILLION TWEETS PER DAY 
• IN SEPTEMBER 2011 – 230 MILLION TWEETS PER DAY 
• There were 2.5 million tweets regarding Steve Jobs’ 
death in the first 13 hours after it was reported, which is 
about 53 tweets per second. 
• 6,939 tweets per second in Japan on New Year’s Eve at 
midnight 
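The rates above are easier to compare when they are all expressed per second. The short sketch below redoes the arithmetic using only the figures quoted on this slide, treating the three dated figures as daily averages as the "average tweets sent per day" heading indicates. The helper name is illustrative.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def per_second(count, seconds):
    """Average events per second over a time window."""
    return count / seconds


# 2.5 million tweets in the first 13 hours after Steve Jobs' death.
jobs_rate = per_second(2_500_000, 13 * SECONDS_PER_HOUR)

# Average tweets sent per day, converted to tweets per second.
daily_averages = {
    "January 2010": 50_000_000,
    "February 2011": 140_000_000,
    "September 2011": 230_000_000,
}
daily_rates = {
    month: round(per_second(count, SECONDS_PER_DAY))
    for month, count in daily_averages.items()
}

print(round(jobs_rate))  # 53
print(daily_rates)
# {'January 2010': 579, 'February 2011': 1620, 'September 2011': 2662}
```

Seen this way, the peak of 6,939 tweets per second in Japan is between two and three times the September 2011 average rate.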
According to McKinsey Global Institute: 
• Facebook – 700,000,000,000 minutes spent/month 
• Google – 34,000 search/sec 
• Email – 838,000,000 messages in 2013 
• Twitter – 500,000,000 tweets/day 
IT’S ONLY JUST BEGUN! 
• LINKEDIN 
• FACEBOOK 
• YOUTUBE 
• SLIDESHARE 
• BRIGHTTALK.COM 
• SCRIBD 
• NAYMZ 
• JIGSAW 
• SPOKE 
• G+ 
• TWITTER 
• VINE 
• INSTAGRAM 
• BING 
UNDERSTAND YOUR INTERACTIONS 
KNOW YOUR SOCIAL REPUTATION 
KNOW WHERE YOUR SENTIMENT IS COMING FROM 
SEE WHERE YOUR CHAMPIONS ARE 
UNDERSTAND WHERE YOU NEED DAMAGE CONTROL 
WHAT ARE YOUR FOLLOWERS SAYING 
GOALS FOR TODAY – ACCOMPLISHED! 
• TO DEFINE BIG DATA – VOLUME, VARIETY, VELOCITY & COMPLEXITY 
• EXPLAIN HOW BIG DATA CAN IMPROVE BUSINESS – LISTEN, ENGAGE, INFORM & OFFER 
• EXPLAIN HOW TO USE IT – LEVERAGING A FOUNDATION 
• SHOW THE IMPORTANCE OF LEVERAGING SOCIAL MEDIA – INTEGRATE WITH OTHER DATA
THANK YOU & STAY TUNED! 
• FOLLOW JANET DORENKOTT ON LINKEDIN, EMAIL JANETD@RELATIONALSOLUTIONS.COM 
• CALL US AT 440-899-3296, JANET IS X225 / KAREN IS X232 
• FOLLOW RELATIONAL SOLUTIONS ON LINKEDIN, TWITTER @POSMARTBLUESKY & ON 
FACEBOOK 
• JOIN OUR “DEMAND SIGNAL REPOSITORY INSTITUTE” & “BIG DATA ASSOCIATION” GROUP ON 
LINKEDIN 
• SUBSCRIBE TO THE RELATIONAL SOLUTIONS CHANNEL ON YOUTUBE 
• VISIT US AT WWW.RELATIONALSOLUTIONS.COM OR CALL 440-899-3296 X225 
• LEARN MORE FROM OUR WEBINARS & DOWNLOAD OUR WHITEPAPERS 
• SEE PRODUCT DEMOS & DOWNLOAD TRIALS FROM OUR WEBSITE 