Big Data & the Cloud
 


Usage Rights: CC Attribution-NonCommercial-NoDerivs License


    Presentation Transcript

    • “Big Data” and “The Cloud”: Robert J. Abate, CBIP, CDMP, Independent Consultant. Webinar: March 20th, 2012, 2 PM EST / 11 AM PST
    • “Big Data” And “The Cloud”: Agenda
        • The Industry Is Abuzz…
        • The Challenges Of Big Data
        • Architectural Solutions & The Cloud
        • It’s A Brave New World
        • Case Studies
        • Questions & Answers
      Big Data & The Cloud – March 20th, 2012 © 2012 – Dataversity & Robert J. Abate
    • The Industry Is Abuzz… “Despite the hype, most firms find the technology useful to operate on data they already have” Source: Forrester, June 2011
    • Everyone Is Talking About Big Data…
        • “Big data will represent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means” Gartner: May 2011
        • “Big Data: Huge Management Implications with Enormous Returns” IDC: March 2011
        • “Big data is still in mostly uncharted territory, but a surprising number is actually doing something with it” Forrester: June 2011
        • “61% of respondents feel big data will fundamentally change the way their business works” CIO/Insight: November 2010
        • “Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies, potential application areas, and why ‘big data BI’ contrasts with traditional BI tools. It differs dramatically from traditional BI in terms of both capabilities and in the technologies used to achieve those capability breakthroughs” Gartner: January 2012
    • What Are The Drivers For Big Data/Cloud?
        • We Are In The Information Age: every corporation today is in the “Data Business”
        • We Are Inundated In Data: types, sources, varieties
        • Data Is Growing Exponentially, and so are the challenges
        • Data Complexity Is Increasing, causing insight to be lost
    • Pictorial Representation Of Information
    • Big Data Is More Than Just Volume
      Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data
        • Structured (transactional) data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions
        • Unstructured data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio
        • Device-generated data: RFID sensors, smart meters, smart grids, GPS, spatial, micro-payments
      [Figure: Volume, Velocity, Variety and Complexity axes with example sources: industry-specific, Web traffic, social, text, documents, images, video, audio, sensor/location-based, smart grid]
    • Big Data’s Potential Is Limitless
        • Today: less than 10% of available information. Tomorrow: vast majority of enterprise sources and external data.
        • Today: “rear-view mirror” reporting, dashboards and analysis. Tomorrow: forward-looking or “windshield-view” predictions with recommendations.
        • Today: days, weeks, months, or even quarters old. Tomorrow: real-time or near real-time.
        • Today: incomplete, inaccurate, and disjointed data. Tomorrow: correlated, high-confidence, governed data.
        • Today: architectures and methods that take 6 to 18 months to exploit. Tomorrow: vastly accelerated time to market.
    • Time Really Is Money!
      “The Time Value Curve” © 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission.
      [Figure: value plotted against time over the data lifecycle. Business Event, then capture latency until Data Ready For Analysis, then analysis latency until Information Delivered, then decision latency until Action Taken. Together the three latencies make up the action time, and the value lost grows with it.]
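The time value curve treats the delay between a business event and the action taken on it as the sum of three latencies. A minimal Python sketch, assuming hypothetical timestamps for each milestone (the names and times are illustrative and do not come from the slide), computes each latency and the total action time:

```python
from datetime import datetime, timedelta

# Hypothetical milestones on the time value curve (illustrative values only).
milestones = {
    "business_event": datetime(2012, 3, 20, 9, 0),
    "data_ready": datetime(2012, 3, 20, 13, 0),             # end of capture latency
    "information_delivered": datetime(2012, 3, 21, 9, 0),  # end of analysis latency
    "action_taken": datetime(2012, 3, 22, 9, 0),            # end of decision latency
}

def latencies(m: dict) -> dict:
    """Split the total action time into the three latencies on the curve."""
    return {
        "capture": m["data_ready"] - m["business_event"],
        "analysis": m["information_delivered"] - m["data_ready"],
        "decision": m["action_taken"] - m["information_delivered"],
    }

parts = latencies(milestones)
# Action time is the sum of the three latencies.
action_time = sum(parts.values(), timedelta())
for name, delta in parts.items():
    print(f"{name:>8} latency: {delta}")
print(f"  action time: {action_time}")
```

Shrinking any one of the three latencies moves the action earlier on the curve, which is where the slide locates the recoverable value.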
    • Data Is Coming At Us Faster
      In a recent TDWI survey of 450 CIOs:
        • 17% have a real-time data warehouse
        • 90% plan on having a real-time warehouse
        • 75% will replace their existing solution to get to a real-time one
      Big Data Projects Are Enterprise-Scale. When asked “What Is The Scope Of Your Big Data Initiative?”:
        • Enterprise: 65%
        • Line of business: 8%
        • Departmental: 8%
        • Project-based: 8%
        • Regional: 5%
        • Other: 5%
      Source: Forrester® June 2011 Global Big Data Online Survey
    • Data Is Coming From All Directions…
      Data is now commonly entering the enterprise from external sources:
        • Government (Census, Revenues, …)
        • Nielsen, NPD Group (Sales)
        • Bloomberg, NYSE (Financial Position)
        • Experian, TransUnion, Equifax (Credit Reporting)
        • Google Maps, MapInfo (Geospatial, …)
        • Radian6, Biz360, … (Client Trend Data)
        • Etc.
    • Need For “Trust In Data”
        • Compliance with laws: Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.
        • Lack of confidence in the data: reports utilizing the same data do not report the same totals or computations
        • Data not defined and readily available: multiple sources of data have to be rationalized at each project start-up, wasting valuable time & $ on every project
        • Data timeliness: manual processes to collect, analyze and provide results
        • Data integrity: unknown filters, varying calculations/computations, fields used for data not indicative of field names, data passed along from one person to another to another to another…
    • Summation Of Industry “Buzz”
        • Business mandate to obtain more value out of the data (get answers)
        • The variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially
        • Need to shrink the latency between the business event and data availability for analysis and decision-making
        • Advancing agility of information is key
        • Need for data trust and compliance with regulations
    • The Challenges Of Big Data “If It Was That Easy, Everyone Would Be Doing It” Source: Unknown
    • The Information Issue: Too many organizations are not using information to its full advantage!
        • 1 in 3 business leaders frequently make critical decisions without the information they need
        • 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
        • 3 in 4 business leaders say more predictive information would drive better decisions
      Source: IBM Institute for Business Value, March 2009
    • Business Alignment & Trust
      A recent CIO Insight poll of CIOs found:
        • 56% of respondents say they feel overwhelmed by the amount of data their enterprise manages
        • 33% of respondents want even more sources of data, despite their feelings of being overwhelmed by it
        • 62% of respondents say they’re frequently interrupted by irrelevant incoming data
        • 43% of respondents say they’re dissatisfied with the current tools they use to filter out irrelevant data
        • 46% of respondents say they’ve made inaccurate business decisions as a result of bad or outdated data
        • One in three report that they “can’t find the right people with the right data”
      Source: “The Big Data Conundrum”, http://www.cioinsight.com/c/a/Storage/The-Big-Data-Conundrum-568229/
    • Viewed Another Way… If a football team had these players on the field:
        • Only 4 of the 11 players on the field would know which goal is theirs
        • Only 6 of the 11 would care
        • Only 3 of the 11 would know what position they play and what they are supposed to do
        • 9 players out of 11 would, in some way, be competing against their own team rather than the opponent
    • BI Perception Is Complicated & Slow
        • BI/DW is perceived as not “enabling” the business: it is an inhibitor to corporate progress, and IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement
        • Weak alignment between IT and business strategy, marked by an intractable language barrier
        • The business is not always sure what information or dimensions it wants or needs to answer questions about what to do next
        • BI/DW has not been known as a source of innovation
        • The complexity of systems has caused BI/DW to be reactive rather than proactive: siloed solutions, databases and applications with trapped business rules; multiple sources of information and no single “truth”; no “architectural blueprints” for the enterprise…
    • BI & D/W – The “Old Way”
      Processes: Data Discovery → DQ / Data Governance → Data Integration → BI & Data Mining
      Stages: Data Chaos → Defined Data → Master Data → Integrated Information → Business Intelligence → D/W, KPIs & Dashboards
      Tools: Profiling; Metadata / MDM; Data Modeling & ETL; BI / DW / OLAP
        • Data Chaos: same type data is different in diverse systems, e.g., AT&T is the same as AT&T Inc
        • Defined Data: defined common meanings, e.g., determine the sources, types, and properties of grouped (i.e., customer) records
        • Master Data: publish and subscribe to master data, e.g., a single view of the customer across all information systems
        • Integrated Information: bring metadata together with modeled information for reporting (BI) and warehousing (drilling and hierarchies)
        • Business Intelligence: analyzing the data by looking into history; viewing graphs of historical information
        • D/W, KPIs & Dashboards: drilling into information to find and analyze trends; KPIs and metrics that offer a glimpse into historical performance; exception reporting and alerts
    • The “Intelligence” Maturity Model
    • Advancing The Maturity Of BI
    • The Big Data Method
      Processes: Data Discovery → DQ / Data Governance → Analytics Utilizing Data Scientists
      Stages: Data Chaos → Data Analysis → Data Matching → Integrated Information → Data Analytics → Performance Optimization
      Tools: Profiling & Matching / DQ; Query Federation; “R”
        • Data Chaos: same type data is different in diverse systems, e.g., AT&T is the same as AT&T Inc
        • Data Matching: profiling of information to determine quality; automated analysis to match information
        • Data Analytics: using data scientists, evaluate data utilizing mathematical algorithms and visualization toolsets
        • Defined Data: defined common meanings, e.g., determine the sources, types, and properties of grouped (i.e., customer) records
        • Integrated Information: bring metadata together from matching into data stores and share it with analysis toolsets; organize information for rapid retrieval
        • Performance Optimization: using analytics, changes to business models are made; analysis of models improves the business and optimizes business performance
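The slide’s “AT&T is the same as AT&T Inc” example is a record-matching problem. A minimal Python sketch, assuming a simple rule-based approach (normalize case and punctuation, drop common legal suffixes, then group), illustrates the idea; the suffix list is hypothetical, and production matching tools typically add fuzzy comparison and survivorship rules on top of this.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-entity suffixes to ignore when matching names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}

def match_key(name: str) -> str:
    """Reduce a company name to a key so that spelling variants compare equal."""
    name = name.lower().replace("&", " and ")
    tokens = re.findall(r"[a-z0-9]+", name)
    # Drop trailing legal suffixes such as "Inc" or "Corp."
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def group_matches(names):
    """Group raw names from diverse systems under a shared match key."""
    groups = defaultdict(list)
    for n in names:
        groups[match_key(n)].append(n)
    return dict(groups)

records = ["AT&T", "AT&T Inc", "at&t, inc.", "Verizon Communications Inc."]
for key, variants in group_matches(records).items():
    print(key, "->", variants)
```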
    • Architectural Solutions & The Cloud “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” Richard Buckminster Fuller
    • Big Data Required A Big Change
        • Consider: 100 GB would store a US Census-style “basic” information set (age, sex, income, ethnicity, language, religion, housing status, location) for every living human being on the planet, packed into a 128-bit record
        • That equates to about 6.75 billion rows of about 10 columns
        • Consider the Large Hadron Collider at CERN: it is expected to produce 150,000 times as much raw data each year
        • What makes data sets large is repeated observations over time / space (temporal or spatial dimensions):
            • A Web log has millions [M] of visits over a handful of pages
            • A retailer has 100K products and M customers, but billions of transactions
            • Hi-res scientific data like fMRI: 1K-GB per view
        • Cardinalities (distinct observations) were usually small with regard to the total number of observations
        • This was starting to change with the advent of device-supplied information, sensors and other semi- and unstructured data sources
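The census figure is easy to check. A quick calculation, assuming 6.75 billion people and one 128-bit (16-byte) record each as the slide describes, lands close to the quoted 100 GB:

```python
# Back-of-envelope check of the "100 GB for everyone on the planet" claim.
people = 6.75e9          # rows: one per living person, the figure the slide uses
bits_per_record = 128    # all ~10 "basic" attributes packed into one record
bytes_total = people * bits_per_record / 8

print(f"{bytes_total / 1e9:.0f} GB")     # decimal gigabytes
print(f"{bytes_total / 2**30:.1f} GiB")  # binary gibibytes
```

This prints 108 GB, or about 100.6 GiB, so the “about 100 GB” on the slide holds under either unit convention.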
    • A Change In Technology Was Needed
        • Consider that relational technologies were invented to get data in and organized, not designed nor organized to get it out
        • RDBMSs were designed for efficient transaction processing on large data sets: adding, updating, searching for and retrieving small amounts of data
      Source: ACM Website, “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
    • Data Warehousing Was A “Fix”
        • DW was classically designed as a “copy of transaction data specifically structured for query and analysis”
        • The general approach was bulk ETL into a DB designed for queries
        • Big data caused this “fix” to break: “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”
        • “To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”
      Source: ACM Website, “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
    • Then Change Came In Technologies…
        • The advent of cloud and falling storage costs: infrastructure utilization increased dramatically, while TCO and the cost of storage and memory dropped significantly, spawning powerful computing paradigms and appliances
        • The advent of commodity-based processing in a grid or MPP configuration: usage of existing hardware in a grid paradigm supporting queries across entire datasets; “Hadoop” & MPP shared-nothing architectures
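The shared-nothing idea behind Hadoop’s MapReduce is that each node processes only its own slice of the data and partial results are combined afterwards. A single-process Python sketch of the map, shuffle and reduce phases (a conceptual illustration with hypothetical data, not Hadoop’s API) counts page hits across web-log partitions:

```python
from collections import defaultdict
from itertools import chain

# Hypothetical web-log partitions, one per "node" in a shared-nothing cluster.
partitions = [
    ["/home", "/products", "/home"],
    ["/home", "/checkout"],
    ["/products", "/home", "/products"],
]

def map_phase(partition):
    """Each node emits (key, 1) pairs for its own records only."""
    return [(page, 1) for page in partition]

def shuffle(mapped):
    """Group intermediate pairs by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key into a final count."""
    return {key: sum(values) for key, values in grouped.items()}

mapped = [map_phase(p) for p in partitions]  # would run in parallel on a cluster
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because no partition needs another partition’s records during the map phase, adding commodity nodes adds capacity, which is the scaling property the slide attributes to grid and MPP configurations.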
    • Technology Solutions Appeared
        • Massively Parallel Processing: Teradata, Greenplum, etc.
        • Grid: Hadoop, MapReduce, Cassandra, etc.
        • Columnar: ParAccel, Vertica, Sybase, Sand Technologies, etc.
        • Hardware Appliances: DATAllegro, Netezza, Oracle Exadata, etc.
      [Figure: a visualization of a network of Facebook connections, from previous related research by Mucha and others. Credit: Amanda L. Traud, Christina Frost, UNC-Chapel Hill. Source: http://www.physorg.com/news192985912.html]
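Columnar databases store each attribute contiguously, so an aggregate over one column only has to scan that column. A toy Python comparison, assuming a small hypothetical in-memory table (real columnar engines add compression and vectorized execution on top), shows the two layouts answering the same query:

```python
# The same hypothetical sales table in row-oriented and column-oriented layouts.
rows = [
    {"store": "NJ-01", "product": "phone", "amount": 199.0},
    {"store": "NY-07", "product": "tablet", "amount": 499.0},
    {"store": "NJ-01", "product": "case", "amount": 29.0},
]

columns = {
    "store": [r["store"] for r in rows],
    "product": [r["product"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Row layout: iterate over whole records and pick one field from each
# (on disk, a row store reads every record in full).
row_total = sum(r["amount"] for r in rows)

# Column layout: scan only the "amount" array; other columns are never touched.
col_total = sum(columns["amount"])

assert row_total == col_total
print(col_total)
```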
    • Virtualization & The Cloud
    • Data Virtualization In The Cloud
    • Advances Provided Answers To Silos: “What Areas Do Your Big Data Initiatives Address?” Source: Forrester® June 2011 Global Big Data Online Survey
    • It’s A Brave New World…
      “Who Owns Or Drives Your Big Data Initiatives?” Source: Forrester, June 2011
        • Business/IT collaboration: 70%
        • Mostly business-driven, with minimal IT involvement: 15%
        • Mostly IT-driven, with minimal business involvement: 12%
        • Don’t know: 2%
        • Other: 2%
    • From The Old Stack To A New Ecosystem
        • Data integration without pre-processing: ability to locate and query federated sources of data and content without costly data modeling and ETL transformation
        • Variety of sources (mergers & acquisitions, growth, services): inability to rapidly add new data sources because of tightly coupled business rules
        • Need for flexible data structures: current structures are rigid and are views of the sources or the business requirements
        • Incorporation of unstructured data, including social media: need tools to integrate and analyze unstructured sources that are not currently used
        • Need to incorporate and utilize metadata: metadata is disjointed, confined and incompatible; a uniform, agile approach is needed
        • Dynamic information with views for a reason: need creation and structuring of views that support dynamic information for a purpose
        • Information management and governance in a regulated world: security and entitlement checking integrated with query processing; information grants handled through XACML obligations
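Querying federated sources without ETL and integrating entitlement checks into query processing can be illustrated together. A minimal Python sketch, assuming two hypothetical sources (a CRM system and a billing system) and a simple attribute-based rule standing in for an XACML policy decision (it does not implement XACML), joins the sources at query time and filters each result row by the caller’s entitlements:

```python
from dataclasses import dataclass

# Two hypothetical source systems, queried in place rather than copied via ETL.
CRM = [
    {"customer_id": 1, "name": "Acme Corp", "region": "EAST"},
    {"customer_id": 2, "name": "Globex", "region": "WEST"},
]
BILLING = [
    {"customer_id": 1, "balance": 1200.0},
    {"customer_id": 2, "balance": 350.0},
]

@dataclass(frozen=True)
class User:
    name: str
    regions: frozenset  # hypothetical entitlement attribute

def entitled(user: User, row: dict) -> bool:
    """Simplified attribute-based rule standing in for an XACML policy decision."""
    return row["region"] in user.regions

def federated_query(user: User):
    """Join both sources at query time and apply entitlements to each result row."""
    balances = {b["customer_id"]: b["balance"] for b in BILLING}
    for c in CRM:
        row = {**c, "balance": balances.get(c["customer_id"])}
        if entitled(user, row):
            yield row

analyst = User("east_analyst", frozenset({"EAST"}))
print(list(federated_query(analyst)))
```

Filtering inside the query path, rather than on a prebuilt extract, keeps access rules next to the data as it is read. XACML obligations (actions an enforcement point must carry out alongside a decision) are not modeled in this sketch.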
    • The New “Data Fabric”
        • Transformation coordinates ingestion of information no matter what the source
        • Micro-batch takes the place of batch
        • Tagging replaces transformation
        • Federated query replaces ETL
        • Query direction removes the need for optimization of data stores
        • Purposeful view is the new master data repository
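Two of the “data fabric” ideas, micro-batches in place of large batch loads and tagging in place of transformation, can be sketched briefly. This Python example assumes a hypothetical record stream and hypothetical tag names; each small batch is kept as-is, with metadata tags attached rather than being reshaped into a target schema:

```python
from datetime import datetime, timezone
from itertools import islice

def micro_batches(stream, size):
    """Yield small, fixed-size batches from an incoming stream of records."""
    it = iter(stream)
    while batch := list(islice(it, size)):
        yield batch

def tag(batch, source):
    """Attach metadata tags; the raw payload is left untransformed."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{"payload": rec, "tags": {"source": source, "ingested_at": ingested_at}}
            for rec in batch]

# Hypothetical stream of raw events from a point-of-sale feed.
events = [f"POS,{i},{i * 10}" for i in range(7)]

for batch in micro_batches(events, size=3):
    tagged = tag(batch, source="pos_feed")
    print(len(tagged), tagged[0]["payload"], tagged[0]["tags"]["source"])
```

Keeping the payload raw defers interpretation to query time, which is what lets the slide’s federated query and purposeful views do the shaping instead of an up-front ETL step.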
    • Newest Trends In Big Data & The Cloud: Compelling Analytics Provide Extreme ROI
        • Data Visualization Technologies: heat, clouds, clusters, flows. Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information"
        • Mixing Structured, Semi and Unstructured Sources
        • Self-service analytics: build your own sandbox!
        • Big Data Cloud Encircled Warehouses: complement existing data warehouses; many times the size of the structured warehouse; provide for rapid analytic iterations
        • Data Virtualization: abstracting the data from the systems
      Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
    • Data Visualization In Practice
      [Figure: “WorldWideWeb Around Wikipedia”: Wikipedia as part of the World Wide Web. Created by Chris 73 | Talk 09:56, 18 Jul 2004 (UTC) using TouchGraph GoogleBrowser V1.01.] Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
    • A Picture Is Worth A Thousand Words. Source: Greenplum, An EMC Corporation
    • Mixing Structured, Semi & Unstructured Sources… Source: Information Builders
    • Big Data Cloud Encircled Warehouses. Source: EMC Corporation
    • Case Studies In the real world, we find out the reasons why Murphy’s Law is so prevalent…
    • Telecom Provider Finds Answers… Before investing tens of millions in infrastructure, a telecom firm learned where to invest its money…
      Challenge:
        • 100 TB traditional EDW, single source of truth
        • Operational reporting & financial consolidation
        • Heavy governance and control
        • Unable to support critical business initiatives
      Customer Loyalty And Churn: the #1 business initiative from the CEO
      Enterprise Big Data Cloud Surrounded Warehouse:
        • Extracted data from the EDW & other sources
        • Generated a social graph from call detail and subscriber data
        • Within 2 weeks, found “connected” subscribers 7X more likely to churn than average users
        • Now deploying 1 PB in production
      Source: Greenplum, an EMC Corporation
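The case study’s key step, building a social graph from call detail records and comparing churn among “connected” subscribers with the average, can be sketched as follows. The call records, the churn outcomes, and the definition of “connected” used here (having called, or been called by, a subscriber who already churned) are all hypothetical; they illustrate the computation, not the study’s actual method or results.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee) pairs.
cdrs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"),
        ("E", "F"), ("G", "H"), ("C", "G")]
prior_churners = {"A"}        # hypothetical: subscribers who already left
later_churners = {"B", "F"}   # hypothetical: who left in the following period

# Build an undirected social graph from the call records.
graph = defaultdict(set)
for caller, callee in cdrs:
    graph[caller].add(callee)
    graph[callee].add(caller)

# Evaluate only subscribers who were still active.
active = set(graph) - prior_churners
# Hypothetical definition: "connected" = shares a call with a prior churner.
connected = {s for s in active if graph[s] & prior_churners}

def churn_rate(group):
    """Fraction of a group that churned in the following period."""
    return len(group & later_churners) / len(group)

lift = churn_rate(connected) / churn_rate(active)
print(f"connected: {churn_rate(connected):.2f}, "
      f"average: {churn_rate(active):.2f}, lift: {lift:.2f}x")
```

The “lift” printed here is the same kind of ratio as the study’s 7X figure, computed on toy data.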
    • Questions & Answers: Open Exchange Of Ideas
      Speaker Contact Information: Robert J. Abate, r.j.abate@att.net, (201) 745-7680
    • Curriculum Vitae Of Presenter: Robert J. Abate, CBIP, CDMP
      As a hands-on, accomplished Information Technology professional, Mr. Abate offers 30 years of experience in Architectures, Applications, Business Intelligence & Analytics, Infrastructure, and IT strategy. He is credited as one of the first to publish on Service-Oriented Architectures (1996) and is a respected IT thought leader within the field. He holds a Bachelor of Science in Electrical Engineering, and is a Certified Business Intelligence Professional and a Certified Data Management Professional in four disciplines. Mr. Abate both chairs and presents at global conferences, is a member of the board of DAMA, and is a respected author and industry thought leader. Mr. Abate can frequently be heard giving talks on topics such as “The Convergence Of SOA & BI,” “Best Practices In Enterprise Information Management,” “Making Big Data Analytics Actionable”, and “Data Services & Virtualization”.