Hiring of 50+ nationals. Hiring and sponsorship of just as many interns and grants. First EBC in Latin America.
This is an artist's rendering of what the facility will look like. [click] This is what it really looks like right now. I like the pretty picture better. [click]
Source: IDC
The data deluge is enabled by the economies of cloud, but it is driven by the connected era's avalanche of devices. The sources of information are expanding, and many new sources are machine generated. The data also includes big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media).
The new data ecosystem driven by the arrival of big data will require three archetypal roles to provide services. Here are some professions that represent illustrative examples of each of the three main categories.
Deep Analytical Talent: technically savvy, with strong analytical skills; a combination of skills to handle raw data, unstructured data, and complex analytical techniques at massive scale; needs access to a magnetic, analytic sandbox. Examples of professions: data scientists, statisticians, economists, mathematicians.
Data Savvy Professionals: examples of professions: financial analysts, market research analysts, life scientists, operations managers, business and functional managers.
Technology & Data Enablers: examples of professions: computer programmers, database administrators, computer systems analysts.
As background, it is important to understand that business intelligence (BI) is different from data science and analytics. BI deals with reporting on history: What happened last quarter? How many units did we sell? Data science is about predicting the future and understanding why things happen: What is the optimal solution? What will happen next? For many companies, data science is a new approach to understanding the business, yet an important one to undertake today. Gartner states that enterprises that embrace big data and data science will outperform their peers by over 20% in the next five years.
Here are five main competency and behavioral characteristics of data scientists.
Quantitative skills, such as mathematics or statistics.
Technical aptitude, such as software engineering, machine learning, and programming skills.
Skeptical: this may be a counterintuitive trait, but it is important that data scientists can examine their work critically rather than in a one-sided way.
Curious and creative: data scientists must be passionate about data and about finding creative ways to solve problems and portray information.
Communicative and collaborative: it is not enough to have strong quantitative or engineering skills. To make a project resonate, you must be able to articulate the business value clearly and work collaboratively with project sponsors and key stakeholders.
The message here is to make your enterprise "extraordinary" through IT transformation. A) EMC helps organizations transform their business through cloud enablement (reducing the 85% spent on maintenance to 60%), cutting the time and cost of "keeping the lights on" while improving business agility (i.e., the basic cloud message). We can expand (or not) on this message and link it to Oil & Gas by discussing our leadership in virtualization (the first step): most major Oil & Gas companies are already implementing VMware, EMC offers leading infrastructure platforms, we are building Oil & Gas solutions with our network of Service Provider partners, et cetera. B) Leading a path to business innovation and competitive advantage by leveraging big data solutions (shifting spend from 15% on innovation to 40%). Later, we can describe our vision of Big Data for Big Oil through the volume/variety/velocity slide, along with our $100M investment at the BRDC. C) My recommendation is to position security below as the "pillar of trust": it is a prerequisite to achieving A) and B), and EMC offers the most secure solutions. The key message is that EMC has an end-to-end offering that allows Oil & Gas companies to transform IT from a cost center to an innovation center.
This slide illustrates the maturity of an organization and the amount of time spent on data collection, analysis, and decision making. Clearly, the less mature an organization is, the more time it spends gathering data and the less time it spends analyzing it and making decisions. The goal is to move to the more mature model, where your organization can spend more time using the data to make business decisions.
We think cloud is the next wave of massive disruption in IT, a lineage that started with mainframes. It's disruptive because of the dramatic benefits it delivers to organizations in both cost efficiency and agility (bottom line and top line). It's disruptive because it's built on disruptive technologies, and disruptive technology leads to lasting change. In the case of cloud, it's arguably the most disruptive wave yet, because we are seeing the IT cloud wave and the consumer cloud wave happen simultaneously.
In the meantime, while everyone was asleep, a revolution was happening. Multi-threaded – this is OLTP data.
To unlock applications from the physical devices
There are multiple characteristics of big data, but three stand out as defining characteristics:
Huge volume of data (for instance, tools that can manage billions of rows and billions of columns)
Complexity of data types and structures, with an increasing volume of unstructured data (80–90% of the data in existence is unstructured) – part of the "digital shadow" or "data exhaust"
Speed, or velocity, of new data creation
In addition, the data, due to its size or level of structure, cannot be efficiently analyzed using only traditional databases or methods.
There are many examples of emerging big data opportunities and solutions: Netflix suggesting your next movie rental, dynamic monitoring of embedded sensors in bridges to detect real-time stresses and longer-term erosion, and retailers analyzing digital video streams to optimize product and display layouts and promotional spaces on a store-by-store basis. These are a few real examples of how big data is involved in our lives today. These kinds of big data problems require new tools and technologies to store, manage, and realize the business benefit. The new architectures they necessitate are supported by new tools, processes, and procedures that enable organizations to create, manipulate, and manage these very large data sets and the storage environments that house them.
Big data can come in multiple forms: everything from highly structured financial data, to text files, to multimedia files and genetic mappings. High volume is a consistent characteristic of big data. As a corollary, because of the complexity of the data itself, the preferred approach for processing big data is parallel computing environments and Massively Parallel Processing (MPP), which enable simultaneous, parallel ingest, data loading, and analysis.
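The MPP idea in the last paragraph can be sketched in miniature: partition the data, process the partitions concurrently, and merge the partial results. This is a toy, single-machine illustration under my own function names (nothing here is from the slide); a real MPP system distributes partitions across many nodes.

```python
# Toy illustration of the partition / process-in-parallel / merge pattern
# behind MPP analytics. A real MPP database spreads partitions across nodes;
# here local threads keep the sketch self-contained.
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def count_terms(chunk):
    """Count term frequencies in one partition of the records."""
    counts = Counter()
    for record in chunk:
        counts.update(record.split())
    return counts

def parallel_term_count(records, n_partitions=4):
    """Partition the records, count each partition concurrently, merge."""
    partitions = [records[i::n_partitions] for i in range(n_partitions)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for partial in pool.map(count_terms, partitions):
            total.update(partial)
    return total
```

The key property is that each partition is processed independently, so adding workers (or nodes) scales the ingest-and-analyze step rather than serializing it.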
Let us examine the most prominent characteristic: its structure. As we will see in the next slide, most big data is unstructured or semi-structured in nature, which requires different techniques and tools to process and analyze.
Data from the new 2011 Digital Universe study from IDC, sponsored by EMC. Data is growing 44x, but IT staff is only growing 1.5x by the end of the decade. The only way to stay ahead of the data deluge is to increase the volume of information that can be managed per person, using new technologies and productivity tools.
The graphic shows different types of data structures, with 80–90% of the future data growth coming from non-structured data types (semi-, quasi-, and unstructured). Although the image shows four different, separate types of data, in reality these can be mixed together at times. For instance, you may have a classic RDBMS storing call logs for a software support call center. In this case, you may have typical structured data such as date/time stamps, machine types, problem type, and operating system, which were probably entered by the support desk person from a pull-down menu GUI. In addition, you will likely have unstructured or semi-structured data, such as free-form call log information, taken from an email ticket of the problem or an actual phone call description of a technical problem and its solution. The most salient information is often hidden in there. Another possibility would be voice logs or audio transcripts of the actual call that might be associated with the structured data. Until recently, most analysts would NOT be able to analyze the unstructured information in this call log history RDBMS, since the mining of the textual information is very labor intensive and could not be easily automated.
Here are examples of what each of the four main types of data structures may look like. People tend to be most familiar with analyzing structured data, while semi-structured data (shown as XML here), quasi-structured data (shown as a clickstream string), and unstructured data present different challenges and require different techniques to analyze.
For each data type shown, answer these questions:
What types of analytics are performed on these data?
Who analyzes this kind of data?
What types of data repositories are suited for each, or what requirements might you have for storing and cataloguing this kind of data?
Who consumes the data?
Who manages and owns the data?
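To make the distinction concrete, here is a small sketch of how the three less familiar structure levels might be handled in code. The XML snippet, clickstream URL, and ticket text below are invented examples for illustration, not data from the slide.

```python
# Semi-structured, quasi-structured, and unstructured data, side by side.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Semi-structured: XML carries self-describing tags, but no fixed schema.
xml_record = "<call><date>2011-06-01</date><os>linux</os></call>"
call = ET.fromstring(xml_record)
os_name = call.findtext("os")

# Quasi-structured: a clickstream URL has *some* pattern we can exploit.
click = "http://shop.example.com/search?q=backup+software&page=2"
query = parse_qs(urlparse(click).query)  # e.g. query["page"] == ["2"]

# Unstructured: free text needs heavier techniques (text mining, NLP);
# a bare substring check is about as far as naive code gets.
ticket_note = "customer reports nightly backup fails after patch"
mentions_backup = "backup" in ticket_note
```

Note how the effort shifts: the XML parses itself via its tags, the clickstream needs a pattern we supply, and the free text yields almost nothing without real text-analytics techniques.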
(Describe or refer to NoSQL and key-value pairs (KVP) here.)
Everyone and everything is leaving a digital footprint. The graphic above provides a perspective on sources of big data generated by new applications, and on the scale and growth rate of the data. These applications provide opportunities for new analytics and for driving value for organizations.
These data come from multiple sources, including:
Medical information, such as genomic sequencing and MRIs
Increased use of broadband on the Web – including the 2 billion photos each month that Facebook users currently upload, as well as the innumerable videos uploaded to YouTube and other multimedia sites
Video surveillance
Increased global use of mobile devices – the torrent of texting is not likely to cease
Smart devices – sensor-based collection of information from smart electric grids, smart buildings, and much other public and industry infrastructure
Non-traditional IT devices – including RFID readers, GPS navigation systems, and seismic processing
The big data trend is generating an enormous amount of information that requires advanced analytics and new market players to take advantage of it.
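When describing NoSQL and KVP, a minimal sketch of the key-value model can help: each record is looked up by a key, and records need not share a schema. The in-memory store below is my own stand-in for illustration; real KVP stores add persistence, replication, and partitioning by key.

```python
# A minimal in-memory sketch of the key-value-pair (KVP) model behind
# many NoSQL stores: opaque values addressed by unique keys.
store = {}

def put(key, value):
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

# Schema-less: these two "rows" have different fields, which a relational
# table could not hold without schema changes.
put("user:1001", {"name": "Ana", "photos": 214})
put("sensor:grid-7", {"kwh": 3.2, "ts": "2011-06-01T12:00:00Z"})
```

The design trade is simple lookups and easy horizontal partitioning by key, in exchange for giving up the joins and ad hoc queries an RDBMS provides.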
People tend to both love and hate spreadsheets. With their introduction, business users were able to create simple logic on data structured in rows and columns and build their own analyses of business problems. Users do not need heavy training as database administrators to create spreadsheets, meaning business users could set them up quickly, independent of IT groups. Two main spreadsheet benefits are that they are easy to share and that end users have control over the logic involved. However, their proliferation caused organizations to struggle with "many versions of the truth": it was impossible to determine whether you had the right version of a spreadsheet, with the most current data and logic in it. Moreover, if a user lost a laptop or a file became corrupted, that was the end of the data and its logic. Many organizations still suffer from this challenge (Excel is still on millions of PCs worldwide), which gave rise to the need for centralizing the data. As data needs grew, companies such as Oracle, Teradata, and Microsoft (via SQL Server) offered more scalable data warehousing solutions. These technologies enabled the data to be managed centrally, providing the benefits of security, failover, and a single repository where users could rely on getting an "official" source of data for financial reporting or other mission-critical tasks. This structure also enabled the creation of OLAP cubes and business intelligence analytical tools, which gave users the ability to access dimensions within the RDBMS quickly and find answers to streamline reporting needs. Some providers also packaged more advanced logic and the ability to perform more in-depth analytical techniques such as regression and neural networks. <Continued>
Enterprise data warehouses (EDWs) are critical for reporting and Business Intelligence (BI) tasks, although from an analyst's perspective they tend to restrict the flexibility a data analyst has for performing robust analysis or data exploration. In this model, data is managed and controlled by IT groups and DBAs, and analysts must depend on IT for access and for changes to the data schemas. This tighter control and oversight also means longer lead times for analysts to get data, which generally must come from multiple sources. Another implication is that EDW rules restrict analysts from building data sets, which can cause shadow systems to emerge within organizations, containing critical data for constructing analytic data sets and managed locally by power users.
Analytic sandboxes enable high-performance computing using in-database processing. This approach creates relationships to multiple data sources within an organization and saves the analyst the time of creating these data feeds on an individual basis. In-database processing for deep analytics enables faster turnaround time for developing and executing new analytic models, while reducing (though not eliminating) the cost associated with data stored in local "shadow" file systems. In addition, rather than housing only the typical structured data of the EDW, analytic sandboxes can house a greater variety of data, such as web-scale data, raw data, and unstructured data.
The graphic shows a typical data warehouse and some of the challenges that it presents.
(1) For source data to be loaded into the EDW, the data needs to be well understood, structured, and normalized with the appropriate data type definitions. While this kind of centralization enables organizations to enjoy the benefits of security, backup, and failover of highly critical data, it also means that data must go through significant pre-processing and checkpoints before it can enter this sort of controlled environment, which does not lend itself to data exploration and iterative analytics.
(2) As a result of this level of control on the EDW, shadow systems emerge in the form of departmental warehouses and local data marts that business users create to accommodate their need for flexible analysis. These local data marts do not have the same constraints for security and structure as the EDW does, and they allow users across the enterprise to do some level of analysis. However, these one-off systems reside in isolation, often are not networked or connected to other data stores, and are generally not backed up.
(3) Once in the data warehouse, data is fed to enterprise applications for business intelligence and reporting purposes. These are high-priority operational processes getting critical data feeds from the EDW. <Continued>
(4) At the end of this workflow, analysts get data provisioned for their downstream analytics. Since users cannot run custom or intensive analytics on production databases, analysts create data extracts from the EDW to analyze offline in R or other local analytical tools. Many times these tools are limited to in-memory analytics on desktops, analyzing samples of the data rather than the entire population of a data set. Because these analyses are based on data extracts, they live in a separate location, and the results of the analysis – and any insights on the quality of the data or anomalies – are rarely fed back into the main EDW repository.
Lastly, because data accumulates in the EDW only after a rigorous validation and data structuring process, data is slow to move into the EDW and the schema is slow to change. An EDW may have been originally designed for a specific purpose and set of business needs, but over time it evolves to house more and more data and to enable business intelligence and the creation of OLAP cubes for analysis and reporting. EDWs provide limited means to accomplish these goals: they achieve the objective of reporting, and sometimes the creation of dashboards, but generally limit the ability of analysts to iterate on the data in a separate environment from the production environment, where they could conduct in-depth analytics or perform analysis on unstructured data.
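The "in-memory analytics on samples" limitation mentioned in (4) can be made concrete with one standard technique: when a full EDW extract will not fit on a desktop, keep a fixed-size uniform random sample of the rows (reservoir sampling) and analyze that instead. This is a generic sketch of the pattern, not a tool the slide names.

```python
# Reservoir sampling: keep a uniform random sample of k rows from a
# stream whose total length need not be known in advance, using O(k) memory.
import random

def reservoir_sample(rows, k, seed=0):
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # replace with decreasing probability
            if j < k:
                sample[j] = row
    return sample
```

The catch is exactly the one the notes raise: any conclusions now describe the sample, not the full population, and nothing flows back into the EDW.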