IBM's InfoSphere software helps organizations successfully leverage big data by providing an understanding of their data. It addresses the challenges of big data's four V's (volume, variety, velocity, and veracity) by automating data integration and governance. This helps boost confidence in big data by establishing standard terminology, tracing data lineage, and separating useful "good" data from unnecessary "bad" data. As a result, organizations can more accurately analyze big data and act on the insights with confidence.
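The idea of separating "good" from "bad" records can be made concrete with a small sketch. This is a hypothetical illustration of rule-based data quality screening, not an InfoSphere API; the field names and rules are invented for the example.

```python
# Hypothetical illustration: separating "good" records from "bad" ones
# with simple validation rules, in the spirit of automated data-quality
# checks. Field names ("id", "amount") are invented for this sketch.

def is_valid(record):
    """A record is 'good' if it has a non-empty id and a non-negative numeric amount."""
    return bool(record.get("id")) and \
        isinstance(record.get("amount"), (int, float)) and record["amount"] >= 0

def partition(records):
    """Split records into (good, bad) lists."""
    good = [r for r in records if is_valid(r)]
    bad = [r for r in records if not is_valid(r)]
    return good, bad

records = [
    {"id": "A1", "amount": 120.0},
    {"id": "", "amount": 55.0},    # missing id -> bad
    {"id": "A3", "amount": -10},   # negative amount -> bad
]
good, bad = partition(records)
print(len(good), len(bad))  # 1 good, 2 bad
```

In practice a governance tool would apply many such rules and record lineage for each decision, but the core pattern is the same filter step shown here.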
Client approaches to successfully navigate through the big data storm – IBM Analytics
Hadoop is not a platform for data integration. As a result, some organizations turn to hand coding for integration, or end up deploying solutions that aren’t fully scalable. Review this SlideShare to learn about IBM client best practices for big data integration success.
Apache Spark is a rapidly growing and highly popular open source project that was initially developed at the University of California, Berkeley’s AMPLab, starting in 2009. Spark is a next-generation, cluster-computing, runtime processing environment and development framework for in-memory advanced analytics. It continues to deepen open source platforms in the big data arena by leveraging and extending Apache Hadoop framework investments into machine learning, streaming data, SQL and graph analytics use cases. This slideshow presentation provides a crisp, ten-point summary of what every data analytics professional should know about Spark.
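The map/reduce-style data flow that Spark parallelizes across a cluster can be sketched with ordinary Python as a stand-in: the classic word count below mimics Spark's flatMap and reduceByKey steps on a plain list, with no cluster or Spark installation assumed.

```python
# Plain-Python stand-in for the classic Spark word count. Spark would
# run these steps in parallel across partitions; here the same data
# flow is shown with ordinary functions so it is visible at a glance.
from collections import Counter
from functools import reduce

lines = ["big data big insights", "spark extends hadoop"]

# "flatMap": split each line into words
words = [w for line in lines for w in line.split()]

# "map" + "reduceByKey": count occurrences per word
counts = reduce(lambda acc, w: acc + Counter([w]), words, Counter())

print(counts["big"])  # 2
```

In real Spark code the same pipeline is typically `sc.textFile(...).flatMap(...).map(...).reduceByKey(...)`; the point of the sketch is only the shape of the computation.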
Hungry for more information on Spark? Learn more about Spark today at BigDataUniversity.com
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life – IBM Analytics
BigInsights BigIntegrate and BigQuality offer a cost-effective opportunity to fully leverage the scale and promise of Hadoop. Here are 10 ways BigIntegrate and BigQuality are making it easier for organizations to harness the power of their entire data ecosystems. Learn more at ibm.co/datagovernance
Semantic Data Management: a research area of the Chair for Technologies and Management of Digital Transformation at the University of Wuppertal, Germany.
For more information, see here: https://www.tmdt.uni-wuppertal.de/de
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth? – SnapLogic
Companies collect more data than ever but struggle with how to glean the best insights from it. Effective use of machine learning also requires powerful data integration.
In this presentation, Janet Jaiswal, SnapLogic's VP of product marketing, reviews key strategies and technologies to deliver intelligent data via self-service ML models.
To learn more, visit https://www.snaplogic.com
Driven by data - Why we need a Modern Enterprise Data Analytics Platform – Arne Roßmann
In order to turn data into opportunities, you need to build a modern data analytics platform. But because technologies and requirements change so fast, built-in flexibility is paramount.
This presentation covers:
- how to leverage all your data to generate insights
- the capabilities needed to build a flexible platform
- how to incorporate sustainability requirements
Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterpr... – Data Con LA
Data Con LA 2020
Description
It’s no secret that the roots of Data Science date back to the 1960s and were first mainstreamed in the 1990s with the emergence of Data Mining. This occurred when commercially affordable computers started offering the horsepower and storage necessary to perform advanced statistics to scale.
However, the words “to scale” have evolved over time. The leap to “Big Data” is only one serial aspect of growth. Beyond the typical one-off studies that catalyzed the field of Data Mining, Data Science now fulfills enterprise and multi-enterprise use cases spanning much broader and deeper data sets and integrations. For example, AI and Machine Learning frameworks can interoperate with a variety of other systems to drive alerting, feedback loops, predictive frameworks, prescriptive engines, continual learning, and more. The deployment of AI/ML processes themselves often involves integration with contemporary DevOps tools.
Now segue to SEAL – the Scalable Enterprise Analytic Lifecycle. In this presentation, you’ll learn how to cover the major bases of modern Data Science projects – and Citizen Data Science as well – from conception, learning, and evaluation through integration, implementation, monitoring, and continual improvement. And as the name implies, your deployments will be performant and scale as expected in today’s environments.
Speaker
Jeff Bertman, CTO, Dfuse Technologies
Building the Enterprise Data Lake - Important Considerations Before You Jump In – SnapLogic
In this webinar, learn from industry analyst and big data thought leader Mark Madsen about the future of big data and the importance of the new Enterprise Data Lake reference architecture.
This webinar also covers what’s important when building a modern, multi-use data infrastructure, the difference between a Hadoop application and a Data Lake infrastructure, and an enterprise data lake reference architecture to get you started.
To learn more, visit: www.snaplogic.com/big-data
Even as enterprise IT shops are deploying private clouds to increase agility and reduce costs, they are also increasing the number of workloads being run in public clouds in order to meet the dynamic needs of the business. This means that the role of “cloud broker” is now part of the CIO’s strategic posture as a partner with the business.
Oracle OpenWorld London – session on stream analysis, time series analytics, streaming ETL, streaming pipelines, big data, Kafka, Apache Spark, and complex event processing.
The Importance of DataOps in a Multi-Cloud World – DATAVERSITY
There’s no denying that Cloud has evolved from being an outlying market disruptor to a mainstream method for delivering IT applications and services. In fact, it’s not uncommon to find that Enterprises use the services of more than one cloud at the same time. However, while a multi-cloud strategy offers many benefits, it also increases data management complexity and consequently reduces data availability. This webinar defines the meaning of DataOps and why it’s a crucial component for every multi-cloud approach.
Traditional BI vs. Business Data Lake – A Comparison – Capgemini
Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses on structured data but they are not designed to handle unstructured data.
For these systems Big Data brings big problems because the data that flows in may be either structured or unstructured. That makes them hugely limited when it comes to delivering Big Data benefits.
The way forward is a complete rethink of the way we use BI - in terms of how the data is ingested, stored and analyzed.
More information: http://www.capgemini.com/big-data-analytics/pivotal
At Postgres Vision 2018, Lauren Nelson, Principal Analyst, Forrester, provided a look into the practical considerations that are influencing modern cloud strategies, including existing skill sets and technology limitations, the assortment of current and future cloud workloads, and the economics and realities of today's technology options.
Exploring the Wider World of Big Data - Vasalis Kapsalis – NetAppUK
Every second of every day, electronic systems create ever increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantages. In this session, hear Vas Kapsalis, NetApp Big Data Business Development Manager, discuss his views and experience on the wider world of Big Data.
Big data isn't really about new technology. Sure that's a huge part of it, but the bigger change is in seeing a new way to do data and analytics. The shift in mindset for developers, managers, and business partners is a significant one and requires an intentional process for implementing that change.
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Advertising Bid... – Vasu S
Find out how Qubole helped Spotad, Inc.'s mobile advertising platform save 50 percent in its operating costs almost instantly after their migration.
https://www.qubole.com/resources/case-study/spotad
Across all industries, businesses are adapting and saving time with how they are using and managing data today.
Learn how your business can Integrate NetApp storage platforms with healthcare data solutions: http://www.netapp.com/us/solutions/industry/healthcare/
In the past, many organizations invested heavily in the development and growth of enterprise data warehouses (EDW). Today, the EDW is often at or over capacity. As EDWs hit their limits, many forward-looking organizations are shifting data warehouse processing functions to Hadoop for its scalability and economic benefits.
With a robust EDW offload solution, an organization can accelerate ETL processing, work easily with a wide range of new data sources and formats, and make better use of existing EDW investments.
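The extract-transform-load pattern behind such an offload can be sketched minimally. This is an illustrative sketch only: the source rows, field names, and transformations are invented stand-ins for an EDW table and a Hadoop-side target, not any particular product's API.

```python
# Minimal ETL sketch of the offload pattern described above:
# extract from a source (an in-memory stand-in for an EDW table),
# transform each row, and load into a target (a stand-in for a
# Hadoop-side store). All names here are illustrative.

source_rows = [  # "extract" step: rows as they leave the warehouse
    {"order_id": 1, "amount": "100.50", "region": " east "},
    {"order_id": 2, "amount": "42.00", "region": "WEST"},
]

def transform(row):
    """Normalize types and labels before loading."""
    return {
        "order_id": row["order_id"],
        "amount": float(row["amount"]),            # cast strings to numbers
        "region": row["region"].strip().lower(),   # normalize labels
    }

target = [transform(r) for r in source_rows]  # "load" step
print(target[0]["region"])  # east
```

Real offload tooling adds parallelism, scheduling, and error handling around this core, but each job still reduces to these three steps.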
Thought leaders from Dell EMC, Cloudera and Syncsort discuss how best to begin a big data journey by taking control of all data, controlling costs, and identifying the first use case, so you can move forward with confidence to transform your business.
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S... – Capgemini
Rip and replace isn't a good approach to IT change. When looking at Hadoop, MPP, in-memory and predictive analytics, the challenge is making them co-exist with current solutions.
Learn how Capgemini’s Pivotal CoE utilizes Cloud Foundry and PivotalOne to help businesses adopt new technologies without losing the value of current investments.
Presented by Michael Wood of Pivotal and Steve Jones, Global Director, Strategy, Big Data and Analytics, Capgemini, at EMC World 2014.
As open source databases become the enterprise standard, making all data available and accessible for AI has become an even bigger challenge. In the presentation delivered at Postgres Vision 2018, Rob Thomas, General Manager of IBM Analytics, provided answers for how companies can prepare their Information Architecture for AI, leverage containers and multi-cloud for innovation, and deliver a data and analytics strategy at scale.
In the domain of data science, solving problems and answering questions through data analysis is standard practice. Data scientists experiment continuously by constructing models to predict outcomes or discover underlying patterns, with the goal of gaining new insights. Organizations can then use these insights to strengthen customer relationships, improve service delivery and drive new opportunities. To help guide the processes and activities within a given domain, data scientists and engineers need a foundational methodology that provides a framework for how to proceed with whichever methods or tools they will use to obtain answers and deliver results. In this presentation, we will share data science tips for data engineers.
Join the Data Science Experience: http://ibm.co/data-science
This presentation was delivered at the BI SIG in Palo Alto. It provides an overview of the market shift away from on-premise solutions to on-demand in the business intelligence industry.
You are not Facebook or Google? Why you should still care about Big Data and ... – Kai Wähner
Big data represents a significant paradigm shift in enterprise technology. Big data radically changes the nature of the data management profession as it introduces new concerns about the volume, velocity and variety of corporate data.
This session goes beyond the well-known examples of huge companies such as Facebook or Google with millions of users. Instead, it explains the "big" paradigm and technology shift for your company. See several use cases showing how big data enables small and medium-sized companies to gain insight into new business opportunities (and threats), and how big data stands to transform much of what the modern enterprise is today.
Learn about solving the unique challenges of big data without your own research lab or several big data experts in your company. Learn how to implement the relevant use cases for your company at low cost and effort by using open source frameworks that greatly simplify working with big data.
This whitepaper from IBM shows how your organisation can implement a Big Data Analytics solution effectively and leverage insights that can transform your business.
IBM Solutions Connect 2013 keynote presentation talking about emerging challenges in today's business world and various technologies that can help decision makers address these challenges and drive their business forward. Smart ideas. Smarter actions.
The whitepaper includes the following:
1. Understand mobile computing business patterns and associated security risks.
2. Secure and monitor mobile devices, data, enterprise access, and applications.
3. Realize a mobile enterprise security strategy using IBM Security Solutions.
The following report from IBM explores the latest Security trends—from malware delivery to mobile device risks—based on 2013 year-end data and ongoing research.
Big Data Trends and Challenges Report - Whitepaper – Vasu S
In this whitepaper, read how companies address common big data trends and challenges to gain greater value from their data.
https://www.qubole.com/resources/report/big-data-trends-and-challenges-report
Project 3 – Hollywood and IT – stilliegeorgiana
Project 3 – Hollywood and IT
· Find 10 incidents of Hollywood portraying IT security incorrectly
· You can use movies or TV episodes
· Write 2-5 paragraphs for each incident. Use supporting citations for each part.
· What has Hollywood portrayed wrong? Describe the scene and what is being shown. Make sure to state whether it is partially wrong or totally fictitious.
· How would you protect/secure against what they show (answers might include installing a firewall, loading antivirus software, etc.)
· Use APA formatting for your sources on everything.
· Make sure to put your name on assignment.
Big Data and Social Media
Colgate Palmolive
Agenda of social media use
Business intelligence and social media concepts
Intelligent organization
Data analysis and data trustworthiness
Conclusion
Business intelligence and social media concepts
No-Hassle Documentation
Gain Trusted Followers
Spy on Competition
Learn Customer Demographics
Research and Analyze Events
Advertise More Accurately
Intelligent organization
They consistently use (big) data proactively
They know exactly where they want to go: all-round vision
They continuously discuss business matters: alignment
They talk to each other regarding positive and negative performance
They know their customers through and through
They think and work in an agile way
Data analysis and data trustworthiness
Data completeness and accuracy
Data credibility
Data consistency
Data processing and algorithms
Data Validity
Conclusion
How Colgate benefit from Big Data and Social Media
Social media increases sales and customers
Big data shows popular trends and popular companies
All around they are both beneficial
Big Data can find trends that can benefit you greatly
Criteria
Title Page:
Name, Contact info, title of Presentation
Slide 1
Agenda: Topics you are going to cover, in order
Slide 2
Discuss how big data and social media concepts and knowledge are used to successfully create business intelligence (support your bullet points with data, analysis, charts)
Slide 3
Describe how big data can be used to build an intelligent organization
Slide 4
Discuss the importance of data source trustworthiness and data analysis
Slide 5
Conclusion
Slide 6
Big Data And Business Intelligence
Business Value With Big Data
For a business to survive in a competitive environment, organizational change requires improved governance, sponsorship, processes, and controls, in addition to new skill sets and technology, all working in harmony to deliver the benefits of big data. See Fig. 13.2
Data science has taken the business world by storm. Every field of study and area of business has been affected as companies realize the value of the incredible quantities of data being generated. But to extract value from those data, one needs to be trained in the proper data science skills. The R programming language has become the de facto programming language for data science. Its flexibility, power, sophistication, and expressiveness have ma ...
Building an Effective Data Management Strategy – Harley Capewell
In June 2013, Experian hosted a Data Management Summit in London, with over 100 delegates from the public, private and third sectors. Speakers from Experian and across the data industry explored the challenges of developing and implementing data quality strategies - and how to overcome them. Read on for more information.
Identify and analyze the greatest insights from big data – TheInnovantes
Identify and analyze the greatest insights from big data by leveraging technologies that can effectively handle the growing volume, velocity, and variety of data.
My keynote speech at the ISACA IIA Belgium software watch day in October 2014 in Brussels on the value of big data and data analytics for auditors and other assurance professionals
Big Data is Here for Financial Services White Paper – Experian
Conquering Big Data Challenges
Financial institutions have invested in Big Data for many years, and new advances in technology infrastructure have opened the door for leveraging data in ways that can make an even greater impact on your business.
Learn how Big Data challenges are easier to overcome and how to find opportunities in your existing data and scale for the future.
Data is not consistent: interest in certain topics, such as searches or social media activity, experiences peaks and valleys. Data analysis techniques allow the data scientist to mine this kind of unstable data and still draw meaningful conclusions from it.
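One standard technique for taming such peaks and valleys is a simple moving average. The sketch below smooths a spiky interest series so the underlying level is easier to read; the series values and the window size are arbitrary choices for illustration.

```python
# A simple moving average smooths a spiky signal so trends are easier
# to read. The input series and window size here are arbitrary.

def moving_average(series, window=3):
    """Return the mean of each length-`window` slice of `series`."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

interest = [10, 50, 12, 48, 11, 49]   # spiky raw signal
smooth = moving_average(interest)
print(smooth[0])  # 24.0  (mean of 10, 50, 12)
```

The smoothed series hovers near the signal's true level instead of oscillating between extremes; larger windows smooth more at the cost of responsiveness.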
Analytics facilitates a cohesive banding together of the organization for execution-oriented organizational alignment, enterprise agility and better performance enabling action. Smarter decisions result, leading to superior performance.
Organizations who are looking to work faster and with greater agility often look to a private cloud as a solution. Not only can a private cloud improve data security, but it can also make better use of your existing IT resources. Unless you’re using Platform as a Service (PaaS), you’re not getting the full value out of your private cloud implementation. IBM’s Private Modular Cloud removes the bottlenecks that result from manual setups of middleware provisioning.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Understanding Big Data so you can act with confidence
1. IBM Software
Understanding big data
so you can act with confidence
More data, more problems? Not if you have an agile, automated
information integration and governance program in place
2. Understanding big data so you can act with confidence
1 Introduction
2 The four Vs: Challenges inherent to big data
3 Big data, big opportunities
4 Understanding: The key to successfully leveraging big data
5 Why InfoSphere?
3. Understanding big data so you can act with confidence
Introduction
Business leaders are eager to harness
the power of big data. However, as the
opportunity increases, it becomes
exponentially more difficult to ensure that
source information is trustworthy and
protected. If this trustworthiness issue is
not addressed directly, end users may lose
confidence in the insights generated from
their data—which can result in a failure to
act on opportunities or against threats.
Take, for example, the case of a large
multinational corporation whose CFO
asked a very simple question every month:
How did sales this month compare to
the same month last year? The answer
turned out to be quite complicated.
Sales were calculated and reported
in various systems: material planning,
financial, rent and royalty, real estate and
so forth. Each system reported a different
number because each system defined
a “sale” slightly differently, and therefore
extracted dissimilar data from the millions
of daily transactions generated around
the world. So what was the truth? No
one actually knew.
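The CFO’s problem can be sketched in a few lines. In this hypothetical Python illustration (the systems, fields and numbers are invented, not taken from the case), two systems compute “sales” from the same transaction log but define a “sale” differently, and so report different totals:

```python
# Hypothetical illustration: the same transaction log yields different
# monthly "sales" totals depending on how each system defines a "sale".
transactions = [
    {"amount": 100.0, "type": "product", "status": "completed"},
    {"amount": 250.0, "type": "royalty", "status": "completed"},
    {"amount": 75.0,  "type": "product", "status": "refunded"},
]

def sales_financial(txns):
    # Financial system: every completed transaction counts as a sale.
    return sum(t["amount"] for t in txns if t["status"] == "completed")

def sales_material(txns):
    # Material-planning system: only product transactions count,
    # refunded or not.
    return sum(t["amount"] for t in txns if t["type"] == "product")

print(sales_financial(transactions))  # 350.0
print(sales_material(transactions))   # 175.0
```

Both numbers are internally consistent; neither is “the truth” until the organization agrees on one definition.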
4. Understanding big data so you can act with confidence
The sheer volume and complexity of big
data means that the traditional method
of discovering, governing and correcting
information using manual stewardship
may not apply. Information integration and
governance must be implemented within
big data applications, providing appropriate
governance and rapid integration from the
start. By automating information integration
and governance and employing it at the
point of data creation, organizations can
boost confidence in their big data.
A solid information integration and
governance program should include
automated discovery, profiling and
understanding of diverse datasets to
provide context and enable employees
to make informed decisions. It must be
agile to accommodate a wide variety of
data and seamlessly integrate with various
technologies, from data marts to Apache
Hadoop systems. And it must automatically
discover, protect and monitor sensitive
information as part of big data applications.
IBM® InfoSphere® is designed to do all
of these things by evolving information
integration and governance to meet the
challenges presented by big data.
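As a rough sketch of what automated discovery and profiling means in practice (an illustration of the general technique, not InfoSphere’s implementation), a profiler can summarize each field’s completeness, distinct values and inferred types before anyone decides how to govern the data:

```python
# A minimal, hypothetical column profiler: report null counts, distinct
# values and the Python types observed in each field of a record set.
def profile(records):
    summary = {}
    for field in {k for r in records for k in r}:
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None]
        summary[field] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return summary

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": "3", "email": "a@example.com"},
]
report = profile(rows)
print(report["id"])  # the mixed int/str "id" field is flagged automatically
```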
5. Understanding big data so you can act with confidence
The four Vs: Challenges inherent to big data

Every second of every day, businesses generate more data. Researchers at IDC estimate that by the end of 2020, the amount of data created and copied annually will exceed 44 zettabytes, or 44 trillion gigabytes. That’s a 40 percent annual growth rate through the end of the decade.1 All of that big data represents a big opportunity for organizations.

Companies recognize that their big data contains valuable information. They are eager to analyze it to obtain actionable insights that could help them deepen customer relationships, prevent threats and fraud, optimize operations and identify new revenue opportunities.

Unfortunately, extracting valuable information from big data isn’t as easy as it sounds. Big data amplifies any existing problems in your infrastructure, processes or even the data itself. To make the most of the information in their systems, companies must successfully deal with the “four Vs” that distinguish big data: variety, volume, velocity and veracity.
The first three—variety, volume and velocity—
define big data; when you have a large
volume of data coming in from a wide
variety of applications and formats and it’s
moving and changing at a rapid velocity,
that’s when you know you have big data.
The fourth V, veracity, is a measure of the
accuracy and trustworthiness of your data.
Veracity is a goal—one that the variety,
volume and velocity of big data make
harder to achieve.
Your company can take advantage of the
opportunities available in big data only when
you have processes and solutions that can
handle all four Vs.
According to IBM,“Every day, we create 2.5 quintillion bytes of data—so
much that 90 percent of the data in the world today has been created in the
last two years alone.”2
6. Understanding big data so you can act with confidence
Volume
Increasing data volume
is at the heart of the
big data challenge.
Large data volumes can
cause many obvious
technical problems,
such as excessive batch processing times,
bottlenecks and so on.
However, the problem goes deeper than
that. What data should an organization care
about? How will it find that data? How
will it know how to leverage that data for
business use? All of these questions point
to a fundamental need to understand
the data—and manual techniques of
examining data as it comes in are simply
unequal to the task of working with big data.
Variety
Enterprise data
comes from many
sources. Customers
and salespeople
generate data every
time they make a sale.
Accountants produce data every time they
create a spreadsheet. Marketers create
data every time they write a report.
In addition, machine data—from sources
such as meter readings and IT system
logs—can increase data volumes
enormously. Organizations also get
constant streams of data from external
sources such as suppliers, consultants,
research firms and news feeds.
All of this data is stored in a dizzying array
of formats. Some gets stored in structured
formats in databases or enterprise resource
planning (ERP) applications, but much of it
is in documents, email messages or even
photos and videos—unstructured formats
that are challenging to manage.
And data doesn’t rest once it is in storage.
It must be moved from application to
application and from system to system so
managers and executives can interpret the
data and come to meaningful conclusions.
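A small, hypothetical sketch of handling this variety (the formats and field names are invented): records arriving as CSV-style rows and JSON documents are mapped into one common schema before they move between systems:

```python
# Normalize records from two hypothetical source formats into one schema.
import json

def from_csv_row(row):
    name, amount = row.split(",")
    return {"customer": name.strip(), "amount": float(amount)}

def from_json_doc(doc):
    d = json.loads(doc)
    return {"customer": d["cust_name"], "amount": float(d["total"])}

sources = [
    ("csv", "Acme Corp, 1200.50"),
    ("json", '{"cust_name": "Globex", "total": 99.9}'),
]
parsers = {"csv": from_csv_row, "json": from_json_doc}
unified = [parsers[kind](payload) for kind, payload in sources]
print(unified[0]["customer"])  # Acme Corp
```

Unstructured content such as email or video needs far more than a parser, but the principle is the same: get everything into a form the rest of the pipeline understands.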
What data should an organization
care about? How will it find
that data?
7. Understanding big data so you can act with confidence
Velocity

In almost every industry, data is coming in faster than ever. In activities such as stock trades, fraud detection or patient health monitoring, seconds—sometimes microseconds—can be critically important. Organizations must be able to understand data as it is streaming in. In most cases, getting that all-important information requires moving data quickly from one application or repository to another, where it can be processed and analyzed.

Unfortunately, many data integration solutions lack the high performance that big data projects require. There is not enough time to collect the data and process it in batches—particularly if a real-time or near–real-time response is warranted.

Veracity

The first three Vs—variety, volume and velocity—define big data. In contrast, the fourth V—veracity—is a desirable aspect of data that organizations want, but don’t always get.

Veracity is directly related to the trustworthiness of an organization’s data. The first step toward building trust into data is to understand which data is needed and for what business purpose. Can knowledge workers be sure that the data is accurate, current and complete—and therefore be confident in any decisions that are based on that data?
In most cases, getting that all-important information requires moving
data quickly from one application or repository to another, where it can be
processed and analyzed.
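The contrast with batch processing can be sketched minimally. In this hypothetical Python example (the window size and threshold are invented), each event is evaluated the moment it arrives, so an alert can fire mid-stream rather than after a batch run completes:

```python
# A sliding-window stream monitor: react to each event immediately
# instead of waiting to collect and process a batch.
from collections import deque

def monitor(events, window=3, threshold=500.0):
    recent = deque(maxlen=window)   # keep only the last few events
    alerts = []
    for amount in events:
        recent.append(amount)
        if sum(recent) > threshold:  # decision made as data streams in
            alerts.append(list(recent))
    return alerts

stream = [100.0, 150.0, 120.0, 400.0, 50.0]
print(monitor(stream))  # the windows whose running total exceeds 500
```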
8. Understanding big data so you can act with confidence
Big data, big opportunities

Big data presents significant opportunities for increased growth and profitability—and forward-thinking organizations are already realizing some of these benefits:
• A retailer actively uses social media data to analyze customer sentiment and improve loyalty, helping to drive revenue growth
• A transportation company uses machine data to optimize logistics, thereby reducing shipping costs
• A telecommunications company slashed developer costs by over 50 percent by articulating clear terms and policies for the data it needed
• A manufacturing company enhanced the trustworthiness of its reports after it discovered and eliminated 37 unique definitions of “customer” across its enterprise, and agreed to a single, standard definition

Capturing these sorts of benefits from big data requires knowing what the business needs and being able to find key items within the larger mass of big data. Begin by articulating the goals of the business. Determine the analytics and reports necessary to support those business objectives. Now you can make decisions about the data needed to inform the reports.
Standard terminology is a critical piece of this
process. For example, there should be no
misunderstanding about what constitutes a
“sale” or a “customer.” With standard terms,
organizations can create standard policies,
such as defining the data that comprises a
complete customer record.
Armed with objectives and standard terms
and policies, teams can create data
processing flows that automate the extraction
of relevant data from the mass of big data.
Importantly, this understanding also includes
knowing at all times where data is coming
from, how it is being manipulated and where
it is going. This chain of understanding builds
trust into information—and promotes
confident decision making.
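Such a standard policy can be written down directly in code. In this hypothetical sketch, the agreed definition of a “complete customer record” (the required fields here are invented) is declared once and applied wherever customer data is checked:

```python
# One agreed definition of a complete customer record, checked everywhere.
REQUIRED_CUSTOMER_FIELDS = {"customer_id", "name", "email", "country"}

def is_complete_customer(record):
    # A field counts only if it is present and non-empty.
    return REQUIRED_CUSTOMER_FIELDS.issubset(
        k for k, v in record.items() if v not in (None, "")
    )

good = {"customer_id": 7, "name": "Ann", "email": "a@x.com", "country": "DE"}
bad = {"customer_id": 8, "name": "Bob", "email": "", "country": "US"}
print(is_complete_customer(good), is_complete_customer(bad))  # True False
```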
9. Understanding big data so you can act with confidence
You can’t govern what you don’t
understand
It’s an unfortunately common refrain across
business and IT users when it comes to
discussing data: “That’s not what I meant!”
Organizations, departments and sectors often
use different terms or labels for the same
information, leading to confusion over details
such as what constitutes a customer (is it a
household or an individual?).
In one case, a large manufacturer had 37 different
definitions of the term “Employee ID” across
multiple divisions. Using spreadsheets to record
and compare the representations, the company
discovered at least 15 different data types
and formats, with different characteristics and
usages—a result of multiple legacy systems,
acquired companies with their own systems
and homegrown applications. This meant that
IT spent extra time and used undocumented
knowledge to determine which version to use
in which reports. Developers and analysts then
spent more time determining which data to
use in which tests, examples and tasks. And
business users were hampered by data quality
issues coming from duplicated or missed data.
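The audit described above can be automated rather than done in spreadsheets. This hypothetical sketch (the system names and ID formats are invented) collects each system’s representation of “Employee ID” and surfaces the conflicting formats:

```python
# Classify the format of an identifier as stored in each system, to
# surface incompatible representations automatically.
systems = {
    "hr_legacy": "00042",    # zero-padded string
    "payroll":   42,         # plain integer
    "badge_app": "EMP-42",   # prefixed code
}

def audit_formats(values):
    formats = set()
    for v in values:
        if isinstance(v, int):
            formats.add("integer")
        elif v.isdigit():
            formats.add("padded_numeric_string")
        else:
            formats.add("prefixed_string")
    return formats

print(audit_formats(systems.values()))  # three incompatible formats found
```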
A large European telecom company faced
similar issues when it discovered not just
terminology problems but also an inability to
trace data back to its source. Using InfoSphere
Information Governance Catalog, the company
standardized business terms and policies,
which helped increase confidence in reports.
Next, it redesigned integration processes and
built a clear blueprint, reducing time required
to write actual integration jobs by 50 percent.
Finally, it used the InfoSphere tool to establish a
clear metadata lineage—from source to target,
including all transformations—that uncovered
redundant data. Consolidating and archiving
that data led to a 75 percent reduction in
storage space.
10. Understanding big data so you can act with confidence
Understanding: The key to successfully leveraging big data
Successful businesses depend on
trusted information. IBM InfoSphere
Information Governance Catalog helps
organizations create an understanding of
big data that enables it to be converted
into trusted information. It allows business
users to play an active role in information-
centric projects and to collaborate with
technical teams—all without the need
for technical training. The end result:
an organization with a consistent
understanding of information, what
it means, how it is used and why it
can be trusted. Decisions are more
accurate and business opportunities
can be readily captured.
Defining “truth”
In a philosophy class, students might
consider the idea that truth is ultimately
unknowable or that truth is constantly
changing. Most organizations do not have
that luxury. Enterprises must have agreed-
upon definitions for important terms so they
can monitor and act on key metrics.
Consider a global financial institution that
has grown rapidly through mergers and
acquisitions. Because each of the
company’s various lines of business grew
independently, each has its own unique
processes, IT systems and definitions for
important terms. It isn’t uncommon for
the CIO, CFO and CMO to use different
definitions, which can lead to confusion
when it comes to analytics and reports.
InfoSphere Information Governance Catalog
includes a business glossary that enables
business and IT to create and agree on
definitions, rules and policies. InfoSphere
Information Governance Catalog also
includes data modeling capabilities that
allow data architects to determine where
each piece of data will come from and
where it will go. This capability helps ensure
that everyone involved in the big data
project knows exactly what key metrics
mean and where the data should originate—
establishing the “truth” as it relates to their
business data.
11. Understanding big data so you can act with confidence
Defining “trustworthy”
Unfortunately, it isn’t enough just to
establish policies and definitions and hope
that people will follow them. To be truly
confident that their data is trustworthy,
organizations must be able to trace its
path through their systems, see where
it came from and understand how it
was manipulated.
InfoSphere Information Governance Catalog
provides this level of transparency. It
includes data lineage capabilities that
enable users to track data back to its
original source and to see every calculation
performed on it along the way, increasing
the data’s accuracy and trustworthiness.
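Data lineage can be pictured as a small graph. The following sketch illustrates the general technique only, not InfoSphere’s internals: each derived dataset records its inputs and transformation, so any report value can be walked back to its original source:

```python
# A toy lineage graph: every dataset knows where it came from and what
# operation produced it, so provenance is always traceable.
lineage = {
    "quarterly_report": {"from": ["monthly_sales"], "op": "aggregate"},
    "monthly_sales":    {"from": ["raw_orders"],    "op": "dedupe+sum"},
    "raw_orders":       {"from": [],                "op": "source"},
}

def trace(dataset):
    path = [dataset]
    while lineage[dataset]["from"]:
        dataset = lineage[dataset]["from"][0]
        path.append(dataset)
    return path

print(trace("quarterly_report"))
# ['quarterly_report', 'monthly_sales', 'raw_orders']
```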
Defining “good”
The “goodness” of data depends upon
its usefulness in analysis, reporting and
decision making. Therefore, developing
an understanding of big data requires
companies to separate useful “good” data
from unhelpful “bad” data. In other words,
organizations must be able to extract only
those bits of data necessary to support a
particular business objective, and set aside
the rest. By filtering data in this way,
unnecessary data is kept out of data
warehouses and Hadoop file systems,
creating a more efficient processing
environment and lessening hardware
and software costs.
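Filtering for “good” data can be sketched as follows; the business objective and field names are hypothetical:

```python
# Keep only the fields and records a stated business objective needs;
# everything else is set aside before it reaches the warehouse.
OBJECTIVE_FIELDS = {"customer_id", "amount"}   # invented objective

def filter_for_objective(records):
    kept, set_aside = [], []
    for r in records:
        if OBJECTIVE_FIELDS.issubset(r):
            kept.append({k: r[k] for k in OBJECTIVE_FIELDS})
        else:
            set_aside.append(r)
    return kept, set_aside

raw = [
    {"customer_id": 1, "amount": 10.0, "browser": "firefox"},
    {"session": "xyz", "clicks": 14},   # irrelevant to this objective
]
kept, rest = filter_for_objective(raw)
print(len(kept), len(rest))  # 1 1
```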
The data quality and data governance
capabilities of InfoSphere Information
Governance Catalog help companies know
for certain that their data is good,
trustworthy and true. That certainty allows
them to be more confident in the results of
their analytics activities and to take action
quickly and assertively. When they have a
solid foundation of good data, business
leaders can build a more flexible and agile
company that will be better able to identify
and capitalize on business opportunities.
Developing an understanding of big data requires companies to separate
useful “good” data from unhelpful “bad” data.
12. Understanding big data so you can act with confidence
InfoSphere Information Governance Catalog:
The foundation for understanding big data
The first step toward proper information
governance involves establishing correct
data definitions that the entire organization
can use to better understand information.
While a full understanding of business context
and meaning resolves ambiguity and leads
to more accurate decisions, users often require
more detail behind their data.
InfoSphere Information Governance Catalog
offers a solution to this problem. It enables
organizations to create and then link business
terms to technical artifacts.
Here are some key features:
• Web-based management of business terms,
definitions and categories enables the creation
of an authoritative and common business
vocabulary for technical and business users
• Integration with InfoSphere Information
Server metadata helps ensure that technical and
business information is always connected
and consistent
• Security permissions help protect sensitive
business terms and definitions from
unauthorized users
• Customizable features and attributes enable
business users to define unique parameters
for their specific organizational and business
environment
13. Understanding big data so you can act with confidence
• Collaborative environment and feedback
mechanisms encourage organic growth and
allow different glossary users to jointly develop
or improve the glossary content
• Powerful glossary import and export
capabilities enable administrators to combine
existing fragmented and home-grown glossaries
into a single enterprise glossary for use by a
wider business audience
• Data stewardship empowers ownership of
business-term integrity and its governance
• Terms can be linked in an easy-to-use
web interface to create policies that govern
information objects
• Metadata lineage is captured and maintained
so that information contained in reports and
applications can be easily traced back to original
sources for validation (a critical step in meeting
compliance requirements for regulations such
as Basel II and the Sarbanes-Oxley Act)
• Globalization and translation support
for simplified Chinese, traditional Chinese,
Japanese, Korean, French, German, Italian,
Spanish and Brazilian Portuguese allows
customers around the world to use InfoSphere
Information Governance Catalog in their
native language
To make matters even simpler for businesses
embarking on a data governance and business
glossary solution, IBM provides packaged
business terms and definitions for six major
industries: banking, financial services, retail,
telecommunications, healthcare and insurance.
These content offerings help accelerate the
implementation and deployment of your business
glossary by immediately providing rich, industry-
specific business terms and definitions.
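Linking business terms to technical artifacts can be pictured as a small mapping. This sketch is purely illustrative (the terms, definitions and column names are invented) and not the catalog’s actual data model:

```python
# A toy glossary: each business term carries its agreed definition and
# the physical columns that implement it, keeping business and technical
# views connected.
glossary = {
    "Customer": {
        "definition": "An individual or household with at least one account",
        "artifacts": ["crm.clients.client_id", "dw.dim_customer.cust_key"],
    },
    "Sale": {
        "definition": "A completed, non-refunded product transaction",
        "artifacts": ["erp.orders.order_id"],
    },
}

def columns_for(term):
    return glossary[term]["artifacts"]

print(columns_for("Customer"))  # the physical columns behind the term
```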
14. Understanding big data so you can act with confidence
Why InfoSphere?
The IBM InfoSphere platform for information
integration and governance combines the
capabilities necessary for creating trusted
information from traditional and new
sources of big data. It provides an
enterprise-class foundation for information-
intensive projects, offering the performance,
scalability, reliability and acceleration
needed to deliver trusted information faster.
These capabilities include:
• Metadata, business glossary and
policy management: Defining both
metadata and governance policies with a
common component used by all integration
and governance engines is a critical task.
InfoSphere Information Governance Catalog
contains capabilities for metadata
management, governance policy definition
and management, and governance project
blueprint design, as well as a business
glossary of terms and definitions.
• Data integration: The InfoSphere platform
offers multiple integration capabilities for
batch data transformation and movement
(InfoSphere Information Server), real-time
replication (InfoSphere Data Replication)
and data federation (InfoSphere
Federation Server).
• Data quality: InfoSphere Information
Server for Data Quality has the ability to
parse, standardize, validate and match
enterprise data.
• Master data management (MDM):
InfoSphere MDM manages multiple data
domains, including customer, product,
account, location, reference data and
more. It handles any domain or style and
offers the flexibility to define custom
domains as required.
• Data lifecycle management: IBM
InfoSphere Optim™ manages the data
lifecycle from test data creation through
the retirement and archiving of data from
enterprise systems.
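The standardize-and-match steps named in the data quality bullet above can be sketched minimally. This illustrates the general technique only, not InfoSphere’s algorithms:

```python
# Standardize name strings, then group records whose standardized form
# agrees, exposing likely duplicates.
def standardize(name):
    return " ".join(name.lower().replace(".", "").split())

def match(records):
    seen = {}
    for r in records:
        seen.setdefault(standardize(r["name"]), []).append(r["id"])
    return {k: v for k, v in seen.items() if len(v) > 1}

records = [
    {"id": 1, "name": "J. Smith"},
    {"id": 2, "name": "j smith"},
    {"id": 3, "name": "Ann Lee"},
]
print(match(records))  # duplicate ids grouped under one standard form
```

Production matching also weighs addresses, phonetic similarity and probabilistic scores; exact match after standardization is only the simplest case.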
15. Understanding big data so you can act with confidence
• Data security and privacy: IBM
InfoSphere Guardium® activity monitoring
solutions continuously monitor data
access and protect repositories to prevent
data breaches and support compliance.
The InfoSphere Optim Data Masking
solution masks data in applications to help
protect sensitive data.
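Data masking itself can be sketched in a few lines. This hypothetical example uses salted hashing to produce consistent pseudonyms; it illustrates the general idea, not the InfoSphere Optim implementation:

```python
# Replace sensitive values with consistent pseudonyms so test and
# analytics copies stay joinable without exposing the real data.
import hashlib

def mask(value, salt="demo-salt"):          # salt is a made-up example
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return "USER-" + digest[:8]

original = "alice@example.com"
masked = mask(original)
assert masked == mask(original)             # deterministic: joins survive
assert original not in masked               # real value never appears
print(masked)
```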
As a critical element of IBM Watson™
Foundations, the IBM big data and
analytics platform, InfoSphere Information
Integration and Governance (IIG) provides
market-leading functionality to handle the
challenges of big data. InfoSphere provides
optimal scalability and performance for
massive data volumes, agile and right-
sized integration and governance for the
increasing velocity of data, and support
and protection for a wide variety of data
types and big data systems. InfoSphere
helps make big data and analytics projects
successful by delivering business users the
confidence to act on insight.