Big Data and NoSQL continue to make headlines everywhere. However, most of what has been written about these topics focuses on the hardware, services, and scale-out. But what about a Big Data and NoSQL strategy, one that supports your business strategy? Virtually every major organization considering these data platforms faces the challenge of figuring out the appropriate approach and requirements. This presentation will provide guidance on how to think about and establish realistic Big Data management plans and expectations. We will introduce a framework for evaluating the various choices involved in implementing and succeeding with Big Data/NoSQL, and demonstrate a sample use case.
Lecture material for the Industrial Psychology course in the Industrial Engineering program, on the topic of Research in Industrial Psychology, covering:
- The importance of research in Industrial Psychology
- Issues to consider in scientific research in Industrial Psychology
The hypergeometric distribution is very similar in application to the binomial; the difference lies in how the sample is drawn. The binomial case requires independence between trials. Consequently, when the binomial is applied to sampling from a set of items (a deck of cards, a batch of manufactured goods), each item must be returned to the set after it is observed. The hypergeometric distribution, by contrast, does not require independence and is based on sampling without replacement. It applies when testing an item destroys it, so the item cannot be returned, as in electronics testing and quality control.
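A short numerical sketch makes the contrast concrete. The quality-control setting below (lot size, defect count, and sample size are invented for illustration) compares the two distributions at the same nominal defect rate:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in a sample of n drawn WITHOUT replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p
    (i.e., sampling WITH replacement)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Destructive quality-control test: 5 items drawn (and destroyed)
# from a lot of 50 that contains 4 defectives.
N, K, n = 50, 4, 5
p_hyper = hypergeom_pmf(0, N, K, n)   # without replacement
p_binom = binom_pmf(0, n, K / N)      # with-replacement approximation
print(f"hypergeometric P(no defects) = {p_hyper:.4f}")  # 0.6470
print(f"binomial       P(no defects) = {p_binom:.4f}")  # 0.6591
```

The two answers differ because each destroyed item changes the composition of the remaining lot; as the lot size N grows relative to the sample, the hypergeometric probabilities converge to the binomial ones.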
Many data professionals struggle to demonstrate tangible returns on data management investments. In a webinar designed to appeal to both business and IT attendees, your presenter Dr. Peter Aiken will describe multiple types of value produced through data-centric development and management practices. One of our examples, from the healthcare space, offers a unique opportunity to demonstrate additional types of return on investment or value outcomes, namely returns in the form of lives saved through increased rates of bone marrow donor matches. In addition to metrics around increasing revenues or decreasing costs, i.e., investments that directly impact an organization’s financial position, these statistics of lives saved can be used to justify data management and quality initiatives.
The data governance function exercises authority and control over the management of your mission-critical assets and guides how all other data management functions are performed. When selling data governance to organizational management, it is useful to concentrate on the specifics that motivate the initiative. This means developing a specific vocabulary and set of narratives to facilitate understanding of your organization’s business concepts. This webinar provides you with an understanding of what data governance functions are required and how they fit with other data management disciplines. Understanding these aspects is a necessary prerequisite to eliminating the ambiguity that often surrounds initial discussions and to implementing effective data governance and stewardship programs that manage data in support of organizational strategy.
Find more of our Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Data architecture is foundational to an information-based operational environment. It is your data architecture that organizes your data assets so they can be leveraged in your business strategy to create real business value. Even though this is important, not all data architectures are used effectively. This webinar describes the use of data architecture as a basic analysis method. Various uses of data architecture to inform, clarify, understand, and resolve aspects of a variety of business problems will be demonstrated. As opposed to showing how to architect data, your presenter Dr. Peter Aiken will show how to use data architecting to solve business problems. The goal is for you to be able to envision a number of uses for data architectures that will raise the perceived utility of this analysis method in the eyes of the business.
Find out more: http://www.datablueprint.com/resource-center/webinar-schedule/
This presentation provides you with an understanding of the goals of reference and master data management (MDM), including establishing and implementing authoritative data sources, establishing and implementing more effective means of delivering data to various business processes, as well as increasing the quality of information used in organizational analytical functions (such as BI). You will understand the parallel importance of incorporating data quality engineering into the planning of reference and MDM.
Check out more of our Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Good systems development often depends on multiple data management disciplines that provide a solid foundation. One of these is metadata. While much of the discussion around metadata focuses on understanding metadata itself along with its associated technologies, this perspective represents a typical tool-and-technology focus, which has not achieved significant results to date. A more relevant question when considering pockets of metadata is whether to include them in the scope of organizational metadata practices. By understanding what it means to include items in the scope of your metadata practices, you can begin to build systems that allow you to practice sophisticated ways to advance your data management and supported business initiatives. After a bit of practice in this manner, you can position your organization to better exploit any and all metadata technologies in support of business strategy.
Find more data management webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Integrating data across systems has been a perpetual challenge. Unfortunately, the current technology-focused solutions have not helped IT improve its dismal project success statistics. Data warehouses, BI implementations, and general analytical efforts achieve the same levels of success as other IT projects: approximately one third are considered successes when measured against price, schedule, or functionality objectives. The first step is determining the appropriate analysis approach to the data system integration challenge. The second step is understanding the strengths and weaknesses of the various approaches. It turns out that proper analysis at this stage makes the actual technology selection far more accurate. Only when these are accomplished can proper matching between problem and capabilities be achieved as the third step, and true business value be delivered. This webinar will illustrate that good systems development more often depends on at least three data management disciplines to provide a solid foundation.
Find more Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Data-Ed: Best Practices with the Data Management Maturity Model (Data Blueprint)
The Data Management Maturity (DMM) model is a framework for the evaluation and assessment of an organization's data management capabilities. The model allows an organization to evaluate its current state data management capabilities, discover gaps to remediate, and strengths to leverage. The assessment method reveals priorities, business needs, and a clear, rapid path for process improvements. This webinar will describe the DMM, its evolution, and illustrate its use as a roadmap guiding organizational data management improvements.
Data is the lifeblood of just about every organization and functional area today. As businesses struggle to come to grips with the data flood, it is even more critical to focus on data as an asset that directly supports business imperatives, as other organizational assets do. Organizations across most industries attempt to address data opportunities (e.g., Big Data) and data challenges (e.g., data quality) to enhance business unit performance. Unfortunately, however, the results of these efforts frequently fall far below expectations due to haphazard approaches. Poor organizational data management capabilities are the root cause of many of these failures. This webinar covers three lessons (illustrated by examples) that will help you establish realistic data management plans and expectations, and demonstrate the value of such actions to both internal and external decision makers.
Smart Data Webinar: Advances in Natural Language Processing II - NL Generation (DATAVERSITY)
Need more than visualization?
Generate custom narrative docs from data today.
Technology for natural language generation (NLG) has advanced from the production of restricted-domain question-answering and simulation systems to the delivery of general purpose data- or model-driven narratives that are virtually indistinguishable from human-generated correspondence.
From sports to stock reports, you’ve probably read a machine-generated report in the past year without realizing that the “author” was a machine.
Participants in this webinar will learn how modern approaches have progressed beyond pattern matching and table-driven text selection to algorithms that consider context and tone. We will also present examples of commercially available NLG APIs to help participants experiment with NLG in their own applications right away.
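As a rough, hypothetical sketch of the table-driven text selection these systems grew out of (the ticker, thresholds, and wording below are all invented for the example), a data-driven stock-report generator can start as simply as:

```python
# Minimal, rule-based sketch of data-driven narrative generation:
# pick phrasing based on the data, then fill a template. Real NLG
# systems add context, tone, and discourse planning on top of this.
def stock_report(ticker: str, open_price: float, close_price: float) -> str:
    change = close_price - open_price
    pct = 100 * change / open_price
    # Table-driven word choice: the verb depends on the size of the move.
    if abs(pct) < 0.5:
        verb = "was little changed"
    elif pct > 0:
        verb = "surged" if pct > 3 else "rose"
    else:
        verb = "plunged" if pct < -3 else "fell"
    return (f"{ticker} {verb} to {close_price:.2f}, "
            f"a move of {pct:+.1f}% on the day.")

print(stock_report("XYZ", 100.0, 104.5))
# XYZ surged to 104.50, a move of +4.5% on the day.
```

The modern approaches discussed in the webinar replace the hand-written rule table with learned models of context and tone, but the pipeline shape (data in, word choices, surface realization) is the same.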
Implementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) Better (DATAVERSITY)
From its widespread formal business practice to the scope of casual popular awareness, “Big Data” has a tendency to live up to its name. Featured in countless headlines, journal articles, and industry reviews, Big Data metrics and methods such as NoSQL and Hadoop have taken up plenty of the spotlight as of late. However, most of what has been written about these topics is focused on the hardware, services, and scale-out involved with them, a misguided focus that ignores the critical questions driving any shift in corporate strategy: what can Big Data do for you? Which approach to it best fits your organization? And perhaps most importantly, what is required on your end in order to spur a successful implementation process?
In the interest of answering these and other questions, this webinar will:
Provide guidance on how to think about and establish realistic Big Data management plans and expectations for generating business value, as well as on the means by which Big Data can complement existing data management practices
Introduce a framework for evaluating the various choices when it comes to implementing and succeeding with Big Data/NoSQL
Elaborate upon the prototyping nature of practicing Big Data techniques
Demonstrate a sample use case
DataEd Slides: Data Management Best Practices (DATAVERSITY)
It is clear that data management best practices exist, and so does a useful process for improving existing data management practices. The question arises: since we understand the goal, how does one design a process for achieving it? This approach combines the DM BoK and the CMMI/DMM, providing organizations the opportunity to benefit from the best of both. It permits organizations to understand their current data management practices, strengths to leverage, and remediation opportunities. In a nutshell, it describes what must be done at the programmatic level to achieve better data use.
Big Data, NoSQL, NewSQL & The Future of Data Management (Tony Bain)
It is an exciting and interesting time to be involved in data. More influential change has occurred in database management in the last 18 months than in the previous 18 years. New technologies such as NoSQL and Hadoop, and radical redesigns of existing technologies such as NewSQL, will dramatically change how we manage data going forward.
These technologies bring possibilities both in terms of the scale of data retained and in how that data can be utilized as an information asset. The ability to leverage Big Data to drive deep insights will become a key competitive advantage for many organisations in the future.
Join Tony Bain as he takes us through the high-level drivers for the changes in technology, how these are relevant to the enterprise, and an overview of the possibilities a Big Data strategy can start to unlock.
Data-Ed Webinar: Data Modeling Fundamentals (DATAVERSITY)
Every organization produces and consumes data. Because data is so important to day-to-day operations, data trends are hitting the mainstream, and businesses are adopting buzzwords such as Big Data, NoSQL, and data scientist to seek solutions for their fundamental issues. Few realize that any solution, regardless of platform or technology, relies on the data model supporting it. Data modeling is not an optional task for an organization’s data effort. It is a vital activity that supports the solutions driving your business.
This webinar will address fundamental data modeling methodologies, as well as trends around the practice of data modeling itself. We will discuss abstract models and entity frameworks, as well as the general shift from data modeling being segmented to becoming more integrated with business practices.
Learning Objectives:
How are anchor modeling, data vault, etc. different and when should I apply them?
Integrating data models to business models and the value this creates
Application development (Data first, code first, object first)
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong Value-Adding Proposition (marcus evans IT Network)
by Patrick Hadley, Australian Bureau of Statistics at the Australian CIO Summit 2014
Sentara Linked Data Workshop - Sept 10, 2012 (3 Round Stones)
One day workshop to Sentara Healthcare on using a Linked Data approach for enterprise architecture. Topics include: Open Government Data initiatives, demo of Weather Health Web application; leveraging open data from NIH, NLM, NOAA, EPA, HHS; Callimachus Enterprise, a Linked Data Management System for the enterprise.
Big Data brings big promise and also big challenges, the most important being the ability to deliver value to business stakeholders who are not data scientists!
Data-Ed Webinar: Data Quality Success Stories (DATAVERSITY)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects and prevent these from re-occurring.
Takeaways:
• Understanding foundational data quality concepts based on the DAMA DMBOK
• Utilizing data quality engineering in support of business strategy
• Case studies illustrating data quality success
• Data quality guiding principles & best practices
• Steps for improving data quality at your organization
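The distinction the abstract draws between structural issues and practice-oriented defects starts with measurement. A minimal, hypothetical sketch of rule-based data quality profiling (the field names and rules here are invented for the example, not taken from the webinar):

```python
# Each rule flags records that violate an expectation, so defects can be
# counted per rule and traced back to their source before they propagate
# into downstream systems.
import re

RULES = {
    "missing_id":   lambda r: not r.get("id"),
    "bad_email":    lambda r: not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+",
                                               r.get("email", "")),
    "negative_age": lambda r: r.get("age", 0) < 0,
}

def profile(records):
    """Return a count of violations per rule across all records."""
    counts = {name: 0 for name in RULES}
    for record in records:
        for name, violated in RULES.items():
            if violated(record):
                counts[name] += 1
    return counts

records = [
    {"id": "1", "email": "a@example.com", "age": 34},
    {"id": "",  "email": "not-an-email",  "age": -2},
]
print(profile(records))
# {'missing_id': 1, 'bad_email': 1, 'negative_age': 1}
```

A persistently high count on one rule across loads suggests a structural issue (the schema or feed allows the defect); isolated spikes suggest practice-oriented defects in data entry or handling.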
Data-Ed Slides: Data-Centric Strategy & Roadmap - Supercharging Your Business (DATAVERSITY)
In many organizations and functional areas, data has pulled even with money in terms of what makes the proverbial world go ‘round. As businesses struggle to cope with the 21st century’s newfound data flood, it is more important than ever before to prioritize data as an asset that directly supports business imperatives. However, while organizations across most industries make some attempt to address data opportunities (e.g. Big Data) and data challenges (e.g. data quality), the results of these efforts frequently fall far below expectations. At the root of many of these failures is poor organizational data management—which fortunately is a remediable problem.
This webinar will cover three lessons, each illustrated with examples, that will help you establish realistic goals and benchmarks for data management processes and communicate their value to both internal and external decision makers:
- How organizational thinking must change to include value-added data management practices
- The importance of walking before you run with data-focused initiatives
- Prioritizing specification and data governance over “silver bullet” analytical tools
Big Data with Hadoop and HDInsight. This is an intro to the technology. If you are new to Big Data or have only just heard of it, this presentation will help you learn a little more about the technology.
Tools alone are not the answer: career roles and growth tracks for data professionals. In today’s (Big) data-driven information economy, it is more critical than ever to focus on data as an asset that directly supports business imperatives. But tools alone are not the answer. Organizations that want to rise above their competition can do so only with the help of skilled professionals who know how to manage, mine, and draw actionable insights from the multitude of (Big) data sources. Numerous new roles and job titles have emerged to address the high demand for specialized data professionals. This webinar brings together three individuals well qualified to contribute to this important industry-wide discussion of data jobs. We will take a closer look at these newer data management roles and present recommendations on how to enhance career paths.
Check out more webinars here: http://www.datablueprint.com/resource-center/webinar-archive/
We are in the middle of a data flood, and we need to figure out how to tame it without drowning. Most of what has been written about Big Data is focused on selling hardware and services. But what about a Big Data strategy that guides hardware and software decisions? While virtually every major organization is faced with the challenge of figuring out the approach for and the requirements of this new development, jumping into the fray hastily and unprepared will only reproduce the same dismal IT project results as before. Join Dr. Peter Aiken as he debunks a number of misconceptions about Big Data as an atypical IT project. He will provide guidance on how to establish realistic Big Data management plans and expectations, and help demonstrate the value of such actions to both internal and external decision makers without getting lost in the hype.
Check out more of our Data-Ed webinars here: www.datablueprint.com/webinar-schedule
Data-Ed: Unlock Business Value through Document & Content Management (Data Blueprint)
Organizations must realize what it means to utilize document and content management in support of business strategy. The volume of unstructured data is growing at an enormous pace. While we are still far away from automated content comprehension, increasingly sophisticated technologies are extending our business and data management capabilities into more critical and regulated areas. This presentation provides you with an understanding of the dimensions of these new developments, including electronic and physical document monitoring, storage systems, content analysis and archive, retrieve and purge cycling.
Learning Objectives:
What is Document & Content Management and why is it important?
Planning and Implementing Document & Content Management
Document/Record Management Lifecycle
Levels of Control
Content management building blocks
Guiding principles & best practices
Understanding foundational document & content management concepts based on the Data Management Body of Knowledge (DMBOK)
http://www.datablueprint.com/webinar-schedule
Data-Ed: Unlock Business Value Through Reference & MDM (Data Blueprint)
In order to succeed, organizations must realize what it means to utilize reference and MDM in support of business strategy. This presentation provides you with an understanding of the goals of reference and MDM, including the establishment and implementation of authoritative data sources, more effective means of delivering data to various business processes, and increasing the quality of information used in organizational analytical functions (e.g., BI). We also highlight the equal importance of incorporating data quality engineering into all efforts related to reference and master data management.
Check out more of our webinars here: http://www.datablueprint.com/webinar-schedule
Data-Ed: Show Me the Money: Monetizing Data Management (Data Blueprint)
Failure to successfully monetize data management investments sets up an unfortunate loop of fixing symptoms without addressing the underlying problems. As organizations begin to understand poor data management practices as the root causes of many of their business problems, they become more willing to make the required investments in our profession. This presentation uses specific examples to illustrate the costs of poor data management and how it impacts business objectives. Join us and learn how you can better align your data management projects with business objectives to justify funding and gain management approval.
Check out more of our webinars: http://www.datablueprint.com/resource-center/webinar-schedule/
Data Systems Integration & Business Value PT. 3: Warehousing (Data Blueprint)
Certain systems are more data focused than others. Usually their primary focus is on accomplishing integration of disparate data. In these cases, failure is most often attributable to the adoption of a single pillar (silver bullet). The three webinars in the Data Systems Integration and Business Value series are designed to illustrate that good systems development more often depends on at least three DM disciplines (pie wedges) in order to provide a solid foundation.
Integrating data across systems has been a perpetual challenge. Unfortunately, the current technology-focused solutions have not helped IT improve its dismal project success statistics. Data warehouses, BI implementations, and general analytical efforts achieve the same levels of success as other IT projects: approximately one third are considered successes when measured against price, schedule, or functionality objectives. The first step is determining the appropriate analysis approach to the data system integration challenge. The second step is understanding the strengths and weaknesses of the various approaches. It turns out that proper analysis at this stage makes the actual technology selection far more accurate. Only when these are accomplished can proper matching between problem and capabilities be achieved as the third step, and true business value be delivered.
Data Systems Integration & Business Value Pt. 2: Cloud (Data Blueprint)
Certain systems are more data focused than others. Usually their primary focus is on accomplishing integration of disparate data. In these cases, failure is most often attributable to the adoption of a single pillar (silver bullet). The three webinars in the Data Systems Integration and Business Value series are designed to illustrate that good systems development more often depends on at least three DM disciplines (pie wedges) in order to provide a solid foundation.
Many organizations are modifying their IT portfolios to fully take advantage of the benefits of cloud computing. While the motivation is specific and focuses on broad-based challenges, all organizations are prepared to benefit from aspects of the cloud. This is accomplished by ensuring that cloud-hosted data share three attributes. Cloud-hosted datasets must be of:
Higher quality data than those data residing outside of the cloud;
Lower volume (1/5 the size of data collections) than similar collections residing outside of the cloud; and
Increased share-ability compared with data residing outside the cloud.
Increases in capacity utilization, improved IT flexibility and responsiveness, and the forecast decreases in cost accruing to cloud-based computing are all possible once these first three conditions have been met. Necessary investments in data engineering can help organizations save even more money by reducing the resources required to perform their duties and by increasing the effectiveness of those duties and of decision-making. This webinar will show you how to recognize the opportunities, ‘size up’ the required investment, and properly supervise your efforts to take advantage of the opportunities presented by the cloud.
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Data-Ed: Data Systems Integration & Business Value Pt. 1: Metadata – Data Blueprint
Much of the discussion of metadata focuses on understanding it and the associated technologies. While these are important, they represent a typical tool/technology focus, and that focus has not achieved significant results to date. A more relevant question when considering pockets of metadata is whether to include them in the scope of organizational metadata practices. By understanding what it means to include items in the scope of your metadata practices, you can begin to build systems that allow you to practice more sophisticated data management and better support business initiatives. After a bit of practice in this manner you can position your organization to better exploit any and all metadata technologies.
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Data-Ed: Unlock Business Value through Data Quality Engineering – Data Blueprint
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar focuses on obtaining business value from data quality initiatives. I will illustrate how organizations with chronic business challenges often can trace the root of the problem to poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. This in turn allows organizations to more quickly identify business problems as well as data problems caused by structural issues versus practice-oriented defects and prevent these from re-occurring.
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Yes, we face a data deluge, and big data seems to be largely about how to deal with it. But 99% of what has been written about big data is focused on selling hardware and services. The truth is that until the concept of big data can be objectively defined, any measurements, claims of success, quantifications, etc. must be viewed with skepticism. While virtually every organization faces both the need for and approaches to these new requirements, jumping into the fray ill-prepared has (to date) reproduced the same dismal IT project results. Topics include:
The very real, very rapid, very great increases in data of all forms (charts showing data types and volume increases)
Challenges faced by virtually all data management programs
Means by which big data techniques can complement existing data management practices
Necessary but insufficient pre-requisites to exploiting big data techniques
Prototyping nature of practicing big data techniques
You can sign up for future Data-Ed webinars here: http://www.datablueprint.com/resource-center/webinar-schedule/
Data-Ed: Unlock Business Value through Data Governance – Data Blueprint
If your organization understands your function, they see you as an investment. If your organization does not understand what you do, they are likely to perceive you as a cost. The goal of this webinar is to provide you with concrete ideas for how to reinforce the first mindset at your organization. Success stories must be used to ensure continued organizational support. When selling data governance to organizational management, it is useful to concentrate on the specifics that motivate the initiative. This means developing a specific vocabulary and set of narratives to facilitate understanding of your organizational business concepts. For example: using specific common terms (and narratives) when referencing organizational mishaps, e.g. The Chocolate Story.
Learning Objectives:
Understanding contextually why data governance can be tricky for most organizations
Demonstrate a variety of “storytelling” techniques
How to use “worst practices” to your advantage
Understanding foundational data governance concepts based on the Data Management Body of Knowledge (DMBOK)
Taking away several novel but tangible examples of generating business value through data governance
Leading the Data Asset Management Team: CDO or Top Data Job? – Data Blueprint
Join Peter Aiken, Ph.D. and Micheline Casey for this interactive discussion on the role of Chief Data Officer (CDO) or Top Data Job (TDJ). While most agree that data challenges are getting – dare we say it, bigger? – the range of approaches reveals no emerging consensus as to the best way to address these challenges. This webinar features a wide-ranging discussion of a number of aspects of this exciting new career path. For each of these aspects, new data leaders can be congratulated but sometimes they also ought to be consoled. Ms. Casey (as the very first state CDO) and Dr. Aiken will bring certain considerations to the table. They hope to sample the pulse of the community and move towards consensus on a number of issues, including:
What is in a name/title?
Who are this individual’s peers?
Where does one obtain the requisite background to qualify?
How does RACI (a responsibility assignment matrix) apply?
When does data influence IT development efforts?
Why are these issues not better understood?
Data-Ed: Building the Case for the Top Data Job – Data Blueprint
Reflections on the past 25 years of organizational IT accomplishments, combined with performance measurement data, indicate that current IT management has been called upon to do a job that it cannot do well. Data are assets that deserve to be managed as professionally and aggressively as other company assets. Objective measurements show that approximately 1% of all organizations achieve data management success. In the face of the ongoing “data explosion,” this leaves most organizations wholly unprepared to leverage their sole non-degrading strategic asset. The requirements and organizational performance dictate a full-time position that does not report to IT and manages the data function from a position that is external to, and precedes, the SDLC. While the transformation may require some organizational discomfort, this move will achieve improved organizational IT performance faster and cheaper than ERPs or any other silver bullet.
Learning Objectives:
Why there typically isn’t and ultimately must be an authority (a chief) on organizational informational asset management
Why CIOs have not been able to devote the required time and attention
The seriousness of the skill gap – requisite expertise is rare
Understanding the ideal relationship between Data and IT.
Data-Ed: Unlocking Business Value through Data Modeling and Data Architecture – Data Blueprint
When asked why they are architecting data, many in the practice answer: "Because that is what must be done." However, a better approach to this question is to speak in terms that are understood in the executive suite – business results! All of our organizations are faced with various organizational challenges that require analysis. Building new systems is just one example. This webinar describes the use of data architecting as a basic analysis method (one of many that good analysts should keep in their “toolbox"). I will demonstrate various uses of data architecting to inform, clarify, understand, and resolve aspects of a variety of business problems. As opposed to showing how to architect data, I will show how to use data architecting to solve business problems. The goal is for you to be able to envision a number of uses for data architectures that will raise the perceived utility of this analysis method in the eyes of the business.
Learning Objectives:
Understanding how to contribute to organizational challenges beyond traditional data architecting
Realizing the fundamental difference between "definition" and "purpose"
Guiding analyses through data analysis
Using data modeling in conjunction with architecture/engineering techniques
Understanding foundational data architecture concepts based on the Data Management Body of Knowledge (DMBOK)
How to utilize data architecting in support of business strategy
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... – pchutichetpong
M Capital Group (“MCG”) expects demand to grow as supply evolves, facilitated through institutional investment rotating out of offices and into work from home (“WFH”), while the need for data storage expands alongside global internet usage, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
1. A Framework for Implementing NoSQL, Hadoop
• "Big Data could know us better than we know ourselves" – Dan Gardner
• "We'll see this as the time in history when the world's information was transformed from an inert, passive state and put into a unified system that brings that information alive" – Michael Nielsen
• "Today a street stall in Mumbai can access more information, maps, statistics, academic papers, price trends, futures markets, and data than a U.S. President could only a few decades ago" – Juan Enriquez
• "Not everything that can be counted counts, and not everything that counts can be counted" – Albert Einstein
Big Data and NoSQL continue to make headlines everywhere. However, most of what has been written about these topics is focused on the hardware, services, and scale out. But what about a Big Data and NoSQL Strategy, one that supports your business strategy? Virtually every major organization thinking about these data platforms is faced with the challenge of figuring out the appropriate approach and the requirements. This presentation will provide guidance on how to think about and establish realistic Big Data management plans and expectations. We will introduce a framework for evaluating the various choices when it comes to implementing and succeeding with Big Data/NoSQL and show how to demonstrate a sample use case.
Takeaways:
• A Framework for evaluating Big Data techniques
• Deciding on a Big Data platform – How do you know which one is a good fit for you?
• The means by which big data techniques can complement existing data management practices
• The prototyping nature of practicing big data techniques
• The distinct ways in which utilizing Big Data can generate business value
Date: June 9, 2015
Time: 2:00 PM ET/11:00 AM PT
Presenters: Peter Aiken, Ph.D. & Josh Bartels
• "Soon we will salt the oceans, the land, and the sky with uncounted numbers of sensors invisible to the eyes but visible to one another" – Esther Dyson
• "We now have a chance to become the center of our own knowledge universe, one that constantly reconfigures itself to match our needs" – Michael S. Malone
• "We've reached a tipping point in history: today more data is being manufactured by machines, servers, and cell phones, than by people" – Michael E. Driscoll
• "Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data" – Michael Coren
Copyright 2015 by Data Blueprint
3. Steven MacLauchlan
• 10 years of experience in Application
Development and Data Modeling with a
focus on Healthcare solutions.
• Delivers tailored data management
solutions that provide focus on data’s
business value while enhancing clients’
overall capability to manage data
• Certified Data Management Professional (CDMP)
• Computer Science degree from Virginia Commonwealth
University
• Most recent focus: Understanding emerging
data modeling trends and how these can
best be leveraged for the Enterprise.
4. Get Social With Us!
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
Ask questions and submit
your comments: #dataed
Like Us on Facebook
www.facebook.com/
datablueprint
Post questions and comments
Find industry news, insightful
content
and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
5. Peter Aiken, Ph.D.
• 30+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data
management practices
• Multi-year immersions:
– US DoD
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with
Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Peter Aiken with Juanita Billings, foreword by John Bottega:
Monetizing Data Management -
Unlocking the Value in Your Organization's Most Important Asset
Peter Aiken and Michael Gorman:
The Case for the Chief Data Officer -
Recasting the C-Suite to Leverage Your Most Valuable Asset
6. Josh Bartels
• Data management consultant and
leader
– Over 10 years of experience
– Multiple industries (Finance, Defense,
Insurance)
• Certifications
– Certified Data Management
Professional (CDMP)
– Project Management Professional (PMP)
– Data Vault 2.0 Practitioner (CDVP2)
• Education
– Master's in Business Administration
– Master's in Information Systems
• Current Efforts
– focus on the creation and migration to
new data platforms for clients in the
financial and insurance industries.
7. Presented by Peter Aiken, Ph.D., Josh Bartels, Steven MacLauchlan
A Framework for Implementing
NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right
Approach for Implementing Big Data Techniques
8. A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-von Neumann Architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Takeaways and Q&A
Tweeting now at: #dataed
9. A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-von Neumann Architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Takeaways and Q&A
Tweeting now at: #dataed
10. Myth #1: Big Data has a clear definition
Fact:
• The term is used so often
and in many contexts that
its meaning has become
vague and ambiguous
• Industry experts and
scientists often disagree
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
11. Big Data (has something to do with Vs - doesn't it?)
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• 2001 Doug Laney
• Variability
– Many options or variable interpretations confound analysis
• 2011 ISRC
• Vitality
– A dynamically changing Big Data environment in which analysis and predictive models
must continually be updated as changes occur to seize opportunities as they arrive
• 2011 CIA
• Virtual
– Scoping the discussion to only include online assets
• 2012 Courtney Lambert
• Value/Veracity
• Stuart Madnick (John Norris Maguire Professor of Information Technology, MIT Sloan School of
Management & Professor of Engineering Systems, MIT School of Engineering)
12. Defining Big Data
• Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery
and process optimization.
– Gartner 2012
• Big data refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, and analyze.
– IBM 2012
• An all-encompassing term for any collection of data sets so large and complex that it
becomes difficult to process using on-hand data management tools or traditional data
processing applications
– Wikipedia 2014
• Shorthand for advancing trends in technology that open the door to a new approach
to understanding the world and making decisions.
– NY Times 2012
• The broad range of new and massive data types that have appeared over the last
decade
– Tom Davenport 2014
• Data of a very large size, typically to the extent that its manipulation and management
present significant logistical challenges.
– Oxford English Dictionary 2014
• Big data is about putting the "I" back into IT.
– Peter Aiken 2007
13. Big Data Techniques
• New techniques available to impact the productivity (order of
magnitude) of any analytical insight cycle that complement,
enhance, or replace conventional (existing) analysis methods
• Big data techniques are currently characterized by:
– Continuous, instantaneously
available data sources
– Non-von Neumann
Processing (defined later in the presentation)
– Capabilities approaching
or past human comprehension
– Architecturally enhanceable
identity/security capabilities
– Other tradeoff-focused data processing
• So a good question becomes "where in our existing architecture
can we most effectively apply Big Data Techniques?"
14. Big Data Technologies, by themselves, are a One-Legged Stool
Governance is the major means
of preventing over-reliance on
one-legged stools!
15. The Big Data Landscape
Copyright Dave Feinleib, bigdatalandscape.com
18. Myth #2: Everyone should invest in Big Data
Fact:
• Not every company will
benefit from Big Data
• It depends on your size
and your ability
– Local pizza shop vs.
state-wide or national
chain
19. Big Data can create significant financial value across sectors
• Some (not all)
companies can
take advantage
of Big Data to
create value if
they want to
compete
20. A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-von Neumann Architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Takeaways and Q&A
Tweeting now at: #dataed
21. Big Data = Big Spending
• Enterprises are spending wildly on Big Data but don’t
know if it’s worth it yet (Business Insider, 2012)
• Big Data Technology Spending Trend:
• 83% increase over the next 3 years (worldwide):
– 2012: $28 billion
– 2013: $34 billion
– 2016: $232 billion
• Caution:
– Don’t fall victim to SOS (Shiny Object
Syndrome)
– A lot of money is being invested but
is it generating the expected return?
– Gartner Hype Cycle suggests results
are going to be disappointing
http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe
http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html
http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
22. Who wrote this … ?
• In considering any new
subject, there is
frequently a tendency
first to overrate what
we find to be already
interesting or
remarkable, and
secondly - by a sort of
natural reaction - to
undervalue the true
state of the case.
• Augusta Ada King,
Countess of Lovelace - aka
Ada Lovelace, publisher of
the first computing program
23. Gartner Five-phase Hype Cycle
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Peak of Inflated Expectations: Early publicity produces a number of
success stories—often accompanied by scores of failures. Some
companies take action; many do not.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Slope of Enlightenment: More instances of how the technology can benefit the
enterprise start to crystallize and become more widely understood. Second- and third-
generation products appear from technology providers. More enterprises fund pilots;
conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to
take off. Criteria for assessing provider viability are more
clearly defined. The technology’s broad market
applicability and relevance are clearly paying off.
24. Gartner Hype Cycle
"A focus on big data is not a substitute for the
fundamentals of information management."
25. 2012 Big Data in Gartner’s Hype Cycle
26. 2013 Big Data in Gartner’s Hype Cycle
27. 2014 Big Data in Gartner’s Hype Cycle
28. Big Data Gartner Hype Cycle
29. Myth #3: Big Data is innovative
Fact:
• Big Data techniques are
innovative
• ROI and insights depend
on the size of the business
and the amount of data
used and produced, e.g.
– Local pizza place vs. Papa
John’s
– Retail
30. My Barn must pass a foundation inspection
• Before further construction can proceed
• No IT equivalent in most organizations
31. Frameworks
• A system of ideas
for guiding
analyses
• A means of
organizing project
data
• A data integration priorities decision-making framework
• A means of
assessing
progress
32. "There’s now a blurring between the storage world and the memory world"
• Faster processors outstripped
not only the hard disk, but main
memory
– Hard disk too slow
– Memory too small
• Flash drives remove both
bottlenecks
– Combined, Apple and Yahoo have
spent more than $500 million to
date
• Make it look like traditional
storage or more system
memory
– Minimum 10x improvements
– Dragonstone server is 3.2 TB of flash
memory (Facebook)
• Bottom line - new capabilities!
33. Non-von Neumann Processing/Efficiencies
• von Neumann
bottleneck
(computer science)
– "An inefficiency inherent in
the design of any von
Neumann machine that
arises from the fact that
most computer time is
spent in moving
information between
storage and the central
processing unit rather than
operating on it"
[http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database
processing is
approximately 4%
efficient
• Many big data
architectures are
attempts to address
this, but:
– Zero sum game
– Trade characteristics
against each other
• Reliability
• Predictability
– Google/MapReduce/
Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data techniques
exploit non-von
Neumann processing
35. One of Data Blueprint's Big Data Clusters
36. Analytics Insight Cycle
(Diagram: Existing Knowledge base → Volume/Variety/Velocity → Pattern/Object Emergence → Potential/actual insights → Exploitable Insight, with a feedback loop and an analytical bottleneck; "Big Data" contributions are shown in orange)
• Things are happening
– Sensemaking techniques address "what" is happening?
• Patterns/objects, hypotheses emerge
– What can be observed?
• Operationalizing
– The dots can be repeatedly connected
• Margaret Boden's computational creativity
– Exploratory
– Combinational
– Transformational
37. Big Data: Two prominent use cases
• A sandwich offers a good analogy for combining big data and existing technologies
• Landing Zone (less expensive)
– Especially useful in cases where data is highly disposable
• Archiving/Offloading (less need for structure)
– "Cold" transactional and analytic data
• Existing technologies are the contents, sandwiched between and complemented by the landing zone and archival capabilities
(Diagram: Landing Zone / Existing Data Architectural Processing / Archiving-Offloading)
Adapted from Nancy Kopp:
http://ibmdatamag.com/2013/08/relishing-the-big-data-burger/
38. What is NoSQL?
• Commonly interpreted as "Not Only SQL"
• Broad class of database management technologies that
provide a mechanism for storage and retrieval of data that
doesn’t follow traditional relational database methodology.
• Motivations
– Simplicity of design
– Horizontal scaling
– Finer control over availability of the data.
• The data structures used by NoSQL databases differ from
those used in relational databases, making some
operations faster in NoSQL
and others faster in relational
databases.
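The schema-less storage idea behind many NoSQL systems can be sketched with a toy in-memory document store. This is a minimal illustration in plain Python, not any real NoSQL product's API; the `DocumentStore` class and its method names are assumptions made for the example:

```python
# Toy document store: opaque keys, schema-less values.
# Records in one "collection" need not share a schema, unlike rows
# in a relational table. This is an in-memory mock, not a database.
import json
import uuid


class DocumentStore:
    def __init__(self):
        self._docs = {}

    def put(self, doc):
        key = str(uuid.uuid4())
        self._docs[key] = json.dumps(doc)  # values are just serialized blobs
        return key

    def get(self, key):
        return json.loads(self._docs[key])


store = DocumentStore()
# Two "customer" documents with different shapes - no ALTER TABLE needed.
k1 = store.put({"name": "Ada", "email": "ada@example.com"})
k2 = store.put({"name": "Juan", "phones": ["555-0100"], "vip": True})
print(store.get(k1)["name"])  # Ada
print(store.get(k2)["vip"])   # True
```

The trade-off on the slide shows up even here: lookups by key are trivially fast, but a relational-style query across all documents ("every customer with a phone number") requires scanning and handling the varying shapes.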
39. What is Hadoop?
• A data storage and processing
system that runs on clusters of commodity servers.
• Able to store any kind of data in its native format.
• Perform a wide variety of analyses and transformations.
• Store terabytes, and even petabytes, of data
inexpensively.
• Handles hardware and system failures automatically,
without losing data or interrupting data analyses.
• Critical components of Hadoop:
– HDFS – the Hadoop Distributed File System is the storage system
for a Hadoop cluster, responsible for distribution of data across the
servers.
– MapReduce – the processing engine of Hadoop that allows for
distributed and parallel analytical job execution.
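The MapReduce model that Hadoop distributes across a cluster can be illustrated with a single-process word-count sketch: map emits (key, value) pairs, a shuffle groups them by key, and reduce aggregates each group. The function names below are illustrative assumptions, not Hadoop's actual Java or streaming API:

```python
# Local simulation of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict


def map_phase(document):
    # Emit (word, 1) for every word, as a word-count mapper would.
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    # Group intermediate values by key; Hadoop performs this step
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate each key's values; here, a simple sum of the counts.
    return (key, sum(values))


documents = ["big data big clusters", "big data"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```

In a real cluster the mappers run on the servers holding the HDFS blocks (moving computation to the data), and the shuffle is what crosses the network.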
40. Why NoSQL? Why Hadoop?
• Large number of users (read: the internet)
• Rapid app development and deployment
• Large number of mission critical writes (sensors/etc)
• Small, continuous reads and writes, especially where
“Consistency” is less important (social networks)
• Hadoop solves the hard scaling problems caused by large
amounts of complex data.
• As the amount of data in a cluster grows,
new servers can be added to a Hadoop
cluster incrementally and inexpensively
to store and analyze it.
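Incremental scale-out of this kind is often implemented with consistent hashing, the partitioning approach popularized by Amazon's Dynamo: each record hashes to a point on a ring, and adding a server relocates only the keys that fall in its new segment. The sketch below is a minimal illustration under that assumption, not the scheme of any particular product:

```python
# Consistent-hashing sketch: adding a node moves only some keys.
import bisect
import hashlib


class HashRing:
    def __init__(self, servers):
        self._ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_server(self, server):
        bisect.insort(self._ring, (self._hash(server), server))

    def server_for(self, key):
        # First server clockwise from the key's position on the ring.
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node-1", "node-2", "node-3"])
keys = ("alpha", "beta", "gamma", "delta")
before = {k: ring.server_for(k) for k in keys}
ring.add_server("node-4")  # grow the cluster incrementally
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With naive modulo partitioning (`hash(key) % n_servers`), adding a server would instead reshuffle nearly every key, which is exactly the rebalancing cost that makes incremental growth expensive.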
41. Hadoop Use Cases in the Real World
• Risk Modeling
• Customer Churn Analysis
• Recommendation Engine
• Ad Targeting
• Point of Sale Transaction Analysis
• Social Sentiment on Social Media
• Analyzing network data to predict failure
• Threat analysis
• Trade Surveillance
43. 44
Copyright 2015 by Data Blueprint
• Data analysis struggles with the social
– Your brain is excellent at social cognition - people can
• Mirror each other’s emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social
interactions but not the quality
• Map interactions with co-workers you see during work days
• Can't capture devotion to childhood friends seen annually
– When making (personal) decisions about social
relationships, it’s foolish to swap the amazing machine
in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories - weaving together multiple
causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant
correlations
– Most are spurious and deceive us
– Falsity grows exponentially with the amount of data we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– Detect when large numbers of people take an instant
liking to some cultural product
– Masterpieces are often hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it’s always structured according to
somebody’s predispositions and values
44. Myth #4: Big Data is just another IT project
Copyright 2013 by Data Blueprint
Fact:
• Big Data is not your typical IT
project
– Does not answer typical IT questions
– Trend analysis, agile, actionable, etc.
– Fundamentally different approach
• Big Data Projects are exploratory
• Big Data enables new capabilities
• Big Data can be a disruptive
technology
• It might sound simple but that
doesn’t mean it’s easy
• Beware of SOS (Shiny Object
Syndrome)
48. ("Whereas of the Plague")
Plague Peak
When is it happening?
49. Black Rats (Rattus rattus)
Why is it happening?
50. What Will Happen?
51. Formalizing Data Management
• Defend the Realm:
The authorized history of MI5
by Christopher Andrew
• World War I
• 1914
• At war with much
of Europe
• 14,000,000 Germans living
in the United Kingdom
• How to efficiently and
effectively manage
information on that many
individuals?
• The Security Service is responsible for "protecting
the UK against threats to national security from
espionage, terrorism and sabotage, from the activities
of agents of foreign powers, and from actions intended
to overthrow or undermine parliamentary democracy by
political, industrial or violent means."
52. “As a final thought, how about a machine that
would send, via closed-circuit television, visual and
oral information needed immediately at high-level
conferences or briefings? Let’s say that a group of
senior officers are contemplating a covert action
program for Afghanistan. Things go well until
someone asks “Well, just how many schools are
there in the country, and what is the literacy rate?”
No one in the room knows. (Remember, this is an
imaginary situation). So the junior member present
dials a code number into a device at one end of the
table. Thirty seconds later, on the screen overhead,
a teletype printer begins to hammer out the
required data. Before the meeting is over, the group
has been given, through the same method, the
names of countries that have airlines into
Afghanistan, a biographical profile of the Soviet
ambassador there, and the Pakistani order of battle
along the Afghanistan frontier. Neat, no?”
• Predicted use of
not just
computing in the
intelligence
community
• Also forecast
predictive
analytics
• Accompanying
privacy
challenges
53. A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-Von Neumann architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Take Aways and Q&A
Tweeting now at: #dataed
54. http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Myth #6: Big Data provides all the Answers
Fact:
• Big Data does not mean the end of
scientific theory
• Be careful or you’ll end up with
spurious correlations
– Don’t just go fishing for correlations and
hope they will explain the world
• To get to the WHY of things, you
need ideas, hypotheses and theories
• Having more data does not
substitute for thinking hard,
recognizing anomalies and exploring
deep truths
• You need the right approach
56. • Identify business opportunity
• How can data be leveraged in exploring:
– The external marketplace? Analyze opportunities and threats
– Internal efficiencies? Analyze strengths and weaknesses
57. Example: 2012 Olympic Summer Games
1. Volume: 845 million Facebook users averaging 15+ TB of data/day
2. Velocity: 60 GB of data per second
3. Variety: 8.5 billion connected devices
4. Variability: sponsor data, athlete data, etc.
5. Vitality: the data-art project “Emoto”
6. Virtual: social media
58. • Based on my 6 V analysis, do I need a Big Data solution, or does my current BI solution address my business opportunity?
– Do the 6 Vs indicate general Big Data characteristics?
– What are the limitations of my current BI environment? (Technology constraint)
– What are my budgetary restrictions? (Financial constraint)
– What is my current Big Data knowledge base? (Knowledge constraint)
59. • MUST have both Foundational and Technical practice expertise
61. • Data Strategy
• Data Governance
• Data Architecture
• Data Education
62. • Data Quality
• Data Integration
• Data Platforms
• BI/Analytics
63. • Needs to be actionable
• Generally well understood by
business
• Document what has been learned
64. • Perfect results are not
necessary
• Reiterate and refine
• Iterative process to
reach decision point
• Use as feedback for
next exploration
66. Myth #7: You need Big Data for Insights
Fact:
• Distinction between Big Data and
doing analytics
– Big Data is defined by the technology stack
that you use
– Big Data is used for predictive and
prescriptive analytics
• Use existing data for reporting, figure
out bottlenecks and optimize current
business model
• Understand how your data is structured, architected and stored
67. A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-Von Neumann architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Take Aways and Q&A
Tweeting now at: #dataed
68. Social Sentiment Analysis
• One of the burgeoning areas
for use of Big Data / Hadoop
platforms.
• Allows for the landing of multiple sources of unstructured data (Twitter, Facebook, LinkedIn, etc.)
• Data that can then be analyzed with algorithms looking for keywords that determine positive/negative feedback
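The keyword approach can be sketched in a few lines. The word lists here are hypothetical; a production system would use a curated sentiment lexicon or a trained classifier:

```python
# Hypothetical keyword lexicons -- a real system would use a curated
# sentiment lexicon or a trained model instead of hand-picked words
POSITIVE = {"love", "great", "awesome", "recommend"}
NEGATIVE = {"hate", "terrible", "broken", "refund"}

def sentiment_score(post):
    """Score a social post: positive keyword hits minus negative hits."""
    words = set(post.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

posts = [
    "I love this product, would recommend",
    "terrible support, I want a refund",
]
scores = [sentiment_score(p) for p in posts]
print(scores)  # [2, -2]
```

On a Hadoop platform the same scoring function would run as the map step over millions of landed posts.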
69. Operational Use
• Utilize real-time pricing data from multiple sources to dynamically update the pricing for books in the Amazon Marketplace
• Ingest data from multiple sources, looking for real-time changes in price
• Apply a predictive model to determine the best price point and set the price of the books on the marketplace
• Increased conversion rate, but created a race-to-the-bottom situation if not monitored
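A greatly simplified sketch of such a repricing loop; the undercut amount and margin floor are illustrative assumptions, not actual marketplace rules:

```python
def reprice(our_cost, competitor_prices, undercut=0.01, min_margin=0.10):
    """Undercut the lowest competitor while enforcing a price floor.

    The floor is what prevents the unmonitored 'race to the bottom':
    without it, repeated mutual undercutting drives the price to zero.
    """
    floor = our_cost * (1 + min_margin)          # never sell below cost + margin
    target = min(competitor_prices) - undercut   # undercut the cheapest rival
    return round(max(target, floor), 2)

# Competitors at $12.99, $11.50, $14.00; our cost is $10.00
price = reprice(10.00, [12.99, 11.50, 14.00])
print(price)  # 11.49
```

If a competitor drops to $10.50, the same call returns the $11.00 floor rather than chasing the price down.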
70. Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through
smartphone apps & social media
• Health history
• Medical research data
71. http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
Retail Example: Loyalty Programs & Big Data
• Companies need to understand current wants and needs AND
predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences
product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
72. References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Aren’t Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
73. Data Management Maturity
July 14, 2015 @ 2:00 PM ET/11:00 AM PT
Trends in Data Modeling
August 11, 2015 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
Upcoming Events
74. 10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056
75. Potential Tradeoffs: the CAP Theorem
• CAP theorem: a distributed system can guarantee at most two of Consistency, Availability, and Partition (fault) Tolerance
• Small datasets can be both consistent & available
• ACID: Atomicity, Consistency, Isolation, Durability
• BASE: Basic Availability, Soft-state, Eventual consistency
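Eventual consistency can be illustrated with a toy last-writer-wins replica merge. This is a deliberate simplification of what real stores such as Cassandra or Riak do (they use vector clocks or similar machinery rather than bare timestamps):

```python
def merge(replica_a, replica_b):
    """Last-writer-wins merge of two replicas.

    Each replica maps key -> (timestamp, value). For every key we keep
    the pair with the newest timestamp; applied pairwise until all
    replicas agree, this yields eventual consistency.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Two replicas accepted writes independently during a network partition
a = {"cart": (1, ["book"]), "user": (5, "alice")}
b = {"cart": (3, ["book", "lamp"])}

converged = merge(a, b)
print(converged["cart"])  # (3, ['book', 'lamp'])
```

Both replicas end up with the newer cart and the untouched user record: availability during the partition, consistency only after the merge.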
77. http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
5 Ways in which Data creates Business Value
1. Information is transparent
and usable at much higher
frequency
2. Expose variability and
boost performance
3. Narrow segmentation of
customers and more
precisely tailored products
or services
4. Sophisticated analytics and
improved decision-making
5. Improved development of
the next generation of
products and services
78. • We are at an inflection point: The
sheer volume of data generated,
stored, and mined for insights has
become economically relevant to
businesses, government, and
consumers (McKinsey)
• We believe the same important
principles still apply:
– What problem are you trying to solve for
your business? Your solution needs to fit
your problem
– Doing data for (big) data’s sake is not going
to solve any problems
– Risk of spending a lot of money on chasing
Big Data that will realize little to no returns -
especially at this hype cycle stage
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
Why the Big Deal about Big Data?
80. Take Aways-Big Data Context
• Technology continues to evolve at
increasing speeds
• Big Data is here
– We have the potential to
create insights
• Spend wisely & strategically:
– Big Data is not going to solve
all your problems.
• Fact:
– Big Data is not for everyone
• Fact:
– Lack of a clear definition
• Hype Cycle:
– Current: Peak of Inflated Expectations
– Soon: Trough of Disillusionment
81. Take Aways: Big Data Challenges Today
• Fact: Big Data techniques are innovative but
“Big Data” is not
• Challenges are both foundational and technical, today as well as in the 1600s
• Technology continues to advance rapidly (4
Vs)
• Challenges associated with Big Data are not
new:
– Well-known foundational data management issues
– Need to align data and business with rapidly
changing environment
– Duplicity, accessibility, availability
– Foundational business issues
82. Take Aways-Approach: Crawl, Walk, Run
• Crawl:
– Identify business opportunity and
determine whether you truly need
a Big Data solution
• Walk:
– Apply a combination of
foundational and technical data
management practices.
Document your insights and
make sure they are actionable
• Run:
– Recycle and explore. Staying
agile allows you to be exploratory.
83. Take Aways-Design Principles: Foundational & Technical
• Foundational data management
principles still apply
• Beware of SOS (Shiny Object
Syndrome)
• You must have a data strategy before
you can have a Big Data strategy
• Fact: You don’t need Big Data to gain
insights
• Big Data integration requirements evolve
from your strategy
• Fact: Bigger Data is not always better
84. Take Aways: In Summary
• Big data techniques are innovative
but “Big Data” is not
• Big Data characteristics: 6 Vs
– Volume, Velocity, Variety, Variability, Vitality,
Virtual
• Approach: Crawl-Walk-Run
• Big Data challenges require solutions
that are based on foundational and
technical data management practices
• Beware of SOS (Shiny Object
Syndrome):
– Spend wisely and strategically
– Big Data is not going to solve all your
problems
85. Foundational Practice: Data Strategy
• Your data strategy must
align to your organizational
business strategy and
operating model
• As the marketplace becomes more data-driven, a data-focused business strategy is an imperative
• Must have data strategy
before you have a Big
Data strategy
86. Data Strategy Considerations
• What are the questions that
you cannot answer today?
• Is there a direct reliance on
understanding customer
behavior to drive revenue?
• Do you have information
overload and are you trying to
find the signal in the noise?
• Which is more important:
– Establishing value from current
data assets/data reporting?
– Exploring Big Data
opportunities?
87. Foundational Practice: Data Architecture
• Common vocabulary expressing
integrated requirements ensuring
that data assets are stored,
arranged, managed, and used in
systems in support of
organizational strategy [Aiken
2010]
• Most organizations have data
assets that are not supportive of
strategies
• Big question:
– How can organizations more
effectively use their information
architectures to support
strategy implementation?
88. Data Architecture Considerations
• Does your current architecture for
BI and analytics support Big Data?
• Are you getting enough value out of
your current architecture?
• Can you easily integrate and share
information across your
organization?
• Do you struggle to extract the value
from your data because it is too
cumbersome to navigate and
access?
• Are you confident your data is
organized to meet the needs of
your business?
89. Technical Practice: Data Integration
• A data-centric
organization requires
unified data
• Integrating data across
organizational silos
creates new insights
• It is also the biggest
challenge
• Big Data techniques can
be used to complement
existing integration efforts
90. Data Integration Considerations
• The complexity of your data
integration challenge depends on
the questions you’re trying to
answer
• Integration requirements for Big
Data are dependent on the types of
questions you’re asking:
– Integration here may be more fuzzy than
discrete
– Integration is domain-based (based on
time, customer concept, geographic
distribution)
• Those requirements should evolve
from your strategy
91. Technical Practice: Data Quality
• Quality is driven by fit for purpose
considerations
• Big Data quality is different (BASE):
– Basic Availability
– Soft-state
– Eventual consistency
• Directional accuracy is the goal
• Focus on your most important data assets and ensure your solutions
address the root cause of any quality
issues – so that your data is correct
when it is first created
• Experience has shown that
organizations can never get in front of
their data quality issues if they only use
the ‘find-and-fix’ approach
92. Data Quality Considerations
• Big Data is trying to be
predictive
• What are the questions you
are trying to answer?
– What level of accuracy are you
looking for?
– What confidence levels?
– Example: Do I need to know
exactly what the customer is
going to buy, or do I just need to know the range of products he/she is going to choose from?
93. Technical Practice: Data Platforms
• Do you want to measure
critical operational process
performance?
• No one data platform can
answer all your questions. This
is commonly misunderstood
and often leads to very
expensive, bloated and
ineffective data platforms.
• Understand the questions that need to be asked and how to build the right data platform, or how to optimize an existing one
94. Data Platforms Considerations
• Commonalities between most big data
stacks with file storage, columnar store,
querying engine, etc.
• The big data stack generally looks the same until you get into appliances
– Algorithms are built into the appliances themselves (e.g., Netezza, Teradata)
• Ask these questions:
– Do you want insights on your
customer’s behavior?
– Do you need real-time customer
transactional information?
– Do you need historical data or just
access to the latest transactions?
– Where do you go to find the single
version of the truth about your
customers?