The document discusses the concept of "dark data", which refers to data that is collected by organizations but not analyzed or used. Some key points:
- Up to 90% of data loses value immediately or is never analyzed by organizations. Common examples of dark data include customer location data and sensor data.
- Organizations retain dark data for compliance purposes but storing it can be more expensive than the potential value. Only about 1% of organizational data is typically analyzed.
- Dark data poses risks like legal issues if it contains private information, but also opportunity costs if competitors analyze the data first. Methods to mitigate risks include ongoing data inventories, encryption, and retention policies.
- Many types of businesses could benefit from analyzing their dark data.
Data mining Course
Chapter 2: Data preparation and processing
Introduction
Domain Expert
Goal identification and Data Understanding
Data Cleaning
Missing values
Noisy Data
Inconsistent Data
Data Integration
Data Transformation
Data Reduction
Feature Selection
Sampling
Discretization
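A minimal sketch (in Python with pandas, not part of the course materials; the column names and values are invented) of three of the preparation steps listed above: imputing missing values, smoothing a noisy outlier, and discretizing a continuous attribute:

```python
import pandas as pd

# Toy dataset exhibiting the problems the chapter covers.
df = pd.DataFrame({
    "age":    [25, None, 47, 51, None, 33],
    "income": [30000, 42000, 58000, 61000, 39000, 1000000],  # last value is an outlier
})

# Missing values: impute age with the median of the observed values.
df["age"] = df["age"].fillna(df["age"].median())

# Noisy data: clip income to the 5th-95th percentile range.
lo, hi = df["income"].quantile([0.05, 0.95])
df["income"] = df["income"].clip(lo, hi)

# Discretization: bin age into three equal-width intervals.
df["age_bin"] = pd.cut(df["age"], bins=3, labels=["young", "middle", "senior"])
print(df)
```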
Paul Messina from Argonne presented this deck at the HPC User Forum in Santa Fe.
"The Exascale Computing Project (ECP) was established with the goals of maximizing the benefits of high-performance computing (HPC) for the United States and accelerating the development of a capable exascale computing ecosystem. Exascale refers to computing systems at least 50 times faster than the nation’s most powerful supercomputers in use today.The ECP is a collaborative effort of two U.S. Department of Energy organizations – the Office of Science (DOE-SC) and the National Nuclear Security Administration (NNSA)."
Watch the video: http://insidehpc.com/2017/04/update-exascale-computing-project-ecp/
Learn more: https://exascaleproject.org/
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Mobile devices, sensors, and GPSs are driving demand to handle big data in both batch and real time. This presentation discusses how we used complex event processing (CEP) and MapReduce-based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge. In 2013, the challenge included a data set generated by a real soccer match in which sensors were placed in the soccer ball and players’ shoes. This session will review how we used CEP to address the DEBS challenge and achieve throughput in excess of 100,000 events/sec. It will also examine how we extended the solution to conduct batch processing with business activity monitoring (BAM) using the same framework, enabling users to obtain both instant analytics as well as more detailed batch processing-based results.
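As an illustration of the kind of windowed aggregation a CEP engine performs on sensor streams like the soccer data set (a simplified generic sketch, not the DEBS solution described above; the event format and function name are invented):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms=1000):
    """Group (timestamp_ms, player_id) events into fixed tumbling windows
    and count ball touches per player within each window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, player in events:
        windows[ts // window_ms][player] += 1
    return windows

# Hypothetical sensor events: (timestamp in ms, player id).
events = [(100, "p1"), (250, "p2"), (900, "p1"), (1100, "p2")]
counts = tumbling_window_counts(events)
for window, per_player in sorted(counts.items()):
    print(window, dict(per_player))
```

A real CEP engine adds sliding windows, pattern matching, and out-of-order handling on top of this basic grouping.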
eBook: Guide to Data Center Cabling Infrastructure
In This Free 36-page eBook:
*10 Gb/s Data Center Solutions
*Best Practices for Data Center Infrastructure Design
*Comparing Copper and Fiber Options in the Data Center
*The Hidden Costs of 10 Gb/s UTP Systems
*Light it Up: Fiber Transmissions and Applications
*Cabling Infrastructure and Green Building Initiatives
About the Author:
Carrie Higbie has been involved in computing and networking for 25+ years in executive and consultant roles. She is Siemon’s Global Network Applications Manager supporting end-users and active electronics manufacturers. She publishes columns and speaks at industry events globally. Carrie is an expert on TechTarget’s SearchNetworking, SearchVoIP, and SearchDataCenters and authors columns for these and the SearchCIO and SearchMobile forums, and is on the board of advisors. She is on the BOD and a former President of the BladeSystems Alliance. She participates in IEEE, the Ethernet Alliance, and IDC Enterprise Expert Panels. She has one telecommunications patent and one pending.
Best practices to deliver data analytics to the business with Power BI (Satya Shyam K Jayanty)
Bring your data to life with Power BI visualization and insights!
With the changing landscape of Power BI features, it is essential to master configuration and deployment practices within your data platform so that you stay on par with compliance and security requirements. In this session we will move from the basics into advanced techniques across this landscape:
How to deploy Power BI?
How to implement configuration parameters and package BI features as part of an Office 365 rollout in your organisation?
What are the newest features and enhancements in the Power BI landscape?
How to manage on-premises vs. cloud connectivity?
How can you help and support the Power BI community as well?
Within the objectives of this session, cloud computing is another aspect of this technology that makes it possible to get data to the end user in a few clicks. We will review how to manage and connect on-premises data to cloud capabilities that take full advantage of data catalogue features while keeping data secure per Information Governance standards. Beyond the nuts and bolts, performance is another aspect every admin must keep up with, so we will also look at a few settings for maximizing performance and optimizing access to data as required. Gain understanding and insight into the range of tools available for your Business Intelligence needs. A showcase will demonstrate where to begin and how to proceed in the BI world.
- D BI A Consulting
consulting@dbia.uk
4 Steps to Quickly Improve PUE Through Airflow Management (Upsite Technologies)
It’s well known that cooling typically accounts for around half of a data center's total power consumption. Given this, it's imperative that cooling is optimized to achieve a low Power Usage Effectiveness (PUE). While this too may be common knowledge, the question still remains, how can this be done quickly, with all possible benefits realized, and with the fastest return on investment?
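For reference, PUE is simply total facility power divided by the power delivered to IT equipment, so cooling overhead shows up directly in the ratio; a trivial illustration (the numbers are made up):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power.
    1.0 is the theoretical ideal (zero overhead)."""
    return total_facility_kw / it_equipment_kw

# If cooling and other overhead roughly equal the IT load, PUE is about 2.0;
# cutting cooling power through airflow management lowers the ratio directly.
print(pue(total_facility_kw=2000, it_equipment_kw=1000))
print(pue(total_facility_kw=1500, it_equipment_kw=1000))
```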
There are many factors in the data center that are driving the new data center design considerations. This slideshare discusses several of the trends in the data center and covers several solutions to implement.
Download at http://DavidHubbard.net/powerpoint - This Introduction to Business Intelligence gives an overview of how Business Intelligence fits into business strategy in general. It does not go into specific technologies; it is meant to explain Business Intelligence to those not already familiar with it.
In a world of Global Value Chains, understanding MNEs – where they are, how they operate, where they pay taxes – has never been more important. However, surprisingly little official statistics are currently available on individual MNEs.
To fill this gap the OECD has begun to develop a new database – the Analytical Database on Individual Multinationals and Affiliates (ADIMA) – using a number of open big data sources that can provide new insights on individual MNEs and their global profiles.
The presentation discusses the different aspects of Power BI like Power BI for O365, Data Discovery, Data Analysis, Data Visualization & Power Maps, Natural Language Search etc.
It's a business analytics solution presented by Netwoven at the Microsoft Power BI workshop held on Oct 30th at SVC Microsoft, Mountain View.
Location Intelligence - The Where Factor (Thomas Lejars)
Taking the location context into account when analyzing business data reveals spatial relationships, trends, dependencies, and patterns that would be undetectable in traditional enterprise applications or BI. Location is a central factor in business. Almost all business data has a location: a customer’s address, a store’s location, competitor stores, a sales territory, a delivery route, an administrative boundary, and so on. Location awareness is highly important for performance management.
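A small illustration of putting location into analysis: computing the great-circle distance between two coordinates with the haversine formula, e.g. from a store to a customer (a generic sketch; the coordinates are examples only):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points,
    using a mean Earth radius of 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# Example: distance from Paris to London, roughly 340-350 km.
print(round(haversine_km(48.8566, 2.3522, 51.5074, -0.1278)))
```

Distances like this feed directly into spatial joins such as "customers within 10 km of a store".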
Dark Data Revelation and its Potential Benefits (PromptCloud)
This presentation covers benefits, use cases, practical examples, potential issues and the approach that needs to be taken when it comes to harnessing the power of dark data (a largely untapped strategic play in the big data realm).
Master Data in the Cloud: 5 Security Fundamentals (Sarah Fane)
Your master data is essential to the smooth operation of your business. But it is also valuable to others. Master data is vulnerable to both internal and external attacks. As the future of business and data is increasingly cloud-based, we explore five fundamentals to ensure the security of your data.
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track (Precisely)
With recent studies indicating that 80% of AI and machine learning projects fail due to data quality issues, it’s critical to think holistically about the problem. This is not a simple topic – data quality issues can occur anywhere from project start through model implementation and usage.
View this webinar on-demand, where we start with four foundational data steps to get our AI and ML projects grounded and underway, specifically:
• Framing the business problem
• Identifying the “right” data to collect and work with
• Establishing baselines of data quality through data profiling and business rules
• Assessing fitness for purpose for training and evaluating the subsequent models and algorithms
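The third step above, establishing a data-quality baseline through profiling, can be sketched as follows (illustrative only; the column names and sample data are invented):

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Baseline data-quality profile per column: type, completeness,
    and distinct-value count -- a starting point for business rules."""
    return pd.DataFrame({
        "dtype":        df.dtypes.astype(str),
        "complete_pct": (df.notna().mean() * 100).round(1),
        "distinct":     df.nunique(),
    })

# Hypothetical training data with an obvious quality problem.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email":       ["a@x.com", None, "c@x.com", None],  # 50% missing
})
print(profile(df))
```

A business rule such as "email must be at least 95% complete" can then be asserted against the profile before training begins.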
What should organizations be concerned about when using Machine Learning for predictive modeling techniques? Divergence Academy and Divergence.AI are leading efforts to bring Algorithmic Accountability awareness to the masses.
Hidden security and privacy consequences around mobility (Infosec 2013) (Huntsman Security)
An overview of the security and privacy implications and risks resulting from the wider adoption of mobile devices, apps, cloud and the resultant changes to customer interaction and business processes
Regulatory Control functions, such as Operational Risk, Compliance and Audit, increasingly raise questions around the scope, management, and identification of sensitive data within distributed and mainframe application environments.
Today's security and privacy professionals know that breaches are a fact of life. Yet their organizations are often not prepared to respond when the time comes. They're "overweight" on prevention and detection, but "underweight" on response.
Based on a decade-plus caseload of actual breach investigations across a range of different organizations, this webinar will examine an amalgamated, anonymized breach situation and review a play-by-play of how the response went: the good, the bad, and the ugly. Attendees will gain hard-earned, battle-tested insight on what to do, and what to avoid, when it's their turn to respond to an incident.
Our featured speakers for this timely webinar will be:
- Don Ulsch, CEO, ZeroPoint Risk. Distinguished Fellow at the Ponemon Institute.
- Joseph DeSalvo, Managing Director, ZeroPoint Risk. Former CSO at Mylan and Iron Mountain.
- Ted Julian, Chief Marketing Officer, Co3 Systems. Serial security and compliance entrepreneur.
Data privacy awareness is on the rise. Users become more and more concerned with how online service providers collect and protect their personal information. And so should you. Discover how to balance the risks and benefits of collecting data in the age of customer centricity.
DATA PROTECTION IMPACT ASSESSMENT TEMPLATE (ODPC).docx (SteveNgigi2)
The data protection impact assessment for a cloud-based project aims to provide financial inclusion for the unbanked population through its three modules, i.e., wallet, social banking and marketplace/business hub. The primary goal is to enable individuals without access to traditional banking services to engage in financial transactions.
The processing involves the collection, storage, and utilization of personal data for various purposes, such as creating digital wallets, facilitating social banking interactions, and delivering targeted marketing content. The platform will manage user information to enable secure and seamless financial transactions.
The targeted data subjects are individuals and entities within the unbanked population who lack access to traditional financial services. These individuals include low-income earners, marginalized communities and those residing in areas with limited banking infrastructure.
The primary class of data subjects includes the unbanked population seeking financial inclusion. Within this group, there may be subcategories, such as individuals with limited financial literacy or those residing in remote areas, and any vulnerable groups, such as elderly users or minors, who are part of the targeted data subjects.
To implement data-centric security, while simultaneously empowering your business to compete and win in today’s nano-second world, you need to understand your data flows and your business needs from your data. Begin by answering some important questions:
• What does your organization need from your data in order to extract the maximum business value and gain a competitive advantage?
• What opportunities might be leveraged by improving the security posture of the data?
• What risks exist based upon your current security posture? What would the impact of a data breach be on the organization? Be specific!
• Have you clearly defined which data (both structured and unstructured) residing across your extended enterprise is most important to your business? Where is it?
• What people, processes and technology are currently employed to protect your business-sensitive information?
• Who in your organization requires access to data and for what specific purposes?
• What time constraints exist upon the organization that might affect the technical infrastructure?
• What must you do to comply with the myriad government and industry regulations relevant to your business?
Finally, ask yourself what a successful data-centric protection program should look like in your organization. What’s most appropriate for your organization?
The answers to these and other related questions would provide you with a clearer picture of your enterprise’s “data attack surface,” which in turn will provide you with a well-documented risk profile. By answering these questions and thinking holistically about where your data is, how it’s being used and by whom, you’ll be well positioned to design and implement a robust, business-enabling data-centric protection plan that is tailored to the unique requirements of your organization.
Extract the Analyzed Information from Dark Data (ijtsrd)
The world is surrounded by data, which may be structured, unstructured, or semi-structured. Every organization generates enormous data daily, yet only the tip of it is analyzed while the larger part is excluded from useful analysis. This paper focuses on a particularly unstructured and bothersome class of data, termed dark data. Dark data is not carefully analyzed, indexed, or stored, so it becomes nearly invisible to potential users and is therefore more likely to remain unutilized and eventually lost. The paper discusses how long-term use of analyzed dark data can inform possible solutions for better curation of dark data. It describes why this class of data is critical to scientific progress, some of its properties, and the technical difficulties of managing it. Many potentially useful institutional and technical solutions are under development and are presented in the last section, but these solutions are mainly conceptual and require additional research given the lack of resources. Rahul P | Ganeshan M "Extract the Analyzed Information from Dark Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4, June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30842.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-processing/30842/extract-the-analyzed-information-from-dark-data/rahul-p
Cybersecurity has become an important issue for today's businesses. This presentation will review current scams and fraud, how to develop a plan to keep your business safe and secure, tips and resources.
Is Bad Data Killing Your Customer Engagement Strategy? (Marketo)
In this webinar, hear how Marketo and AmberLeaf are helping other marketing teams improve customer engagement by improving customer data. Listen in to learn:
- How your customers view your data challenges
- Who should own the data problem
- Common data pitfalls and quick wins for clean up
- What questions to ask of other internal groups
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
11. Turning Dark
• Useful data may become dark data once it becomes irrelevant because it was not processed fast enough. In "live flowing data", these are called "perishable insights".
• Examples: the geolocation of a customer, fraud detection signals
• According to IBM, about 60 percent of data loses its value immediately.
• IBM estimates that roughly 90 percent of data generated by sensors and analog-to-digital conversions never gets used.
• Not analysing data immediately and letting it go 'dark' can lead to significant losses.
• Data must not only be processed fast enough; the organization must also act on it quickly enough.
12. Turning DARK
• Organizations retain dark data for a multitude of reasons, and it is estimated that most companies analyse only 1% of their data.
• A lot of dark data is unstructured, meaning the information is in formats that are difficult to categorise, read by machine, and thus analyse.
• Often the reason businesses do not analyse their dark data is the amount of resources it would take and the difficulty of the analysis itself.
• Because storage is inexpensive, storing data is easy. However, storing and securing the data usually entails greater expense (or even risk) than the potential return.
13. Why is dark data handled the way it is?
• It is surprising, because at the time of collection companies assume the data is going to provide value. Companies invest heavily in data collection, both monetarily and otherwise, so the data should be considered important. Here are a few reasons why there is so much dark data:
14. Why is dark data handled the way it is?
1. Lopsided priorities: teams prioritize the data tied to their immediate goals and neglect the rest, such as data on how the customer arrived at the application page.
2. Disconnect among departments: data collected by one department may not be known to other departments ("this is the way we do it here").
3. Technology and tool constraints: if data collection is done by separate technologies and tools in the same organization, it may be difficult to integrate audio file contents from the call center with click data from websites.
15. Shed some light on the DARK
• Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).
• In an industrial context, dark data can include information gathered by sensors and telematics.
• Similar to dark matter in physics, dark data often comprises most of an organization's universe of information assets.
• Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.
16. Dark Data Example:
IP Location
• A manufacturer of soft drinks that runs a popular website might think that, of all the data it holds, only the data directly relevant to the marketing and sales of its soft-drink products has any value. While it also stores many other data points, such as the IP locations of its users, it fails to see how these "dark" data can also have value to the company.
• Yet if the data, properly cleansed to a high quality and then analysed, reveal that 7% of the website's users are accessing the service from outside the country where the company is located, even though the product is only sold directly to retailers within that country, these are in themselves valuable data, for instance to those who target ads at users of soft drinks.
• These dark data could also be seen as an opportunity to think about marketing the product elsewhere. For instance, if 40% of the users from outside the home country access the site from India, according to the IP location data, while only 4% come from the European Union, that would strongly suggest a marketing campaign within Europe has considerably less chance of success than one aimed at the Indian subcontinent.
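The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration, assuming the IP-to-country lookup has already been done (a real pipeline would use a GeoIP database); the user list below is made up.

```python
# Minimal sketch: estimate the share of website users per foreign country
# from IP-derived country codes. The geolocation step is assumed done.
from collections import Counter

def country_shares(user_countries, home_country):
    """Fraction of all users per country, excluding the home country."""
    if not user_countries:
        return {}
    total = len(user_countries)
    foreign = [c for c in user_countries if c != home_country]
    return {c: n / total for c, n in Counter(foreign).items()}

# Illustrative data: 100 users, 93 domestic, 4 from India, 3 from the EU
users = ["US"] * 93 + ["IN"] * 4 + ["EU"] * 3
shares = country_shares(users, home_country="US")
print(shares)  # {'IN': 0.04, 'EU': 0.03}
```

The same aggregation scales to millions of log-in records; only the lookup step changes.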
17. Other Dark Data Examples:
type of device
• Other typical examples of dark data, which most websites store but fail to extract value from, include the type of device the Internet is accessed from (typically a smartphone, tablet or computer); the web browser being used (e.g. Chrome, Mozilla, Opera, Edge or IE, among others); and even more obscure or dark information such as the number of times users reset their password, which would be useful to a company specializing in Internet and password security.
18. Other Dark Data Examples:
Customer Feedback
• A well-known example of dark data going to waste is where a company has a feedback form that lets users comment on its website or service, but lacks the data structures that would allow that feedback to be easily analysed. The result is a failure to take on board and act on users' judgments and criticisms, whether positive or negative (both of which have value).
22. More dark data examples
• Customer Information
• Log Files
• Account Information
• Previous Employee Data
• Financial Statements
• Raw Survey Data
• Email Correspondences
• Notes or Presentations
• Old Versions of Relevant Documents
23. The dark bites
• "Maybe we can use all this data later?" This thinking explains why many organizations are reluctant to part with dark data, even when they have no plans to put it to work on their behalf, either in the near term or further down the planning horizon.
• But the dark can bite: organizations must also be aware that the dark data they possess (or, perhaps more chillingly, the dark data about them, their customers and their operations that is stored in the cloud, outside their immediate control and management) can pose risks to their continued business health and well-being.
24. Problems from the dark
• Data stored but not used costs money (the NYT reports that 90% of the energy used by data centers is wasted).
• According to Datamation, data that is stored but unused could add up to $891 billion in costs by 2020.
• The more data is stored but not used, the higher the risk, especially to privacy.
25. The risks
1. Legal and regulatory risk. If data covered by mandate or regulation, such as confidential financial information (credit card or other account data) or patient records, appears anywhere in dark data collections, its exposure could involve legal and financial liability.
2. Intelligence risk. If dark data encompasses proprietary or sensitive
information reflective of business operations, practices, competitive
advantages, important partnerships and joint ventures, and so
forth, inadvertent disclosure could adversely affect the bottom line
or compromise important business activities and relationships.
26. The risks
3. Reputation risk. Any kind of data breach reflects badly on the
organizations affected thereby. This applies as much to dark data
(especially in light of other risks) as to other kinds of breaches.
4. Opportunity costs. Given that, by definition, the organization has decided not to invest in analysing and mining its dark data, concerted efforts by third parties to exploit that data's value represent potential losses of intelligence and value based upon its contents.
5. Open-ended exposure. By definition, dark data contains information
that's either too difficult or costly to extract to be mined, or that contains
unknown (and therefore unevaluated) sources of intelligence and
exposure to loss or harm. Dark data's secrets may be very dark and
damaging indeed, but one has no way of knowing for sure.
27. Mitigating Risks Posed by Dark Data
1. Know where the dark data is: ongoing inventory and assessment.
2. Turn dark to light: drive ongoing research into new tools and technologies.
3. Understand where dark data resides, how it is stored, how it is protected, and what kinds of access controls help maintain its security.
4. No man's land: ubiquitous encryption. No dark data should be readily accessible to casual inspection, under any circumstances.
5. Don't stay in the dark too long: retention policies and safe disposal.
6. Audit dark data for security purposes.
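Steps 1 and 5 above (inventory, and retention with safe disposal) can be sketched as a walk over a data directory that flags files older than a retention window. This is a minimal stdlib sketch; the 365-day window is an illustrative assumption, not a recommended policy.

```python
# Minimal sketch of an inventory-and-retention pass: classify every file
# under a data root as "keep" or "expired" by modification time.
# The retention window is an illustrative assumption.
import os
import time

def retention_report(root, max_age_days=365, now=None):
    """Return (keep, expired) lists of file paths under `root`."""
    now = now if now is not None else time.time()
    cutoff = now - max_age_days * 86400
    keep, expired = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            (keep if os.path.getmtime(path) >= cutoff else expired).append(path)
    return keep, expired
```

A real deployment would add ownership, sensitivity classification, and an audit log before any disposal step.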
28. What are some other major areas in which dark
data is being underutilized, besides customer information?
• Education and healthcare.
• The potential to serve students and patients the way consumer and financial services pursue their target populations is huge.
• So much paperwork is involved in both education and healthcare, so the data is there; and in the age of electronic-health-record government incentives, much of it in the healthcare space is now digital.
• However, it needs to be mined and analyzed in order to lead to opportunities that effect the change which usually results from the strategic use of personal and behavioral data.
29. What kinds of businesses can really benefit
from dark data extraction and processing?
• Any business that sells a product, service or idea (anyone who has customers) can benefit. Overlooked data points include:
• How many times a user resets their password
• The IP address when a user logs into your website/app
• The date of the last email communication to your customers
• Mobile handset type, or web browser version
• Free-text feedback on a hotel stay or recent flight
• Additional passenger or guest names on a ticket or hotel room
• These data points or features are often overlooked by marketing teams as serving any useful purpose, because of a perception that this type of information is only collected for compliance, fraud or regulatory requirements.
30. How old is too old when it comes to dark
data?
• Nothing is ever too old unless it is too old
• That said, if you’re analyzing, say, customer sentiment in social media,
you simply won’t have relevant data that predates the advent of
social channels. So in that case, dark data from before those channels
existed could be considered “too old.”
31. How can you turn dark data into active,
revenue-generating data?
• This is where data science, marketing, and business intelligence need to put their heads together to find new ways of activating dark data to provide new opportunities for the organization. While dark data can appear dull and uninteresting on the surface, there are methods to turn it into highly granular, rich customer insights.
• Here are a few key steps to get you started on the above examples:
• Log-ins to your website or mobile application: what city/country are the IP addresses from? Are you logging each location a user visits from and creating a virtual map of their travels? This is particularly compelling when creating a 360-degree view of your customer.
• Additional passenger/guest names on a reservation: not only does this give insight into the homophily of the user and fuel your social network graph of which users are centrally connected and influential, it also provides rich insight into their family and workplace. Link this data with social graphing, and you'll quickly obtain age, gender, and behavioral traits.
32. How can you turn dark data into active,
revenue-generating data?
• Mobile phone data: this simple piece of data will illuminate an array of new product and marketing opportunities, and provide an additional segmentation layer to improve marketing effectiveness. From mobile phone data it is possible to know which telco partners you should bring on board (which will activate even more opportunities), where your users are in the world, in real time whether they have recently purchased tickets with another airline, and more.
• Free-text input, such as feedback, can be passed through cognitive text-analysis tools to determine whether the general sentiment of the feedback is positive or negative. Linking the user profile to your internal database can also determine whether this user is sending mixed messages on social media compared with surveys and feedback forms. Think: airlines.
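The sentiment step above can be illustrated with a deliberately tiny lexicon-based classifier. Real deployments would use a trained model or a cognitive text-analysis service; the word lists here are assumptions for illustration only.

```python
# Toy lexicon-based sentiment sketch for free-text feedback.
# The word lists are illustrative assumptions, not a real lexicon.
POSITIVE = {"great", "friendly", "comfortable", "smooth", "helpful"}
NEGATIVE = {"delayed", "rude", "dirty", "lost", "cramped"}

def sentiment(text):
    """Classify feedback as positive, negative, or neutral."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Friendly crew and a smooth flight"))  # positive
print(sentiment("Bag was lost and staff were rude"))   # negative
```

Even this crude signal, joined to the user profile, is enough to surface the "mixed messages" comparison the slide describes.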
33. Four Ways to Use Dark Data
1. Networking machine data. As noted above, servers, firewalls, network
monitoring tools and other parts of your environment generate large
amounts of machine data related to network operations. Avoid dark
networking data by using this information to analyze network security, as
well as to monitor network activity patterns to ensure that your network
infrastructure is never under- or over-utilized.
2. Customer support logs. Most businesses maintain records of customer-support interactions that include information such as when a customer
contacted the business, which type of communication channel was used,
how long the engagement lasted and so on. Don’t make the mistake of
leaving this data in the dark, or using it only when you need to research a
customer issue. Instead, build it into your analytics workflows by
leveraging it to help understand when your customers are most likely to
contact you, what their preferred methods of contact are and so on.
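Point 2 above can be sketched as a simple aggregation over support-log records. The record fields (`hour`, `channel`) and the sample data are illustrative assumptions.

```python
# Sketch: mine support-log records for the peak contact hour and the
# preferred channel. Field names and sample data are assumptions.
from collections import Counter

records = [
    {"hour": 9, "channel": "phone"},
    {"hour": 9, "channel": "chat"},
    {"hour": 14, "channel": "chat"},
    {"hour": 9, "channel": "email"},
    {"hour": 14, "channel": "chat"},
]

peak_hour = Counter(r["hour"] for r in records).most_common(1)[0][0]
top_channel = Counter(r["channel"] for r in records).most_common(1)[0][0]
print(peak_hour, top_channel)  # 9 chat
```

In practice the same two counters would run over months of exported ticket data rather than an inline list.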
34. Four Ways to Use Dark Data
3. “Legacy” system logs. If you have mainframes or other older types of systems running in your environment, you may think there is no way to use modern analytics tools to understand them. But you can. By offloading system logs and other data from these systems into an analytics platform like Hadoop, you can make sure you are not leaving this “legacy” data in the dark.
4. Non-textual data. Most data analytics workflows are built around textual data, which is easier to ingest. You can also make use of video, audio or other non-textual files, however: you can analyze the metadata associated with them or, if appropriate, translate speech to text in order to gain more insight into the content of the data itself. The effort required may not be worth it in all cases, but the bigger point worth keeping in mind is that your non-textual data doesn't have to be dark data. There are ways to make it actionable if you need it to be.
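As a minimal sketch of point 4, file metadata can be harvested without opening the media itself. The fields chosen here are assumptions; a real archive scan would also pull duration, codec, and similar attributes from a media library.

```python
# Sketch: harvest basic metadata from a non-textual file without
# decoding its contents. Field selection is an illustrative assumption.
import datetime
import os

def describe(path):
    """Return a small metadata record for one file."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "ext": os.path.splitext(path)[1],
        "bytes": st.st_size,
        "modified": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
    }
```

Running `describe` over a media archive yields a structured table that ordinary analytics tools can consume, even though the audio or video itself stays unprocessed.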
35. LET THERE BE LIGHT: Dark Data Analytics
• Dark analytics efforts typically focus on three dimensions:
1. Untapped data already in your possession
2. Nontraditional unstructured data
3. Data in the deep web
• To be clear, the purpose of dark analytics is not to catalog vast volumes of unstructured data. Casting a broader data net without a specific purpose in mind will likely lead to failure. Indeed, dark analytics efforts that are surgically precise in both intent and scope often deliver the greatest value. Like every analytics journey, successful efforts begin with a series of specific questions: What problem are you solving? What would we do differently if we could solve that problem? And finally, what data sources and analytics capabilities will help us answer the first two questions?
36. DeepDive
• http://deepdive.stanford.edu/quickstart
• DeepDive is a system to extract value from dark data. Like dark matter, dark data
is the great mass of data buried in text, tables, figures, and images, which lacks
structure and so is essentially unprocessable by existing software.
• DeepDive helps bring dark data to light by creating structured data (SQL tables)
from unstructured information (text documents) and integrating such data with
an existing structured database.
• DeepDive is used to extract sophisticated relationships between entities and
make inferences about facts involving those entities.
• DeepDive helps one process a wide variety of dark data and put the results into a database. With the data in a database, one can use a variety of standard tools that consume structured data, e.g. visualization tools like Tableau or analytics tools like Excel.
• http://deepdive.stanford.edu/showcase/apps
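As a toy illustration of the idea (this is not DeepDive's actual API), a regular expression plus SQLite can turn a few free-text sentences into a queryable table. The sentence pattern and schema below are assumptions made for the example.

```python
# Toy illustration of unstructured-text -> SQL-table extraction,
# in the spirit of DeepDive but NOT using DeepDive itself.
import re
import sqlite3

text = "Alice joined Acme in 2015. Bob joined Initech in 2019."
pattern = re.compile(r"(\w+) joined (\w+) in (\d{4})")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employment (person TEXT, company TEXT, year INT)")
conn.executemany("INSERT INTO employment VALUES (?, ?, ?)", pattern.findall(text))

rows = conn.execute("SELECT person, company, year FROM employment").fetchall()
print(rows)  # [('Alice', 'Acme', 2015), ('Bob', 'Initech', 2019)]
```

DeepDive replaces the single regex with learned extractors and probabilistic inference, but the end state is the same: dark text becomes rows that SQL tools can query.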
37. Lessons from the front lines
• IU HEALTH’S RX FOR MINING DARK DATA
• Retailers make it personal
• Oil Company
38. IU HEALTH’S RX FOR MINING DARK DATA
• As part of a new model of care, Indiana
University Health (IU Health) is exploring
ways to use nontraditional and unstructured
data to personalize health care for
individual patients and improve overall
health outcomes for the broader
population.
• Traditional relationships between medical
care providers and patients are often
transactional in nature, focusing on
individual visits and specific outcomes
rather than providing holistic care services
on an ongoing basis. IU Health has
determined that incorporating insights from
additional data will help build patient
loyalty and provide more useful, seamless,
and cost-efficient care.
39. IU HEALTH’S RX FOR MINING DARK DATA
• “IU Health needs a 360-degree understanding of the patients it serves in order to create the kind of care and services that will keep them in the system.”
• For example, consider the voluminous free-form notes—
both written and verbal—that physicians generate
during patient consultations.
• Deploying voice recognition, deep learning, and text
analysis capabilities to these in-hand but previously
underutilized sources could potentially add more depth
and detail to patient medical records.
• These same capabilities might also be used to analyze
audio recordings of patient conversations with IU Health
call centers to further enhance a patient’s records. Such
insights could help IU Health develop a more thorough
understanding of the patient’s needs, and better
illuminate how those patients utilize the health system’s
services.
40. IU HEALTH’S RX FOR MINING DARK DATA
• Another opportunity involves using dark data to help predict need and manage care
across populations. IU Health is examining how cognitive computing, external data, and
patient data could help identify patterns of illness, health care access, and historical
outcomes in local populations. The approaches could make it possible to incorporate
socioeconomic factors that may affect patients’ engagement with health care providers.
• “There may be a correlation between high density per living unit and disengagement
from health,” says Mark Lantzy, senior vice president and chief information officer, IU
Health. “It is promising that we can augment patient data with external data to
determine how to better engage with people about their health. We are creating the
underlying platform to uncover those correlations and are trying to create something more systemic.”
• “The destination for our journey is an improved patient experience,” he continues. “Ultimately, we want it to drive better satisfaction and engagement. More than deliver great health care to individual patients, we want to improve population health throughout Indiana as well. To be able to impact that in some way, even incrementally, would be hugely beneficial.”
41. Retailers make it personal
• Retailers almost universally recognize that digital has reshaped customer
behavior and shopping. In fact, $0.56 of every dollar spent in a store is
influenced by a digital interaction.
• Yet many retailers—particularly those with brick-and-mortar operations—
still struggle to deliver the digital experiences customers expect. Some
focus excessively on their competitors instead of their customers and rely
on the same old key performance indicators and data.
• In recent years, however, growing numbers of retailers have begun
exploring different approaches to developing digital experiences. Some are
analyzing previously dark data culled from customers’ digital lives and
using the resulting insights to develop merchandising, marketing, customer
service, and even product development strategies that offer shoppers a
targeted and individualized customer experience.
42. Retailers make it personal
• Stitch Fix, for example, is an online subscription shopping service that
uses images from social media and other sources to track emerging
fashion trends and evolving customer preferences.
• Its process begins with clients answering a detailed questionnaire about their tastes in clothing. Then, with client permission, the company’s team of 60 data scientists augments that information by scanning images on customers’ Pinterest boards and other social media sites, analyzing them, and using the resulting insights to develop a deeper understanding of each customer’s sense of style.
• Company stylists and artificial intelligence algorithms use these
profiles to select style-appropriate items of clothing to be shipped to
individual customers at regular intervals.
43. Retailers make it personal
• Meanwhile, grocery supermarket chain Kroger Co. is taking a different
approach that leverages Internet of Things and advanced analytics
techniques. As part of a pilot program, the company is embedding a
network of sensors and analytics into store shelves that can interact
with the Kroger app and a digital shopping list on a customer’s phone.
• As the customer strolls down each aisle, the system—which contains
a digital history of the customer’s purchases and product
preferences—can spotlight specially priced products the customer
may want on 4-inch displays mounted in the aisles. This pilot, which
began in late 2016 with initial testing in 14 stores, is expected to
expand in 2017.
44. GREG POWERS, VICE PRESIDENT OF TECHNOLOGY,
HALLIBURTON
• Yet the sheer volume of information that we can and do collect goes way
beyond human cognitive bandwidth. Advances in sensor science are
delivering enormous troves of both dark data and what I think of as really
dark data.
• For example, we scan rocks electromagnetically to determine their
consistency. We use nuclear magnetic resonance to perform what amounts
to an MRI on oil wells. Neutron and gamma-ray analysis measures the
electrical permittivity and conductivity of rock. Downhole spectroscopy
measures fluids. Acoustic sensors collect 1–2 terabytes of data daily.
• All of this dark data helps us better understand in-well performance. In
fact, there’s so much potential value buried in this darkness that I flip the
frame and refer to it as “bright data” that we have yet to tap.
45. GREG POWERS, VICE PRESIDENT OF TECHNOLOGY,
HALLIBURTON
• In the next phase of Halliburton’s ongoing analytics program, we want to develop
the capacity to capture, mine, and use bright data insights to become more
predictive.
• Given the nature of our operations, this will be no small task. Identical events
driven by common circumstances are rare in the oil and gas industry. We have 30
years of retrospective data, but there are an infinite number of combinations of
rock, gas, oil, and other variables that affect outcomes.
• Unfortunately, there is no overarching constitutive physics equation that can describe the right action to take for any situation encountered. Yet even if we can’t explain what we’ve seen historically, we can explore what has happened and let our refined appreciation of historic data serve as a road map to where we can go.
• In other words, we plan to correlate data to things that statistically seem to
matter and, then, use this data to develop a confidence threshold to inform how
we should approach these issues.
46. GREG POWERS, VICE PRESIDENT OF TECHNOLOGY,
HALLIBURTON
• We believe that nontraditional data holds the key to creating advanced intelligent
response capabilities to solve problems, potentially without human intervention, before
they happen.
• At the lowest level, we’ll take measurements and tell someone after the fact that
something happened. At the next level, our goal will be to recognize that something has
happened and, then, understand why it happened. The following step will use real-time
monitoring to provide in-the-moment awareness of what is taking place and why. In the
next tier, predictive tools will help us discern what’s likely to happen next. The most
extreme offering will involve automating the response—removing human intervention
from the equation entirely.
• Drilling is complicated work. To make it more autonomous and efficient, and to free
humans from mundane decision making, we need to work smarter. Our industry is facing
a looming generational change. Experienced employees will soon retire and take with
them decades of hard-won expertise and knowledge. We can’t just tell our new hires,
“Hey, go read 300 terabytes of dark data to get up to speed.” We’re going to have to rely
on new approaches for developing, managing, and sharing data-driven wisdom.
47. Where do you start?
Ask the right questions:
• Rather than attempting to discover and inventory all of the dark data
hidden within and outside your organization, work with business teams to
identify specific questions they want answered. Work to identify potential
dark analytics sources and the untapped opportunities contained therein.
• Then focus your analytics efforts on those data streams and sources that
are particularly relevant.
• For example, if marketing wants to boost sales of sports equipment in a
certain region, analytics teams can focus their efforts on real-time sales
transaction streams, inventory, and product pricing data at select stores
within the target region. They could then supplement this data with
historic unstructured data—in-store video analysis of customer foot traffic,
social sentiment, influencer behavior, or even pictures of displays or
product placement across sites—to generate more nuanced insights.
48. Look outside your organization:
• You can augment your own data with publicly available demographic,
location, and statistical information. Not only can this help your
analytics teams generate more expansive, detailed reports—it can put
insights in a more useful context.
• For example, a physician makes recommendations to an asthma
patient based on her known health history and a current examination.
By reviewing local weather data, he can also provide short-term
solutions to help her through a flare-up during pollen season. In
another example, employers might analyze data from geospatial
tools, traffic patterns, and employee turnover to determine the
extent to which employee job satisfaction levels are being adversely
impacted by commute times.
49. Augment data talent:
• Data scientists are an increasingly valuable resource, especially those who
can artfully combine deep modeling and statistical techniques with
industry or function-specific insights and creative problem framing. Going
forward, those with demonstrable expertise in a few areas will likely be in
demand.
• For example, both machine learning and deep learning require programmatic expertise: the ability to build on established patterns to determine the appropriate combination of data corpus and method to uncover reasonable, defensible insights. Likewise, visual and graphic design skills may be increasingly critical, given that visually communicating results and explaining rationales are essential for broad organizational adoption.
• Finally, traditional skills such as master data management and data
architecture will be as valuable as ever—particularly as more companies
begin laying the foundations they’ll need to meet the diverse, expansive,
and exploding data needs of tomorrow.
50. Explore advanced visualization tools:
• Not everyone in your organization will be able to digest a printout of
advanced Bayesian statistics and apply them to business practices.
• Most people need to understand the “so what” and the “why” of complex
analytical insights before they can turn insight into action. In many
situations, information can be more easily digested when presented as an
infographic, a dashboard, or another type of visual representation.
• Visual and design software packages can do more than generate eye-catching graphics such as bubble charts, word clouds, and heat maps; they can boost business intelligence by repackaging big data into smaller, more meaningful chunks, delivering value to users much faster. Additionally, the
insights (and the tools) can be made accessible across the enterprise,
beyond the IT department, and to business users at all levels, to create
more agile, cross-functional teams.
51. View it as a business-driven effort:
• It’s time to recognize analytics as an overall business strategy rather than
as an IT function. To that end, work with C-suite colleagues to garner
support for your dark analytics approach.
• Many CEOs are making data a cornerstone of overall business strategy,
which mandates more sophisticated techniques and accountability for
more deliberate handling of the underlying assets.
• By understanding your organization’s agenda and goals, you can determine
the value that must be delivered, define the questions that should be
asked, and decide how to harness available data to generate answers.
• Data analytics then becomes an insight-driven advantage in the
marketplace. The best way to help ensure buy-in is to first pilot a project
that will demonstrate the tangible ROI that can be realized by the
organization with a businesswide analytics strategy.
52. Think broadly:
• As you develop new capabilities and strategies, think about how you
can extend them across the organization as well as to customers,
vendors, and business partners. Your new data strategy becomes part
of your reference architecture that others can use.