This white paper proposes the MphasiS AGIL Analytics Life Cycle Business Style (MAALBS) as a framework for delivering big data services. MAALBS follows a phased approach spanning requirements gathering, design, development, testing, and deployment. It aims to help clients extract value from large, diverse datasets in a cost-effective manner while ensuring quality. The framework addresses the challenges posed by big data's high volume, velocity, and variety by standardizing processes around verification, validation, and the delivery of business value. MphasiS believes MAALBS can help organizations accelerate innovation, lower costs, and ensure high quality in big data projects.
MphasiS AGIL Analytics Life Cycle Business Style (MAALBS) for Big Data Services

A White Paper
by MS Balaje Viswanaathan
Big Data and BIDW Practitioner, Analytics
MphasiS
balajeviswanaathan.m@mphasis.com
Contents
Executive Summary
What is “Big Data”?
Business and Process Drivers for Big Data
MAALBS for BAAS: A Road Map for Big Data Services
MAALBS Style for Big Data Services
Manifesto of MAALBS
Principles of MAALBS
MphasiS Mind Maps on Big Data Projects
Phases of MAALBS
High Level Architecture of MAALBS Big Data Process
MAALBS for Operational Challenges
MAALBS LEAN Adoption
MphasiS MAALBS Big Data Team
Conclusion
Executive Summary
Today, the volume and complexity of market data required by diverse industries such as BFSI, Retail, Healthcare, Communications and Media, and Energy and Utilities have become immense and are growing at a rapid pace. Ongoing market changes have accelerated the demand for larger volumes of data, forcing industries to confront the so-called “Big Data” challenge. This demand is fueled as firms develop and deploy new, sophisticated strategies. At the same time, regulatory changes are forcing firms to source and report increasingly larger volumes of trade data, as well as to adopt higher quality – and usually data-hungry – risk and pricing models.
Social network data is adding to this superabundance. The micro-blogging site Twitter serves more than 200 million users who produce more than 90 million “tweets” per day – over 1,000 every second. Each of these posts is approximately 200 bytes in size. On average, this traffic amounts to well over 12 gigabytes a day and, across the entire Twitter ecosystem, the company produces a total of 8 terabytes of data per day.
Facebook has announced that it surpassed the 750 million active-user mark, making the social networking site the largest consumer-driven data source in the world. Facebook users spend more than 700 billion minutes per month on the service, and the average user creates 90 pieces of content every 30 days. Each month, the community creates more than 30 billion pieces of content, ranging from web links, news stories, blog posts and notes to videos and photos.
Everywhere you look, the quantity of information in the world is soaring. The term “Big Data” has emerged to describe this monstrous growth in data. “Big Data” represents data sets characterized by high volume, high velocity and a wide variety of data structures.
What is “Big Data”
“Big Data technologies describe a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis.”
“Extremely scalable analytics – analyzing petabytes of structured and unstructured data at high velocity.”
“Big Data is data that exceeds the processing capacity of conventional database systems.”
“Big Data is a technology that helps extract value from the digital universe.”
Technology vendors in the legacy database and data warehouse fields say “Big Data” simply refers to a traditional data warehousing scenario involving data volumes in the single- or multi-terabyte range. Others disagree, arguing that “Big Data” is not limited to traditional data warehouse situations, but includes real-time or operational data stores used as the primary data foundation for online applications that power key external or internal business systems. These transactional/real-time databases used to be “pruned” so they remained manageable from a data-volume standpoint: the most recent or “hot” data stayed in the database, while older information was archived to a data warehouse via extract-transform-load (ETL) routines.
5. A White Paper on MphasiS AGIL Analytics Life Cycle Business Style (MAALBS) for Big Data Services MphasiS 5
Business and Process Drivers
for Big Data
Business Drivers
Volume
Potential of terabytes to petabytes of data. Data volume is the primary attribute of “Big Data”, and volume is often quantified in terabytes: anything between 3 and 10 terabytes of data already falls within the realm of “Big Data”. Data volume can also be quantified by counting records, transactions, tables and files, and a large number of any of these can be categorized as “Big Data”. Though volume is one of the defining characteristics of “Big Data”, data velocity and data variety (highlighted below) constitute the other key characteristics.
Variety
All types of data are now being captured: structured, semi-structured and unstructured data, streaming data, video, audio, radio frequency identification (RFID) readings, sensor data, etc.
A significant factor that makes “Big Data” considerably immense is that
it is coming from a greater variety of sources than ever before. Data from
web sources (i.e., web logs, clickstreams) and social media is remarkably
diverse. RFID data from supply chain applications, text data from call center
applications, semi-structured data from various business-to-business
processes, and geospatial data in logistics make up an eclectic mix of data
types. Variety and diversity have therefore become an important attribute
characterizing “Big Data”.
Velocity
How fast does the data come in? Speed, or velocity, of data is another defining characteristic of “Big Data”. Data velocity encompasses both the frequency of data generation and the frequency of data delivery. In today’s hyper-connected and networked society, there is a continuous stream of information coming from devices ranging from sensors and robotic manufacturing machines to video cameras and mobile gadgets. This ever-increasing amount of data relentlessly streaming from devices in real time is causing data volumes to grow, and to grow in a hurry.
[Figure: Big Data drivers – Business Drivers: Volume, Variety, Velocity; Process Drivers: Verification, Validation, Value]
Process Drivers
Verification
Verification is the process by which data is checked for inaccurate or inconsistent information after migration. It helps to determine whether (i) data is accurately translated when it is transported from one source to another, (ii) data is complete, and (iii) data supports processes in the new system. During data verification, there may be a need to run both systems in parallel in order to identify areas of disparity and forestall erroneous data loss.
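To make this concrete, here is a minimal verification sketch in Python, comparing a source and a target data set after migration for completeness and accuracy. The record layout, key field and fingerprint scheme are illustrative assumptions, not part of the MAALBS process itself.

```python
# A minimal post-migration verification sketch (illustrative only): it
# checks completeness (no rows lost or invented) and accuracy (field
# values survive the transport unchanged).
import hashlib

def row_fingerprint(row: dict) -> str:
    """Hash a row's values in a stable field order so two copies of the
    same record always produce the same fingerprint."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_migration(source: list, target: list, key: str) -> dict:
    src = {r[key]: row_fingerprint(r) for r in source}
    tgt = {r[key]: row_fingerprint(r) for r in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),    # lost rows
        "unexpected_in_target": sorted(tgt.keys() - src.keys()), # invented rows
        "mismatched": sorted(k for k in src.keys() & tgt.keys()
                             if src[k] != tgt[k]),               # altered rows
    }

# Parallel-run comparison on hypothetical records: the target has silently
# reformatted one amount, which shows up as a mismatch.
source = [{"id": 1, "amount": "100.00"}, {"id": 2, "amount": "250.50"}]
target = [{"id": 1, "amount": "100.00"}, {"id": 2, "amount": "250.5"}]
print(verify_migration(source, target, key="id"))
# {'missing_in_target': [], 'unexpected_in_target': [], 'mismatched': [2]}
```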
Validation
Validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called “validation rules” or “check routines”, that check the correctness, meaningfulness and security of data input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit validation logic in the application program.
For business applications, data validation can be defined through declarative data integrity rules or procedure-based business rules. Data that does not conform to these rules will negatively affect business process execution. Therefore, data validation should start with the business-process definition and the set of business rules within this process. Rules can be collected through the requirements-capture exercise. The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers should include only the digits and possibly the characters +, – and () (plus, minus, and brackets). A more sophisticated data validation routine would check whether the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified. Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data is valid, sensible, reasonable and secure before it is processed.
A validation process involves two distinct steps: (a) a validation check and (b) a post-check action. The check step uses one or more computational rules to determine whether the data is valid; the post-validation action sends feedback to help enforce validation.
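As a hedged illustration of both steps, the following Python sketch applies the telephone-number rules described above: a character-set check followed by a country-code digit convention, with a post-check action that sends feedback. The entries in DIGITS_PER_COUNTRY are hypothetical conventions, not a real numbering plan.

```python
# A minimal sketch of a validation rule plus a post-check action.
import re

ALLOWED = re.compile(r"^[0-9+\-() ]+$")                 # valid character set
DIGITS_PER_COUNTRY = {"+1": 10, "+44": 10, "+91": 10}   # hypothetical conventions

def validate_phone(raw: str):
    """Validation check: character set first, then country-code convention."""
    if not ALLOWED.match(raw):
        return False, "contains characters outside the valid set"
    digits = re.sub(r"\D", "", raw)                     # keep digits only
    for code, expected in DIGITS_PER_COUNTRY.items():
        if raw.startswith(code):
            national = digits[len(code) - 1:]           # drop country-code digits
            if len(national) != expected:
                return False, f"{code} numbers need {expected} digits"
            return True, "ok"
    return False, "unknown or missing country code"

def post_check_action(field: str, ok: bool, reason: str) -> None:
    """Post-check action: feedback that helps enforce the rule."""
    if not ok:
        print(f"Rejected {field!r}: {reason}")

ok, reason = validate_phone("+44 (20) 7946-0958")   # passes both checks
post_check_action("phone", ok, reason)
ok, reason = validate_phone("+44 20 7946 095")      # one digit short
post_check_action("phone", ok, reason)              # Rejected 'phone': +44 numbers need 10 digits
```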
Value
With all the volume, variety and velocity existing in the business, processing Big Data helps derive value and insight from it, so that it can be tied to a business plan that drives business outcomes, ROI and profitability.
MAALBS for BAAS – A Road Map
for Big Data Services
MAALBS for Big Data Services
Every IT organization wants to accelerate innovation, lower costs,
and ensure the high quality of its services. Yet, each of these goals
presents challenges.
Companies need to discover and evaluate the implications that business innovations may have on their system landscapes, and IT must work to minimize any system downtime these innovations may entail. Companies also have to ensure ongoing quality in terms of functionality, performance, availability and security, as the business depends on all of these parameters.
The system development and support process is complicated and complex, so maximum flexibility with appropriate control is required. Evolution favors those who operate with maximum exposure to environmental change and are optimized for flexible adaptation to it; evolution deselects those who have insulated themselves from environmental change and have minimized chaos and complexity in their environment.
The term “Big Data” has become a buzzword in both the business and the technology world. There are numerous conferences, seminars, webinars and forums on the topics of Big Data and Cloud Computing, and the term seems overused today. There is still some ambiguity about what constitutes Big Data: is it just sheer volume, a mix of volume, variety and velocity regardless of the size of the data, or the voluminous unstructured data coming from social media and machine logs?
The scope of the Big Data drivers has now expanded from three dimensions to six: volume, velocity and variety, plus verification, validation and value. The first three Vs are features of Big Data, while the last three fall under process and business outcomes.
The required approach should enable development teams to operate adaptively within a complex environment using imprecise processes, since complex system development occurs under rapidly changing circumstances. To overcome these challenges, the Analytics team at MphasiS, an HP company, has come up with a robust methodology for its Big Data roadmap, called the MAALBS process, which tailors a combination of AGILE SCRUM, the ITIL framework and Lean to efficiently manage and support the entire application life cycle, from Discover, Design, Develop and Deploy through Support (4DS), mapped to the 4As: Acquire, Analyze, Assemble and Act.
[Figure: the 4DS-to-4A mapping – Discover = Acquire, Design = Analyze, Develop = Assemble, Deploy = Act, with Support spanning the cycle; MAALBS = BAAS]
MAALBS Style for Big Data Services
MAALBS is a hybrid AGILE framework which blends SCRUM + ITIL + LEAN.
[Figure: MAALBS process overview – the product owner maintains a prioritized product backlog feeding sprint and release backlogs, and work moves through 3-4 week sprints across the phases Acquire = Discover, Analyze = Design, Assemble = Develop (with review and testing) and Act = Deployment. Each phase is bounded by phase gates (IN: approved BRD / OUT: approved TDD; IN: approved TDD / OUT: developed report; IN: UTC / OUT: UTR; IN: report in running condition / OUT: demo). A sprint retrospective with the product owner, Scrum Master and team follows each sprint; on business acceptance, the product is promoted to the support team for inclusion in the GO LIVE schedule. The adaptive cycle of Envision, Speculate, Explore, Adapt and Close is underpinned by continuous service improvement, knowledge management, operational management, process institutionalization and defect prevention.]
Manifesto of MAALBS
We are uncovering better ways of developing software by doing it
and helping others do it. Through this work we have come to value:
• Team collaboration over processes and tools
• Quality deliverables in line with intelligence over comprehensive documentation
• Stakeholder collaboration over contract negotiation
• Room for innovation and welcoming change over following a plan
At a higher level, MAALBS tailors and adapts the AGILE Project Management framework introduced by the expert Jim Highsmith.
The framework is as follows:
• Envision: Determine the product vision and project scope, the project
community, and how the team will work together
• Speculate: Develop a feature-based release, milestone, and iteration
plan to deliver on the vision
• Explore: Deliver tested features in a short timeframe, constantly
seeking to reduce the risk and uncertainty of the project
• Adapt: Review the delivered results, the current situation, and the
team’s performance, and adapt as necessary
• Close: Conclude the project, pass along key learnings, and celebrate
Principles of MAALBS
• Our highest priority is to satisfy the customer through early and continuous
delivery of valuable software
• Welcome changing requirements, even late in development
• Providing room for innovation across project, process and technology
• Deliver working software frequently, from three weeks to six weeks,
with a preference to the shorter timescale
• Business people and developers must work together daily throughout
the project
• Build projects around motivated individuals. Give them the environment
and support they need, and trust them to get the job done
• Face-to-face conversation is the most efficient and effective method
of conveying information to and within a development team
• Working software is the primary measure of progress
• MAALBS processes promote sustainable development. The sponsors,
developers, and users should be able to maintain a constant pace indefinitely
• Continuous attention to technical excellence and good design enhances agility
• Simplicity – the art of maximizing the amount of work not done – is essential
• The best architectures, requirements, and designs emerge from self-organizing
teams
• At regular intervals, the team reflects on how to become more effective, then
tunes and adjusts its behavior accordingly
MphasiS Mind Maps on Big Data Projects
[Figure: MAALBS Big Data Mind Map, with the following branches]
1. Identifying the line of business
2. Data collection
3. Profiling the business data
4. Adopting MAALBS – the AGILE process
5. Tailoring and adhering to the standards and governance to ease the skills
6. Collaborating with the COE and participating in the tech forums
7. Sandbox prototype and performance
8. Aligning the cloud operating model
9. Augmenting Hadoop with the Enterprise Data Warehouse
10. Embedding statistics and analytics for effective decision making and visualization
Phases of MAALBS
The phases of MAALBS are classified into a technical process and an AGIL process, a blend of AGILE SCRUM, ITIL and LEAN.
Technical Process: ACQUIRE, ANALYZE, ASSEMBLE and ACT
Acquire
• A variety of data is collected, with heterogeneity, scale, timeliness and complexity present in all phases of the pipeline that creates value from data.
• The data tsunami requires us to make decisions, currently in an ad-hoc manner, about what data to keep, what to discard, and how to store what we keep reliably with the right metadata.
• The value of data explodes when it can be linked with other data, so data integration is a major creator of value. Since most data is directly generated in digital format today, we have both the opportunity and the challenge to influence its creation so as to facilitate later linkage, and to automatically link previously created data.
• Much data today is not available in structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display, but not for semantic content and search. Transforming such content into a structured format for later analysis is a major challenge.
• Big data does not arise out of a vacuum: it is recorded from some data-generating source. For example, consider our ability to sense and observe the world around us, from the heart rate of an elderly citizen and the presence of toxins in the air we breathe, to the planned Square Kilometre Array telescope, which will produce up to 1 million terabytes of raw data per day. Similarly, scientific experiments and simulations can easily produce petabytes of data today.
AGIL Process: AGILE SCRUM, ITIL and LEAN
Discover – Story Gathering
Phase Gate IN: High-level requirements shared in brief through PowerPoint, Excel or documents.
Process: The product owner prioritizes products from the product backlog and provides the details to the MAALBS team.
Phase Gate OUT: Approved business/functional requirement documents.
[Figure: Discover-phase data sources – Structured: DWH DB, RDBMS, ORDBMS, master-detail, structured files, XML; Machine-generated/semi-structured: web logs, contact logs, device logs, clickstream, session logs; Unstructured/social media: images, video, audio, e-mails, feedback forms, Twitter, Facebook, LinkedIn, MySpace, RSS feeds]
In the sprint planning meeting, the team analyzes the requirements (the business requirement document and functional requirement document) for all the prioritized products and scopes them for the upcoming sprint. Products that the team cannot agree to complete within the sprint are pushed back to the product backlog, with proper justification provided to the product owner, such as requirements (BR/FR) not signed off, large estimates due to report complexity, or resource capacity.
Analyze
Frequently, the information collected will not be in a format ready for analysis. For example, consider the collection of electronic health records in a hospital, comprising transcribed dictations from several physicians, structured data from sensors and measurements (possibly with some associated uncertainty), and image data such as x-rays. We cannot leave the data in this form and still effectively analyze it. Rather, we require an information-extraction process that pulls the required information out of the underlying sources and expresses it in a structured form suitable for analysis. Doing this correctly and completely is a continuing technical challenge. Note that this data also includes images and will in the future include video; such extraction is often highly application dependent (e.g., what you want to pull out of an MRI is very different from what you would pull out of a picture of the stars, or a surveillance photo).
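As a toy illustration of such an information-extraction process, the following Python sketch pulls a few structured fields out of a weakly structured clinical note. The note, patterns and field names are hypothetical; as noted above, real extraction logic is highly application dependent.

```python
# A toy information-extraction sketch: turning a weakly structured note
# into a structured record. Patterns and fields are illustrative only.
import re

NOTE = "Pt. reports chest pain since 2014-03-02. BP 140/90, pulse 88. Rx: aspirin 81mg."

PATTERNS = {
    "onset_date":     r"\b(\d{4}-\d{2}-\d{2})\b",
    "blood_pressure": r"BP\s+(\d{2,3}/\d{2,3})",
    "pulse":          r"pulse\s+(\d{2,3})",
    "medication":     r"Rx:\s*([A-Za-z]+\s+\d+mg)",
}

def extract(note: str) -> dict:
    """Return a structured record; missing fields stay None rather than
    failing, since real notes vary wildly in format."""
    record = {}
    for field, pattern in PATTERNS.items():
        m = re.search(pattern, note)
        record[field] = m.group(1) if m else None
    return record

print(extract(NOTE))
# {'onset_date': '2014-03-02', 'blood_pressure': '140/90',
#  'pulse': '88', 'medication': 'aspirin 81mg'}
```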
In addition, due to the ubiquity of surveillance cameras and the popularity of GPS-enabled mobile phones, cameras and other portable devices, rich, high-fidelity location and trajectory (i.e., movement in space) data can also be extracted.
We are used to thinking of Big Data as always telling us the truth, but this is actually far from reality. For example, patients may choose to hide risky behavior and caregivers may sometimes misdiagnose a condition; patients may also inaccurately recall the name of a drug, or even that they ever took it, leading to missing information in (the history portion of) their medical record. Existing work on data cleaning assumes well-recognized constraints on valid data or well-understood error models; for many emerging Big Data domains these do not exist.
Design
Phase Gate IN: Approved business/functional requirement document.
Process: The MAALBS team starts work toward the initial design of the product/interface. If more than one approach has been suggested for designing the interface, all the approach options are properly documented in the technical design document (TDD), and the approach that will be followed is signed off in order to avoid confusion at a later stage.
Phase Gate OUT: Approved technical design document.
Assemble
Given the heterogeneity of the flood of data, it is not enough merely to record it and throw it into a repository. Consider, for example, data from a range of scientific experiments. If we just have a bunch of data sets in a repository, it is unlikely anyone will ever be able to find, let alone reuse, any of this data.
With adequate metadata, there is some hope, but even so, challenges will
remain due to differences in experimental details and in data record structure.
Data analysis is considerably more challenging than simply locating, identifying,
understanding, and citing data. For effective large-scale analysis all of this has
to happen in a completely automated manner.
This requires differences in data structure and semantics to be expressed in
forms that are computer understandable, and then “robotically” resolvable.
There is a strong body of work in data integration that can provide some of
the answers. However, considerable additional work is required to achieve
automated error-free difference resolution.
Mining requires integrated, cleaned, trustworthy, and efficiently accessible
data, declarative query and mining interfaces, scalable mining algorithms,
and big-data computing environments. At the same time, data mining itself
can also be used to help improve the quality and trustworthiness of the data,
understand its semantics, and provide intelligent querying functions.
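To illustrate what “computer understandable” differences might look like, the Python sketch below lets each source declare how its fields map onto a shared target schema, so records from heterogeneous experiments are resolved automatically. The source names, fields and unit conversions are invented for the example.

```python
# A minimal sketch of declarative, per-source schema mappings: each source
# states how its fields resolve into a shared target schema, so integration
# becomes mechanical. All names here are hypothetical.
from datetime import datetime

TARGET_SCHEMA = ("experiment_id", "measured_at", "value_celsius")

SOURCE_MAPPINGS = {
    "lab_a": {
        "experiment_id": lambda r: r["exp"],
        "measured_at":   lambda r: datetime.fromisoformat(r["ts"]),
        "value_celsius": lambda r: float(r["temp_c"]),
    },
    "lab_b": {  # stores Fahrenheit and epoch seconds; resolved on the way in
        "experiment_id": lambda r: r["experiment"],
        "measured_at":   lambda r: datetime.fromtimestamp(int(r["epoch"])),
        "value_celsius": lambda r: (float(r["temp_f"]) - 32) * 5 / 9,
    },
}

def integrate(source: str, record: dict) -> dict:
    """Resolve one raw record into the shared target schema."""
    mapping = SOURCE_MAPPINGS[source]
    return {field: mapping[field](record) for field in TARGET_SCHEMA}

print(integrate("lab_a", {"exp": "E-17", "ts": "2014-06-01T10:00:00", "temp_c": "21.5"}))
print(integrate("lab_b", {"experiment": "E-17", "epoch": "1401616800", "temp_f": "70.7"}))
```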
Development
Phase Gate IN: Approved technical design document (TDD).
Process: The development activities are sub-categorized into multiple tasks/steps as per the level of estimates (LOEs) shared with the product owner, and each task/step is carried out sequentially: code development, report development, application/interface development, etc. Every task/step goes through verification, validation and review before the start of the next task (develop, verify, validate, review, develop). The product owner gets frequent updates on the progress of development activities in the “Daily breakfast meeting” from the SCRUM master, and the status of each day’s development activities is discussed in the “Daily SCRUM meeting” among the team members and the Scrum master.
Phase Gate OUT: Workable product.
Act
By studying how best to capture, store, and query provenance, in conjunction
with techniques to capture adequate metadata, we can create an infrastructure
to provide users with the ability both to interpret analytical results obtained and
to repeat the analysis with different assumptions, parameters, or data sets.
Systems with a rich palette of visualizations become important in conveying
to the users the results of the queries in a way that is best understood in
the particular domain. Whereas early business intelligence systems’ users
were content with tabular presentations, today’s analysts need to pack
and present results in powerful visualizations that assist interpretation,
and support user collaboration.
Furthermore, with a few clicks the user should be able to drill down into each
piece of data that the user sees and understand its provenance, which is a key
feature in understanding the data. That is, users need to be able to see not just
the results, but also understand why they are seeing those results.
However, raw provenance, particularly regarding the phases in the analytics
pipeline, is likely to be too technical for many users to grasp completely.
[Figure: visualization outputs – dashboard, trend, mobile BI]
One alternative is to enable the users to “play” with the steps in the
analysis – make small changes to the pipeline, for example, or modify
values for some parameters. The users can then view the results of these
incremental changes.
By these means, users can develop an intuitive feeling for the analysis and
also verify that it performs as expected in corner cases. Accomplishing this
requires the system to provide convenient facilities for the user to specify
analyses. Declarative specification is one component of such a system.
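The following small Python sketch illustrates the idea: each pipeline step records its name and parameters as provenance, and a user can “play” with a parameter, rerun the analysis and compare the results. The steps and parameters are hypothetical, not a prescribed MAALBS pipeline.

```python
# An illustrative provenance sketch: each step logs its name and parameters
# alongside the result, so a user can see why a figure looks the way it does
# and rerun the analysis with small parameter changes.
def run_pipeline(values, trim_below=0.0, scale=1.0):
    provenance = []

    filtered = [v for v in values if v >= trim_below]
    provenance.append(("filter", {"trim_below": trim_below, "kept": len(filtered)}))

    scaled = [v * scale for v in filtered]
    provenance.append(("scale", {"scale": scale}))

    result = sum(scaled) / len(scaled) if scaled else None
    provenance.append(("aggregate", {"op": "mean"}))
    return result, provenance

# "Play" with the steps: rerun with a modified parameter and compare.
baseline, prov = run_pipeline([1.0, 2.0, -5.0, 4.0], trim_below=0.0)
variant, _ = run_pipeline([1.0, 2.0, -5.0, 4.0], trim_below=-10.0)
print(baseline, variant)   # 2.333... vs 0.5 - the filter step explains the gap
for step, params in prov:
    print(step, params)    # the provenance behind the baseline figure
```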
Testing
Phase Gate IN: Workable product.
Process: The MAALBS team takes sole responsibility for constructing the test cases and test plan in line with business requirements. The product is tested against each and every functional clause, and the expected and actual results are captured.
Phase Gate OUT: Test case results.
Deployment
Phase Gate IN: Deployable document.
Process: The Deployment phase bridges the gap between the MAALBS development team and MAALBS support. The development team constructs the deployable document pertaining to the particular product/interface and manually checks all the entries in the deployable document against the particular environment. Support uses the deployable document shared by the MAALBS team and deploys to the respective environment, say PRODUCTION.
Phase Gate OUT: Workable product.
High Level Architecture of MAALBS – Big Data Process
[Figure: high-level MAALBS big data architecture – raw sources (audio/video, docs/txt, web logs, social/graph, sensors/devices, spatial/GPS, events and others) are ingested via Flume, Lucene, Solr and other APIs into HDFS storage and databases including Cassandra, Vertica, HBase, Hive, Mahout and MongoDB; map programs process the data, which is consumed through BigSheets, data visualization and business intelligence reports.]
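To ground the “map program” stage of this architecture, here is a minimal Hadoop-Streaming-style sketch in Python: a mapper emits (word, 1) pairs from raw log lines and a reducer sums the counts per word. The word-count task and the invocation shown in the comment are illustrative assumptions, not part of the MAALBS architecture.

```python
# A minimal map/reduce sketch in the style of Hadoop Streaming: the mapper
# emits (word, 1) pairs from raw lines on stdin, and the reducer sums the
# counts for each word from sorted mapper output.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.strip().lower().split():
            print(f"{word}\t1")

def reducer(lines):
    # Hadoop delivers mapper output sorted by key, so equal words arrive
    # on adjacent lines and can be summed with groupby.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    # Illustrative invocation:
    #   cat weblogs.txt | python wc.py map | sort | python wc.py reduce
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if stage == "map" else reducer)(sys.stdin)
```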
MAALBS for Operational Challenges
MAALBS service-operation-related activities are carried out by the MAALBS support team. The MAALBS support team is responsible for the following service operations:
Service Desk Function
• Serves as the first point of contact
• Owns the logged request and ensures it is resolved in line with user acceptance
• Performs first-level fixes and first-level diagnosis
• Serves as liaison between the end user and the IT services provision team
• Supports other IT provision activities on a need basis
• Escalates to the appropriate team when things go out of control
• Plays a vital role in achieving customer satisfaction
Incident Management
• The MAALBS support team is responsible for restoring the service of the application in line with the SLA agreed for interrupted services
• Incidents are acknowledged and the events of each incident are recorded on a timely basis in the incident management tool used by the MAALBS team
• The MAALBS team tracks and updates the progress of the incident until it is closed in line with user acceptance
• The MAALBS team takes a professional approach to identifying the root cause of an incident
• The MAALBS support team ensures that problems are identified and resolved
• The MAALBS support team eliminates recurring incidents
• The MAALBS support team minimizes the impact of incidents or problems that cannot be prevented
• The MAALBS team employs a strategic approach to executing a permanent fix or a workaround
MAALBS Knowledge Management
• The MAALBS team adopts a professional approach to gathering, analyzing, storing and sharing knowledge throughout the MAALBS life cycle
• The MAALBS support and development teams cross-train themselves across process, project and technology to build a strong team
MAALBS LEAN Adoption
• Optimal usage of resources by eliminating the waste
• Amplify learning through retrospectives (create knowledge)
• Decide as late as possible (defer commitment)
• Deliver as fast as possible (deliver fast)
• Work collaboratively by empowering the teams (respect people)
• Deliver quality work products in line with internal and external stakeholders’ expectations
• See the whole (optimize the whole)
MphasiS MAALBS Big Data Team
The MAALBS team is a cross-functional team that adopts AGILE SCRUM, ITIL and LEAN for its POCs and projects, which are implemented in an iterative and incremental way in sprints. The team has hyper-specialized skills in Business Intelligence, Analytics and the Hadoop ecosystem.
Conclusion
MAALBS, an AGILE approach, has helped projects onto a value-driven delivery model and has accelerated BI/DW development in a cost-effective manner with increased quality of deliverables. MAALBS also augments incremental delivery through sprints by emphasizing continuous, incremental and evolutionary growth and improvement.
I would like to express my appreciation and thanks to all my leaders who encouraged me in articulating this framework, and I would also like to thank Ganesh Jegannathan, Sampath Kumar Sundaramurthy, Senthil Nathan and Saravanan Mohan, who have helped me greatly by sharing their thoughts.
MS Balaje Viswanaathan
Big Data and BIDW Practitioner, Analytics, MphasiS
About the Author
MS Balaje Viswanaathan, also known as “MS”, has 15 years of rich cross-cultural experience in the IT sector. He is currently engaged with MphasiS as a Delivery Group Manager in the Business Intelligence and Data Warehousing practice and is a Big Data practitioner. He has an MCA degree from the University of Madras and an MBA in Systems and Project Management. He is the author of LETZ DO PMP and LETZ DO ITILV3 [F], which focus primarily on project management and service management practices. He has a wide range of expertise in diversified fields of Information Technology services, including Data Warehousing and Business Intelligence, Software Development, Maintenance and Testing, and Operations and Project Management. He has also implemented the AGILE SCRUM methodology in his recent BI assignment and has come up with BI initiatives for a process innovation framework, MAALBS. MS has conducted training sessions on PMP, ITIL, AGILE SCRUM and DW ETL Informatica.
He is certified in the following disciplines:
• PMP from PMI, USA [PMI Member id: 728277]
• PRINCE2 [practitioner] from APMG UK
• AGILE SCRUM Master from SCRUM Alliance
• ITILV3 [F] from APMG UK
• Certified Six Sigma Green Belt
• Cloud Computing from EXIN
• IBM Mastery BIG Insights – IBM Big Data