The document discusses big data and how it is often misunderstood and overhyped. It argues that big data needs to be objectively defined in order to make meaningful claims and measurements regarding its success. Jumping into big data without proper preparation often leads to the same dismal results as failed IT projects. The document advocates separating the signal from the noise to truly understand big data.
Big Data & Analytics for Government - Case StudiesJohn Palfreyman
This presentation explains the future challenges that Governments face, and illustrates how Big Data & Analytics technologies can help address these challenges. Four case studies - based on recent customer projects - are used to show the value that the innovative application of these technologies can bring.
Analysis on big data concepts and applicationsIJARIIT
The term "Big Data" refers to a large amount of data that cannot be handled by traditional database systems. It consists of large volumes of data generated at a very fast rate, which cannot be handled or processed by traditional data management tools, so a new set of tools or frameworks is required. Big data is characterized by the V's, namely Volume, Velocity, and Variety. Volume refers to the size of the data, Velocity refers to the speed at which the data is generated, and Variety refers to the different formats of data that are generated. Much of today's data is unstructured, such as audio, video, images, and sensor data, and it comes from social media, enterprise data, and transactional data. Through big data analytics, one can examine large data sets containing a variety of data types. The primary goal of big data analytics is to help organizations make important decisions by employing data scientists and other analytics professionals to analyze large volumes of data. Key challenges include the explosion of data volume, especially machine-generated data, how fast that data grows every year, and the new data sources that keep emerging. Through this article, the authors aim to explain these notions in an intelligible manner, with several use cases and illustrations embodied in the text.
Convergence Partners has released its latest research report on big data and its meaning for Africa. The report argues that big data poses a threat to those it overlooks, namely a large percentage of Africa’s populace, who remain on big data’s periphery.
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...Rida Qayyum
The concept of Big Data has become extensively popular because of its vast usage in emerging technologies. Despite being complex and dynamic, the big data environment generates a colossal amount of data that is impossible to handle with traditional data processing applications. Nowadays, the Internet of Things (IoT) and social media platforms such as Facebook, Instagram, Twitter, WhatsApp, LinkedIn, and YouTube generate data in various formats, which creates a drastic need for technology to store and process this tremendous volume of data. This research outlines the fundamental literature required to understand the concept of big data, including its nature, definitions, types, and characteristics. The primary focus of the current study is two fundamental issues: storing an enormous amount of data and processing it quickly. To that end, the paper presents Hadoop as a solution and discusses the Hadoop Distributed File System (HDFS) and the MapReduce programming framework for efficient storage and processing of Big Data. Future research directions in this field are determined based on opportunities and several emerging issues in the Big Data domain. These research directions facilitate exploration of the domain and the development of optimal solutions to address Big Data storage and processing problems. Moreover, this study contributes to the existing body of knowledge by comprehensively addressing the opportunities and emerging issues of Big Data.
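To make the MapReduce idea mentioned above concrete, here is a minimal, framework-free sketch in Python of the map/shuffle/reduce model applied to a word count. This is not Hadoop's actual Java API, and the documents and names are invented for illustration; it only shows the programming model the abstract refers to.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit one (word, 1) pair per word in the document.
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would
    # between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

if __name__ == "__main__":
    docs = ["big data needs new tools", "big data is big"]
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # e.g. {'big': 3, 'data': 2, ...}
```

In a real Hadoop job the map and reduce functions run in parallel across the cluster and the shuffle is handled by the framework over HDFS, but the data flow is the same as in this sketch.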
Data-Ed Webinar: Demystifying Big Data DATAVERSITY
We are in the middle of a data flood, and we need to figure out how to tame it without drowning. Most of what has been written about Big Data is focused on selling hardware and services. But what about a Big Data strategy that guides hardware and software decisions? While virtually every major organization faces the challenge of figuring out the approach for, and the requirements of, this new development, jumping into the fray hastily and unprepared will only reproduce the same dismal IT project results as previously experienced. Join Dr. Peter Aiken as he debunks a number of misconceptions about Big Data as your not-so-typical IT project. He will provide guidance on how to establish realistic Big Data management plans and expectations, and help demonstrate the value of such actions to both internal and external decision makers without getting lost in the hype.
Takeaways:
- The means by which Big Data techniques can complement existing data management practices
- The prototyping nature of practicing Big Data techniques
- The distinct ways in which utilizing Big Data can generate business value
- Bigger Data isn’t always Better Data
Due to technological advances, vast data sets (i.e., big data) are growing rapidly. "Big Data" is a relatively new term used to describe these collected datasets, which, because of their large size and complexity, cannot be managed with current methodologies or data mining software tools. Such datasets offer unparalleled opportunities for modelling and predicting the future, along with new challenges, so an awareness of both the weaknesses and the possibilities of these large data sets is necessary. Today we face an overwhelming growth of data on the web in terms of volume, velocity, and variety; from a security and privacy perspective, both areas are also growing unpredictably. The Big Data challenge is therefore becoming one of the most exciting opportunities for researchers in the coming years.
This paper gives a broad overview of the topic, including its current status, controversies, and the challenges of forecasting the future, and illustrates some of these problems with applications from various areas. Finally, the paper discusses secure management and privacy of big data as an essential issue.
Data Mining in the World of BIG Data-A SurveyEditor IJCATR
The rapid development and popularization of the internet, together with technological advancement, has introduced a massive and continuously increasing amount of data. The very large volumes of data generated, collected, stored, and transferred by applications such as sensors, smart mobile devices, cloud systems, and social networks have put us in the era of BIG data: data of huge size, with complex and unstructured types from many origins. Converting this BIG data into useful information is essential, and the technique for discovering hidden patterns and knowledge insights in it is known as BIG data mining. BIG data raises many problems and challenges related to handling, storing, managing, transferring, analyzing, and mining, but it also provides new directions and a wide range of opportunities for research and information extraction, and shapes the future of technologies such as data mining. In this paper, we present the concepts of BIG data and BIG data mining, describe the problems of BIG data mining and of traditional data mining techniques when dealing with BIG data, list new research directions for BIG data mining, and compare traditional data mining algorithms with some big data mining algorithms, which will be useful for future BIG data mining technology.
General introduction to Big Data terms and technologies: Velocity, Volume, Variety (3V) and Veracity (4V), NoSQL, Data Science, main data stores (key-value, column, document, graph), Elasticsearch, ...
Presentation of data.be products leveraging Big Data & Elasticsearch
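To make the four store families listed above concrete, here is a hedged sketch using plain Python literals (no real database client); the record fields, keys, and values are invented for illustration and are not taken from the presentation.

```python
# Key-value: an opaque value behind a single key; the store does not
# understand the value's structure.
key_value = {"customer:42": '{"name": "Ada", "city": "Brussels"}'}

# Document: the value is a structured, queryable document.
document = {"_id": 42, "name": "Ada", "city": "Brussels",
            "orders": [{"sku": "A-1", "qty": 2}]}

# Column-family: rows keyed by id, with columns grouped into families.
column_family = {"42": {"profile": {"name": "Ada", "city": "Brussels"},
                        "activity": {"last_login": "2013-05-14"}}}

# Graph: nodes plus typed edges, suited to relationship queries.
graph = {"nodes": [{"id": 42, "label": "Customer", "name": "Ada"},
                   {"id": "A-1", "label": "Product"}],
         "edges": [{"from": 42, "to": "A-1", "type": "PURCHASED"}]}
```

The same customer record is represented in each style; which family fits best depends on the access patterns the application needs.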
US EPA OSWER Linked Data Workshop 1-Feb-2013 - 3 Round Stones
Overview of US EPA's Linked Data Service to launch in early 2013. Open data published using the Linked Data model increases search engines' ability to find and display high value data sets. Linked Data enables policy makers, analysts and developers to more readily access and re-use data.
Quontra Solutions is your premier online IT educational destination in the UK. It provides online IT courses such as Selenium, Hadoop, CCNA, Cloud Computing, Business Analyst, and many other IT courses, all designed by experienced instructors and designers. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment, so there is an urgent need for IT professionals to keep up to date with Hadoop and Big Data technologies.
Quontra specialties:
- All courses are designed by experienced instructors and designers.
- Trainers are not limited to the syllabus; they also cover off-the-shelf content.
- 24x7 technical support team.
- Unlimited access to all recorded sessions, available after every live class.
- Syllabus built on professional standards and employer insights.
- Trainers are certified experts in their fields and bring years of industry experience into the training classes.
From AI to Z: How AI is changing the relationship between people and dataiGenius
On the occasion of SMAU Milano 2018, Gabriel Cismondi, COO at iGenius, talks about Artificial Intelligence and how it's changing the relationship between people and data.
Tips & tricks for attracting more visitors to your webshop through basic on-site SEO techniques. This presentation is part of the Easywebshop training given on Saturday, 12 November.
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL...Kai Wähner
Big data represents a significant paradigm shift in enterprise technology. Big data radically changes the nature of the data management profession as it introduces new concerns about the volume, velocity and variety of corporate data. Apache Hadoop is the open source de facto standard for implementing big data solutions on the Java platform. Hadoop consists of its kernel, MapReduce, and the Hadoop Distributed Filesystem (HDFS). A challenging task is to send all data to Hadoop for processing and storage (and then get it back to your application later), because in practice data comes from many different applications (SAP, Salesforce, Siebel, etc.) and databases (File, SQL, NoSQL), uses different technologies and concepts for communication (e.g. HTTP, FTP, RMI, JMS), and consists of different data formats such as CSV, XML, binary data, or other alternatives. This session shows different open source frameworks and products that solve this challenging task. Learn how to use every thinkable kind of data with Hadoop – without lots of complex or redundant boilerplate code.
EURIB Korte opleiding: Online marketing - Maart 2016Ayman van Bregt
Session for the EURIB short course on online marketing. This session covered the latest digital developments in marketing and communications. Topics included search engine marketing (SEO and SEA), social media, mobile marketing, media consumption, online behaviour, big data, the Internet of Things (IoT), and the economic impact of the digital landscape (digital disruption and digital transformation), including the rise of the sharing economy.
From Idea to App (or “How we roll at Small Town Heroes”)Bramus Van Damme
Guestlecture I gave to the students ICT at Odisee, explaining the app development process, how we do certain things at Small Town Heroes, and how we implement QA throughout our process.
BIG DATA Online Training | Hadoop Online Training with Placement Assistance Computer Trainings Online
Our big data and Hadoop online training is designed by Computer Trainings Online to deliver prompt and distinguished training, with sophisticated instructors and live face-to-face sessions scheduled around you, on your own laptop or desktop. We include exclusive lab sessions and theoretical sessions on big data to help you become a professional Hadoop developer, and we also give you an extensive range of study materials for each module so you can prepare well for interviews and settle into your dream job as a Hadoop developer. If you are a software professional, analytics professional, ETL developer, or project manager, you are welcome to join this course to build a strong career as a Hadoop developer. We cover a unique range of topics including the MapReduce framework, data loading techniques, Pig and Hive concepts, and debugging based on Hadoop best practices.
Our big data course features:
• Training on core java and analytical skills
• Real time based data analytics project to practice the sessions
• Cloud lab environment to practice data projects
• More focus on advanced topics of big data
• Recorded video sessions and lifetime free access
• Training on code writing with MapReduce framework
• Best practices from IT world and case studies
Contact us:
http://www.computertrainingsonline.com/big-data-online-training
Email: TrainingandJob@ComputerTrainingsOnline.com
Call: 770-777-1269
Bridging the gap between our online and offline social networkPaul Adams
A 30 minute talk I gave at the IA Summit 2010. If you find the content useful in your work, I'd love to hear your stories and examples to inform a book I'm writing. Please get in touch!
padday at gmail dot com
BIG DATA
Prepared By
Muhammad Abrar Uddin
Introduction
· Big Data may well be the Next Big Thing in the IT world.
· Big data burst upon the scene in the first decade of the 21st century.
· The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
· Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
What is BIG DATA?
· ‘Big Data’ is similar to ‘small data’, but bigger in size
· Having bigger data, however, requires different approaches: techniques, tools and architecture
· The aim is to solve new problems, or old problems in a better way
· Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
What is BIG DATA
· Walmart handles more than 1 million customer transactions every hour.
· Facebook handles 40 billion photos from its user base.
· Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
Three Characteristics of Big Data: the 3 Vs
· Volume: data quantity
· Velocity: data speed
· Variety: data types
1st Character of Big Data
Volume
· A typical PC might have had 10 gigabytes of storage in 2000.
· Today, Facebook ingests 500 terabytes of new data every day.
· A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
· The smart phones, the data they create and consume; sensors embedded into everyday objects will soon result in billions of new, constantly-updated data feeds containing environmental, location, and other information, including video.
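As a rough sanity check on the volume figures on this slide, here is a minimal back-of-the-envelope sketch in Python, assuming decimal prefixes (1 TB = 1,000 GB); the comparisons drawn are illustrative, not from the deck.

```python
TB = 1_000          # gigabytes per terabyte (decimal prefix)
PB = 1_000 * TB     # gigabytes per petabyte

facebook_daily_gb = 500 * TB   # "500 terabytes of new data every day"
pc_2000_gb = 10                # "a typical PC ... 10 gigabytes in 2000"

# How many year-2000 PCs would one day of Facebook data fill?
print(facebook_daily_gb / pc_2000_gb)   # 50,000 PCs

# How many days for Facebook to ingest a full petabyte at that rate?
print(PB / facebook_daily_gb)           # 2 days
```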
2nd Character of Big Data
Velocity
· Clickstreams and ad impressions capture user behavior at millions of events per second
· high-frequency stock trading algorithms reflect market changes within microseconds
· machine to machine processes exchange data between billions of devices
· infrastructure and sensors generate massive log data in real- time
· on-line gaming systems support millions of concurrent users, each producing multiple inputs per second.
3rd Character of Big Data
Variety
· Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
· Traditional database systems were designed to address smaller volumes of structured data, fewer updates or a predictable, consistent data structure.
· Big Data analysis includes different types of data
Storing Big Data
· Analyzing your data characteristics
· Selecting data sources for analysis
· Eliminating redundant data
· Establishing the role of NoSQL
· Overview of Big Data stores
· Data models: key value, graph, document, column-family
· Hadoop Distributed File System
· H.
On the occasion of eGov Innovation Day 2014 - DONNÉES DE L'ADMINISTRATION, UNE MINE (qui) D'OR(t) - Philippe Cudré-Mauroux presents Big Data and eGovernment.
Introduction to Big Data & Big Data 1.0 SystemPetr Novotný
Big Data, a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series about Big Data and you will get answers to your questions!
We will cover Introduction to Big Data and available platforms which we can use to deal with Big Data. And in the end, we are going to give you an insight into the possible future of dealing with Big Data.
Today we will start with a brief introduction to Big Data. We will talk about how Big Data is generated, where we can apply it and also about the first world-wide famous platform of BigData 1.0 System, which is Hadoop.
#CHEDTEB
www.chedteb.eu
Big data is used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. The term big data is believed to have originated with Web search companies who had to query very large distributed aggregations of loosely-structured data.
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a comprehensive platform designed to address multi-faceted needs by offering multi-function data management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion.
In this research-based session, I’ll discuss what the components are in multiple modern enterprise analytics stacks (i.e., dedicated compute, storage, data integration, streaming, etc.) and focus on total cost of ownership.
A complete machine learning infrastructure cost for the first modern use case at a midsize to large enterprise will be anywhere from $3 million to $22 million. Get this data point as you take the next steps on your journey into the highest spend and return item for most companies in the next several years.
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
Do you ever wonder how data-driven organizations fuel analytics, improve customer experience, and accelerate business productivity? They are successful by governing and mastering data effectively so they can get trusted data to those who need it faster. Efficient data discovery, mastering and democratization is critical for swiftly linking accurate data with business consumers. When business teams can quickly and easily locate, interpret, trust, and apply data assets to support sound business judgment, it takes less time to see value.
Join data mastering and data governance experts from Informatica—plus a real-world organization empowering trusted data for analytics—for a lively panel discussion. You’ll hear more about how a single cloud-native approach can help global businesses in any economy create more value—faster, more reliably, and with more confidence—by making data management and governance easier to implement.
What is data literacy? Which organizations, and which workers in those organizations, need to be data-literate? There are seemingly hundreds of definitions of data literacy, along with almost as many opinions about how to achieve it.
In a broader perspective, companies must consider whether data literacy is an isolated goal or one component of a broader learning strategy to address skill deficits. How does data literacy compare to other types of skills or “literacy” such as business acumen?
This session will position data literacy in the context of other worker skills as a framework for understanding how and where it fits and how to advocate for its importance.
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Uncover how your business can save money and find new revenue streams.
Driving profitability is a top priority for companies globally, especially in uncertain economic times. It's imperative that companies reimagine growth strategies and improve process efficiencies to help cut costs and drive revenue – but how?
By leveraging data-driven strategies layered with artificial intelligence, companies can achieve untapped potential and help their businesses save money and drive profitability.
In this webinar, you'll learn:
- How your company can leverage data and AI to reduce spending and costs
- Ways you can monetize data and AI and uncover new growth strategies
- How different companies have implemented these strategies to achieve cost optimization benefits
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
In this webinar, Bob will focus on:
-Selecting the appropriate metadata to govern
-The business and technical value of a data catalog
-Building the catalog into people’s routines
-Positioning the data catalog for success
-Questions the data catalog can answer
Because every organization produces and propagates data as part of their day-to-day operations, data trends are becoming more and more important in the mainstream business world’s consciousness. For many organizations in various industries, though, comprehension of this development begins and ends with buzzwords: “Big Data,” “NoSQL,” “Data Scientist,” and so on. Few realize that all solutions to their business problems, regardless of platform or relevant technology, rely to a critical extent on the data model supporting them. As such, data modeling is not an optional task for an organization’s data effort, but rather a vital activity that facilitates the solutions driving your business. Since quality engineering/architecture work products do not happen accidentally, the more your organization depends on automation, the more important the data models driving the engineering and architecture activities of your organization. This webinar illustrates data modeling as a key activity upon which so much technology and business investment depends.
Specific learning objectives include:
- Understanding what types of challenges require data modeling to be part of the solution
- How automation requires standardization derivable via data modeling techniques
- Why only a working partnership between data and the business can produce useful outcomes
Analytics play a critical role in supporting strategic business initiatives. Despite the obvious value to analytic professionals of providing the analytics for these initiatives, many executives question the economic return of analytics as well as data lakes, machine learning, master data management, and the like.
Technology professionals need to calculate and present business value in terms business executives can understand. Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help technology professionals research, measure, and present the economic value of a proposed or existing analytics initiative, no matter the form that the business benefit arises. The session will provide practical advice about how to calculate ROI and the formulas, and how to collect the necessary information.
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
Data Mesh is a trending approach to building a decentralized data architecture by leveraging a domain-oriented, self-service design. However, the pure definition of Data Mesh lacks a center of excellence or central data team and doesn’t address the need for a common approach for sharing data products across teams. The semantic layer is emerging as a key component to supporting a Hub and Spoke style of organizing data teams by introducing data model sharing, collaboration, and distributed ownership controls.
This session will explain how data teams can define common models and definitions with a semantic layer to decentralize analytics product creation using a Hub and Spoke architecture.
Attend this session to learn about:
- The role of a Data Mesh in the modern cloud architecture.
- How a semantic layer can serve as the binding agent to support decentralization.
- How to drive self service with consistency and control.
Enterprise data literacy. A worthy objective? Certainly! A realistic goal? That remains to be seen. As companies consider investing in data literacy education, questions arise about its value and purpose. While the destination – having a data-fluent workforce – is attractive, we wonder how (and if) we can get there.
Kicking off this webinar series, we begin with a panel discussion to explore the landscape of literacy, including expert positions and results from focus groups:
- why it matters,
- what it means,
- what gets in the way,
- who needs it (and how much they need),
- what companies believe it will accomplish.
In this engaging discussion about literacy, we will set the stage for future webinars to answer specific questions and feature successful literacy efforts.
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
Change is hard, especially in response to negative stimuli or what is perceived as negative stimuli. So organizations need to reframe how they think about data privacy, security and governance, treating them as value centers to 1) ensure enterprise data can flow where it needs to, 2) prevent – not just react – to internal and external threats, and 3) comply with data privacy and security regulations.
Working together, these roles can accelerate faster access to approved, relevant and higher quality data – and that means more successful use cases, faster speed to insights, and better business outcomes. However, both new information and tools are required to make the shift from defense to offense, reducing data drama while increasing its value.
Join us for this panel discussion with experts in these fields as they discuss:
- Recent research about where data privacy, security and governance stand
- The most valuable enterprise data use cases
- The common obstacles to data value creation
- New approaches to data privacy, security and governance
- Their advice on how to shift from a reactive to resilient mindset/culture/organization
You’ll be educated, entertained and inspired by this panel and their expertise in using the data trifecta to innovate more often, operate more efficiently, and differentiate more strategically.
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
As DATAVERSITY’s RWDG series hurtles into our 12th year, this webinar takes a quick look behind us, evaluates the present, and predicts the future of Data Governance. Based on webinar numbers, hot Data Governance topics have evolved over the years from policies and best practices, roles and tools, data catalogs and frameworks, to supporting data mesh and fabric, artificial intelligence, virtualization, literacy, and metadata governance.
Join Bob Seiner as he reflects on the past and what has and has not worked, while sharing examples of enterprise successes and struggles. In this webinar, Bob will challenge the audience to stay a step ahead by learning from the past and blazing a new trail into the future of Data Governance.
In this webinar, Bob will focus on:
- Data Governance’s past, present, and future
- How trials and tribulations evolve to success
- Leveraging lessons learned to improve productivity
- The great Data Governance tool explosion
- The future of Data Governance
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
Would you share your bank account information on social media? How about shouting your social security number on the New York City subway? We didn’t think so either – that’s why data governance is consistently top of mind.
In this webinar, we’ll discuss the common Cloud data governance best practices – and how to apply them today. Join us to uncover Google Cloud’s investment in data governance and learn practical and doable methods around key management and confidential computing. Hear real customer experiences and leave with insights that you can share with your team. Let’s get solving.
Topics that you will hear addressed in this webinar:
- Understanding the basics of Cloud Incident Response (IR) and anticipated data governance trends
- Best practices for key management and apply data governance to your day-to-day
- The next wave of Confidential Computing and how to get started, including a demo
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the enterprise mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and data architecture. William will kick off the fifth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Too often I hear the question “Can you help me with our data strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component: the data strategy itself. A more useful request is: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) data strategy on the first attempt is generally not productive – particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” This program refocuses efforts on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. It also contributes to three primary organizational data goals. Learn how to improve the following:
- Your organization’s data
- The way your people use data
- The way your people use data to achieve your organizational strategy
This will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs as organizations identify prioritized areas where better assets, literacy, and support (data strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why data strategy is necessary for effective data governance
- An overview of prerequisites for effective strategic use of data strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Who Should Own Data Governance – IT or Business?DATAVERSITY
The question is asked all the time: “What part of the organization should own your Data Governance program?” The typical answers are “the business” and “IT (information technology).” Another answer to that question is “Yes.” The program must be owned and reside somewhere in the organization. You may ask yourself if there is a correct answer to the question.
Join this new RWDG webinar with Bob Seiner where Bob will answer the question that is the title of this webinar. Determining ownership of Data Governance is a vital first step. Figuring out the appropriate part of the organization to manage the program is an important second step. This webinar will help you address these questions and more.
In this session Bob will share:
- What is meant by “the business” when it comes to owning Data Governance
- Why some people say that Data Governance in IT is destined to fail
- Examples of IT positioned Data Governance success
- Considerations for answering the question in your organization
- The final answer to the question of who should own Data Governance
It is clear that Data Management best practices exist and so does a useful process for improving existing Data Management practices. The question arises: Since we understand the goal, how does one design a process for Data Management goal achievement? This program describes what must be done at the programmatic level to achieve better data use and a way to implement this as part of your data program. The approach combines DMBoK content and CMMI/DMM processes – permitting organizations with the opportunity to benefit from the best of both. It also permits organizations to understand:
- Their current Data Management practices
- Strengths that should be leveraged
- Remediation opportunities
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
MLOps is a practice for collaboration between Data Science and operations to manage the production machine learning (ML) lifecycles. As an amalgamation of “machine learning” and “operations,” MLOps applies DevOps principles to ML delivery, enabling the delivery of ML-based innovation at scale to result in:
Faster time to market of ML-based solutions
More rapid rate of experimentation, driving innovation
Assurance of quality, trustworthiness, and ethical AI
MLOps is essential for scaling ML. Without it, enterprises risk struggling with costly overhead and stalled progress. Several vendors have emerged with offerings to support MLOps: the major offerings are Microsoft Azure ML and Google Vertex AI. We looked at these offerings from the perspective of enterprise features and time-to-value.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes a great deal of work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
DataEd Online: Demystifying Big Data
1. Copyright 2013 by Data Blueprint
1
Demystifying Big Data
Yes, we face a data deluge and big data seems to be largely about how to deal with it. But much of what has been written about big data is focused on selling hardware and services. The truth is that until the concept of big data can be objectively defined, any measurements, claims of success, quantifications, etc. must be viewed skeptically and with suspicion. While both the need for and approaches to these new requirements are faced by virtually every organization, jumping into the fray ill-prepared has (to date) reproduced the same dismal IT project results.
Date: May 14, 2013
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D.
• Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data.
– Michael Coren
2. Copyright 2013 by Data Blueprint
Get Social With Us!
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
Ask questions and submit
your comments: #dataed
2
Like Us on Facebook
www.facebook.com/datablueprint
Post questions and comments
Find industry news, insightful content, and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
4. Copyright 2013 by Data Blueprint
Meet Your Presenter:
Peter Aiken, Ph.D.
• 30 years of experience in data
management
– Multiple international awards
– Founder, Data Blueprint
(http://datablueprint.com)
• 8 books and dozens of articles
• Experienced w/ 500+ data management
practices in 20 countries
• Multi-year immersions with organizations
as diverse as the US DoD, Deutsche Bank,
Nokia, Wells Fargo, and the
Commonwealth of Virginia
4
5. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
5
6. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
6
7. Copyright 2013 by Data Blueprint
7
Bills of Mortality by Captain John Graunt
13. Copyright 2013 by Data Blueprint
13
John Snow's 1854 Cholera Map of London
14. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
14
15. Copyright 2013 by Data Blueprint
Data Inflation
Unit | Size | What it means
Bit (b) | 1 or 0 | Short for "binary digit", after the binary code (1 or 0) computers use to store and process data
Byte (B) | 8 bits | Enough information to create an English letter or number in computer code. It is the basic unit of computing
Kilobyte (KB) | 1,000, or 2^10, bytes | From "thousand" in Greek. One page of typed text is 2KB
Megabyte (MB) | 1,000KB; 2^20 bytes | From "large" in Greek. The complete works of Shakespeare total 5MB. A typical pop song is about 4MB
Gigabyte (GB) | 1,000MB; 2^30 bytes | From "giant" in Greek. A two-hour film can be compressed into 1-2GB
Terabyte (TB) | 1,000GB; 2^40 bytes | From "monster" in Greek. All the catalogued books in America's Library of Congress total 15TB
Petabyte (PB) | 1,000TB; 2^50 bytes | All letters delivered by America's postal service this year will amount to around 5PB. Google processes around 1PB every hour
Exabyte (EB) | 1,000PB; 2^60 bytes | Equivalent to 10 billion copies of The Economist
Zettabyte (ZB) | 1,000EB; 2^70 bytes | The total amount of information in existence this year is forecast to be around 1.2ZB
Yottabyte (YB) | 1,000ZB; 2^80 bytes | Currently too big to imagine
The prefixes are set by an intergovernmental group, the International Bureau of Weights and Measures. Yotta and zetta were added in 1991; terms for larger amounts have yet to be established. Source: The Economist
15
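As a small illustration of the arithmetic behind the table (not part of the original deck), the hypothetical Python helper below converts raw byte counts into the units above; its base parameter distinguishes the decimal reading of each prefix (1,000) from the binary one (2^10 = 1,024).

```python
# Sketch only: convert a raw byte count to the largest convenient unit.
# base=1000 follows the SI prefixes in the table; base=1024 gives the
# binary interpretation (2^10 per step).
SI_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(n_bytes: float, base: int = 1000) -> str:
    """Return n_bytes expressed in the largest unit smaller than `base`."""
    size = float(n_bytes)
    for unit in SI_UNITS:
        if size < base:
            return f"{size:.1f} {unit}"
        size /= base
    return f"{size:.1f} {SI_UNITS[-1]}"

if __name__ == "__main__":
    print(human_readable(5 * 10**6))      # ~5 MB: the complete works of Shakespeare
    print(human_readable(15 * 10**12))    # ~15 TB: the Library of Congress catalogue
    print(human_readable(1.2 * 10**21))   # ~1.2 ZB: information forecast for the year
```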
16. Copyright 2013 by Data Blueprint
What do they mean big?
16
"Every 2 days we create as much
information as we did up to 2003"
– Eric Schmidt
The number of things that can produce data is rapidly growing (smartphones, for example)
IP traffic will quadruple by 2015
– Asigra 2012
17. Copyright 2013 by Data Blueprint
Google Search Results for the Term "Big Data"
17
Data from Google Trends Source: Gartner (October 2012)
18. Copyright 2013 by Data Blueprint
Number of Internet Pages Mentioning Big Data
18
Data from Google Trends Source: Gartner (October 2012)
19. Copyright 2013 by Data Blueprint
Increasingly, individuals make use of
the data-producing capabilities of these things
to perform services for them
19
20. • IBM
– 100 terabytes uploaded daily
– $2.1 billion spent on mobile ads in 2011
– 4.8 trillion ad impressions daily
• WIPRO
– 2.9 million emails sent every second
– 20 hours of video uploaded to YouTube every minute
– 50 million tweets per day
• Asigra
– 2.5 quintillion bytes created daily
– 90% of data was created in the last two years
– By 2015 8 zettabytes will have been created
Copyright 2013 by Data Blueprint
Examples
20
21. Copyright 2013 by Data Blueprint
21
• 60 GB of data/second
• 200,000 hours of big data
will be generated testing
systems
• 2,000 hours media
coverage/daily
• 845 million Facebook users
averaging 15 TB/day
• 13,000 tweets/second
• 4 billion watching
• 8.5 billion devices
connected
2012 London Summer Games
22. 22
Sloan Management Review/Harvard Business Review
Copyright 2013 by Data Blueprint
MIT Sloan Management Review, Fall 2012, p. 22, by Thomas H. Davenport, Paul Barth and Randy Bean
23. Copyright 2013 by Data Blueprint
Big Data (has something to do with Vs, doesn't it?)
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• 2001 Doug Laney
• Variability
– Many options or variable interpretations confound analysis
• 2011 ISRC
• Vitality
– A dynamically changing Big Data environment in which analysis and predictive
models must continually be updated as changes occur to seize opportunities
as they arrive
• 2011 CIA
• Virtual
– Scoping the discussion to only include online assets
• 2012 Courtney Lambert
23
24. Copyright 2013 by Data Blueprint
Nanex 1/2 Second Trading Data (May 2, 2013, Johnson and Johnson)
24
http://www.youtube.com/watch?v=LrWfXn_mvK8
The European Union last year approved a new rule mandating that all trades must exist for at
least a half-second; in this instance, that is 1,200 orders and 215 actual trades
25. Copyright 2013 by Data Blueprint
25
History Flow - Wikipedia entry for the word "Islam"
26. Copyright 2013 by Data Blueprint
26
Spatial Information Flow - New York Talk Exchange
27. Some Far-out Thoughts
on Computers by Orrin
Clotworthy
• Predicted not just the use of computing in
the intelligence community
• Also forecast predictive analytics
• And the accompanying privacy challenges
Copyright 2013 by Data Blueprint
27
28. • Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery
and process optimization.
– Gartner 2012
• Big data refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, and analyze.
– IBM 2012
• Big data usually includes data sets with sizes beyond the ability of
commonly-used software tools to capture, curate, manage, and process the
data within a tolerable elapsed time.
– Wikipedia
• Shorthand for advancing trends in technology that open the door to a new
approach to understanding the world and making decisions.
– NY Times 2012
• Big data is about putting the "I" back into IT.
– Peter Aiken 2007
• We have no objective definition of big data!
– Any measurements, claims of success, quantifications, etc. must be viewed
skeptically and with suspicion!
• Question: "Would it be more useful to refer to "big data techniques?"
Copyright 2013 by Data Blueprint
Defining Big Data
28
29. Copyright 2013 by Data Blueprint
Big Data Techniques
• New techniques available to impact the productivity (order of
magnitude) of any analytical insight cycle that complement,
enhance, or replace conventional (existing) analysis methods
• Big data techniques are currently characterized by:
– Continuous, instantaneously
available data sources
– Non-von Neumann
Processing (defined later in the presentation)
– Capabilities approaching
or past human comprehension
– Architecturally enhanceable
identity/security capabilities
– Other tradeoff-focused data processing
• So a good question becomes "where in our existing architecture
can we most effectively apply Big Data Techniques?"
29
30. Architecture components: computers; human resources; communication facilities; software; management responsibilities; policies, directives, and rules; data
Copyright 2013 by Data Blueprint
What Questions Can Architectures Address?
30
• How and why do the
components interact?
• Where do they go?
• When are they needed?
• Why and how will the
changes be implemented?
• What should be managed
organization-wide and what
should be managed
locally?
• What standards should be
adopted?
• What vendors should be
chosen?
• What rules should govern
the decisions?
• What policies should guide
the process?
31. Copyright 2013 by Data Blueprint 31
Diagram: Organizational needs become instantiated and integrated into a data/information architecture; information system requirements are authorized and articulated by, and in turn satisfy, specific organizational needs
Data Architectures produce and are made up of information models that are
developed in response to organizational needs
32. Copyright 2013 by Data Blueprint
Enterprises Spend $38 Million a Year on Data
32
• Worldwide spending on
business information is
now $1.1 trillion/year
• Enterprises spend an
average of $38 million
on information/year
• Small and Medium
sized Businesses on
average spend
$332,000
– http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/
33. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
33
34. Copyright 2013 by Data Blueprint
Gartner Five-phase Hype Cycle
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
34
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Peak of Inflated Expectations: Early publicity produces a number of success stories—often accompanied by scores of failures.
Some companies take action; many do not.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become
more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund
pilots; conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more
clearly defined. The technology's broad market applicability and relevance are clearly paying off.
35. Copyright 2013 by Data Blueprint
Gartner Big Data Hype Cycle
35
"A focus on big data is not a substitute for the
fundamentals of information management."
36. Copyright 2013 by Data Blueprint
36
Big Data in Gartner’s Hype Cycle
37. Copyright 2013 by Data Blueprint
Technology Continues to Advance
37
• (Gordon) Moore's law
– Over time, the number of transistors on
integrated circuits doubles approximately
every two years
38. Copyright 2013 by Data Blueprint
Pick any two!
and there are
still tradeoffs
to be made!
38
39. • Faster processors outstripped
not only the hard disk, but main
memory
– Hard disk too slow
– Memory too small
• Flash drives remove both
bottlenecks
– Combined, Apple and Yahoo
have spent more than $500
million to date
• Make it look like traditional
storage or more system
memory
– Minimum 10x improvements
– Dragonstone server is 3.2 TB of
flash memory (Facebook)
• Bottom line - new capabilities!
Copyright 2013 by Data Blueprint
"There’s now a blurring between the storage world and the memory world"
39
40. • von Neumann
bottleneck
(computer science)
– "An inefficiency inherent in
the design of any von
Neumann machine that
arises from the fact that
most computer time is
spent in moving
information between
storage and the central
processing unit rather than
operating on it"
[http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database
processing is
approximately 4%
efficient
• Many "big data
architectures are
attempts to address
this, but:
– Zero sum game
– Trade characteristics
against each other
• Reliability
• Predictability
– Google/MapReduce/
Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data exploits
non-von Neumann
processing
Copyright 2013 by Data Blueprint
Non-von Neumann Processing/Efficiencies
40
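To make the contrast with single-CPU, von Neumann-style processing concrete, here is a toy, single-process sketch (not from the deck) of the MapReduce programming model the slide cites via Google/MapReduce and Hadoop; real frameworks run the map and reduce phases in parallel across many machines and move computation to where the data lives.

```python
# Toy word count in the MapReduce style: map emits (key, value) pairs,
# the framework shuffles/groups them by key, and reduce combines each group.
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    """Group intermediate values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    """Combine all counts emitted for one word."""
    return key, sum(values)

if __name__ == "__main__":
    documents = ["big data is big", "data about data"]
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
    print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```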
41. Copyright 2013 by Data Blueprint
Potential Tradeoffs:
CAP theorem: consistency, availability and partition-tolerance
41
Partition (Fault) Tolerance
Availability / Consistency
RDBMS vs. NoSQL
Small datasets can be both consistent & available
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basic Availability, Soft-state, Eventual consistency
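As an illustration of the consistency/availability tradeoff under a network partition (not from the deck), the toy two-replica key-value store below either keeps serving possibly stale reads or refuses to answer until the partition heals; the Replica class and prefer_availability flag are invented for this sketch.

```python
# Toy CAP sketch: two replicas of a key-value store. While "partitioned",
# each replica must either stay available (risking stale reads) or refuse
# requests to preserve consistency.
class Replica:
    def __init__(self, name: str, prefer_availability: bool = True):
        self.name = name
        self.data = {}
        self.peer = None
        self.partitioned = False
        self.prefer_availability = prefer_availability

    def write(self, key, value):
        self.data[key] = value
        if not self.partitioned and self.peer is not None:
            self.peer.data[key] = value  # synchronous replication while the link is healthy

    def read(self, key):
        if self.partitioned and not self.prefer_availability:
            raise RuntimeError(f"{self.name}: refusing possibly stale read (choosing C over A)")
        return self.data.get(key)  # may be stale during a partition (choosing A over C)

if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    a.peer, b.peer = b, a
    a.write("x", 1)                  # replicated to B
    a.partitioned = b.partitioned = True
    a.write("x", 2)                  # B never sees this update
    print(a.read("x"), b.read("x"))  # 2 1 -> available but inconsistent
```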
42. Copyright 2013 by Data Blueprint
Potential either/or Tradeoffs
42
SQL vs. Big Data
Privacy vs. Big Data
Security vs. Big Data
Massive / High-speed / Flexible?
43. • Patterns/objects, hypotheses emerge
– What can be observed?
• Things are happening
– Sensemaking techniques address "what" is happening
• Operationalizing
– The dots can be repeatedly connected
– "Big Data" contributions are shown in orange
Figure labels: Volume, Velocity, Variety; pattern/object emergence; "sensemaking" techniques; existing knowledge base; potential/actual insights; discernment; analytical bottleneck; combined/informed insights; exploitable insight; feedback
Copyright 2013 by Data Blueprint
Analytics Insight Cycle
43
44. Copyright 2013 by Data Blueprint
• Data analysis struggles with the social
– Your brain is excellent at social cognition - people can
• Mirror each other’s emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social
interactions but not the quality
• Map interactions with co-workers you see during work days
• Can't capture devotion to childhood friends seen annually
– When making (personal) decisions about social
relationships, it’s foolish to swap the amazing machine
in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories - weaving together multiple
causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant
correlations
– Most are spurious and deceive us
– Falsity grows exponentially with the amount of data
we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– Detect when large numbers of people take an instant
liking to some cultural product
– Products are hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it’s always structured according to
somebody’s predispositions and values
44
Some Big Data Limitations
45. Savings come from a variety
of agreed upon categories
and values:
• Reduced hospital re-admissions
• Patient Monitoring:
Inpatient, out-patient,
emergency visits and ICU
• Preventive care for ACO
• Epidemiology
• Patient care quality and
program analysis
Copyright 2013 by Data Blueprint
$300 billion is the potential annual value to health care
45
$165
$108
$47
$9 $5
Transparency in clinical data and clinical decision support
Research & Development
Advanced fraud detection / performance-based drug pricing
Public health surveillance/response systems
Aggregation of patient records, online platforms, & communities
46. Chart: analyst time, Current vs. Improved (scale 0-100)
Copyright 2013 by Data Blueprint
Reversing The Measures
• Currently:
– Analysts spend 80% of their time manipulating data and 20% of their time
analyzing data
– Hidden productivity bottlenecks
• After rearchitecting:
– Analysts spend less time manipulating data and more of their time analyzing data
– Significant improvements in knowledge worker productivity
46
Manipulation Analysis
47. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
47
48. Copyright 2013 by Data Blueprint
What do we teach business people about data?
48
What percentage of them deal with it daily?
49. Copyright 2013 by Data Blueprint
What do we teach IT professionals about data?
• 1 course
– How to build a new
database
– 80% of IT expenses
are used to improve
existing IT assets
• What impressions do
IT professionals get
from this education?
– Data is a technical skill
that is used to develop
new databases
• This is not the best way to educate IT and
business professionals about every
organization's
– Sole, non-depletable, non-degrading,
durable, strategic asset
49
50. Copyright 2013 by Data Blueprint
Application-Centric Development
Original articulation from Doug Bagley @ Walmart
50
Diagram: Strategy → Goals/Objectives → Systems/Applications → Network/Infrastructure → Data/Information
• In support of strategy, the organization
develops specific goals/objectives
• The goals/objectives drive the
development of specific systems/
applications
• Development of systems/applications
leads to network/infrastructure
requirements
• Data/information are typically
considered after the systems/
applications and network/
infrastructure have been articulated
• Problems with this approach:
– Ensures that data is formed
around the application and not
the information requirements
– Processes are narrowly
formed around applications
– Very little data reuse is possible
51. Einstein Quote
Copyright 2013 by Data Blueprint
51
"The significant
problems we face
cannot be solved
at the same level
of thinking we
were at when we
created them."
- Albert Einstein
52. Copyright 2013 by Data Blueprint
What does it mean to treat data as an organizational asset?
• Assets are economic resources
– Must own or control
– Must use to produce value
– Value can be converted into cash
• An asset is a resource controlled by
the organization as a result of past
events or transactions and from which
future economic benefits are expected
to flow to the organization [Wikipedia]
• With assets:
– Formalize the care and feeding of data
• Cash management - HR planning
– Put data to work in unique/significant
ways
• Identify data the organization will need
[Redman 2008]
52
53. Copyright 2013 by Data Blueprint
Data-Centric Development Flow
Original articulation from Doug Bagley @ Walmart
53
Diagram: Strategy → Goals/Objectives → Data/Information → Network/Infrastructure → Systems/Applications
• In support of strategy, the organization
develops specific goals/objectives
• The goals/objectives drive the
development of specific data/
information assets with an eye to
organization-wide usage
• Network/infrastructure components are
developed to support organization-
wide use of data
• Development of systems/applications
is derived from the
data/network architecture
• Advantages of this approach:
– Data/information assets are
developed from an
organization-wide perspective
– Systems support organizational
data needs and complement
organizational process flows
– Maximum data/information reuse
54. Copyright 2013 by Data Blueprint
CDO Reporting
54
Org chart: Top Job; Top Finance Job; Top Information Technology Job; Top Marketing Job; Top Operations Job; Chief Data Officer; Data Governance Organization
• There is enough work to justify the function
• There is not much talent
• The CDO provides significant input to the Top Information Technology Job
55. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
55
56. Copyright 2013 by Data Blueprint
56
Traditional Systems Life Cycle Challenges
projectcartoon.com
• Original business concept
• As the consultant described it
• As the customer explained it
• How the project leader understood it
• How the programmer wrote it
• What the beta testers received
• What operations installed
• As accredited for operation
• When it was delivered
• How the project was documented
• How the help desk supported it
• How the customer was billed
• After patches were applied
• What the customer wanted
57. Copyright 2013 by Data Blueprint
"Waterfall" model of
Systems Development
57
58. Copyright 2013 by Data Blueprint
"Waterfall" model (with iteration possible)
58
59. Copyright 2013 by Data Blueprint
Spiral Model of
Systems
Development
59
Barry W. Boehm "A Spiral Model of
Software Development and Enhancement"
ACM SIGSOFT Software Engineering Notes
August 1986, 11(4):14-24
"The major
distinguishing
feature of the
spiral model is
that it creates a
risk-driven
approach ...
rather than a
primarily
document-driven
or code-driven
process"
60. Copyright 2013 by Data Blueprint
Prototyping & Big Data Technologies
60
61. Advanced
Data
Practices
• Cloud
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Copyright 2013 by Data Blueprint
• 5 data management
practice areas /
data management
basics ...
• ... are necessary
but insufficient
prerequisites to
organizational data
leveraging
applications (that is,
self-actualizing data
or advanced data
practices)
Hierarchy of Data Management Practices (after Maslow)
Basic Data Management Practices
– Data Program Management
– Organizational Data Integration
– Data Stewardship
– Data Development
– Data Support Operations
http://3.bp.blogspot.com/-ptl-9mAieuQ/T-idBt1YFmI/AAAAAAAABgw/Ib-nVkMmMEQ/s1600/maslows_hierarchy_of_needs.png
63. • Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Complement
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
63
64. Copyright 2013 by Data Blueprint
Which Source of Data Represents the Most Immediate Opportunity?
64
Guidance
• The real change: cost-effectiveness
and timely delivery
• Business process optimization — not
technology
• High fidelity and quality information
• Big data technology: not all of it is new
• Keep your focus, develop skills
(Source: Gartner, January 2013)
65. Copyright 2013 by Data Blueprint
Gartner Recommendations
65
Impacts and Top Recommendations
Impact: Some of the new analytics made possible by big data have no precedent, so innovative thinking will be required to achieve value.
Recommendation: Treat big data projects as innovation projects that will require change management efforts. The business will take time to trust new data sources and new analytics.
Impact: Creative thinking can unearth valuable information sources already inside the enterprise that are underused.
Recommendation: Work with the business to conduct an inventory of internal data sources outside of IT's direct control, and consider augmenting existing data that is IT 'controlled.' With an innovation mindset, explore the potential insight that can be gained from each of these sources.
Impact: Big data technologies often create the ability to analyze faster, but getting value from faster analytics requires business changes.
Recommendation: Ensure that big data projects that improve analytical speed always include a process redesign effort aimed at getting maximum benefit from that speed.
Gartner 2012
66. Copyright 2013 by Data Blueprint
Results: It is not always about money
• Solution:
– Integrate multiple databases into
one to create holistic view of data
– Automation of manual process
• Results:
– Data is passed safely and effectively
– Eliminated inconsistencies,
redundancies, and corruption
– Ability to cross-analyze
– Significantly reduced turnaround
time for matching patients with
potential donor -> increased
potential to make life-saving
connection in a manner that is
faster, safer and more reliable
– Increased safe matches from 3 out
of 10 to 6 out of 10
66
67. Copyright 2013 by Data Blueprint
Data versus Tools - A History Lesson
• Sophisticated tools without data are useless
• Mediocre tools with the data are frustrating
• Analysts will always opt for frustration over futility, if
that is their only option
– Ira "Gus" Hunt CIA/CTO
• http://www.huffingtonpost.com/2013/03/20/cia-gus-hunt-big-data_n_2917842.html
67
69. Unlock Business Value through
Data Quality Engineering
June 11, 2013 @ 2:00 PM ET/11:00 AM PT
Data Systems Integration & Business Value Pt. 1:
Metadata
July 9, 2013 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
Copyright 2013 by Data Blueprint
Upcoming Events
69
70. 10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056