The document provides an overview of big data, including definitions of data units like bits and bytes. It discusses how data is growing exponentially in terms of volume, velocity, and variety. Traditional relational database management systems cannot handle this scale of data. Therefore, new approaches like Not Only SQL (NOSQL) databases and Hadoop were developed to better manage large, diverse, and fast-moving data. These new big data architectures allow problems to be broken into pieces and processed in parallel across many servers for improved speed and scalability compared to traditional approaches. The document concludes by noting that skills like communication, presentation, and understanding business and statistics will be important for working with big data.
The causes and consequences of too many bits
2. Before we start our journey, a bit about a bit, a byte and lots of bytes.
• A bit (b) is short for binary digit, after the binary code (1 or 0) that computers use to store and process data.
• Binary means base 2, just as decimal means base 10.
• A byte (B) is the basic unit of computing, used to encode an English letter or number in computer code. One byte is equal to 8 bits.
Unit        Symbol   Size (decimal)   Size (binary)
Bit         b        1 or 0
Byte        B        8 bits
Kilobyte    KB       1,000 bytes      2^10 bytes
Megabyte    MB       1,000 KB         2^20 bytes
Gigabyte    GB       1,000 MB         2^30 bytes
Terabyte    TB       1,000 GB         2^40 bytes
Petabyte    PB       1,000 TB         2^50 bytes
Exabyte     EB       1,000 PB         2^60 bytes
Zettabyte   ZB       1,000 EB         2^70 bytes
Yottabyte   YB       1,000 ZB         2^80 bytes
• One page of typed text is roughly 2 KB.
• All books catalogued in the US Library of Congress total around 15 TB.
• Google processes about 1 PB every hour.
• Monthly internet data flows at around 21 EB.
• The total amount of information in existence is around 1.2 ZB.
• A YB is currently too big to imagine (as per The Economist).
• The International Bureau of Weights and Measures sets the names of the prefixes.
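The unit table on this slide can be sketched in a few lines of Python. This is only an illustration: the function names (decimal_bytes, binary_bytes, to_bits) are made up for this example, and the 2 KB page size is the rough figure quoted above.

```python
# Decimal (powers of 1,000) and binary (powers of 2^10) multiples of a byte,
# in the same order as the unit table on this slide.
PREFIXES = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def decimal_bytes(unit: str) -> int:
    """Bytes in one decimal unit: KB = 1,000^1, MB = 1,000^2, and so on."""
    return 1000 ** (PREFIXES.index(unit) + 1)


def binary_bytes(unit: str) -> int:
    """Bytes in one binary unit: KB = 2^10, MB = 2^20, and so on."""
    return 2 ** (10 * (PREFIXES.index(unit) + 1))


def to_bits(n_bytes: int) -> int:
    """One byte is 8 bits."""
    return n_bytes * 8


# Rough illustration: how many 2 KB typed pages fit in one decimal terabyte?
pages_per_tb = decimal_bytes("TB") // (2 * decimal_bytes("KB"))

# The two conventions drift apart as the units get bigger.
gap_at_yb = binary_bytes("YB") / decimal_bytes("YB")
```

The gap between the two columns is small at the KB level (1,024 vs. 1,000 bytes) but keeps widening with every step up the table, which is why it matters which convention a storage figure uses.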
3. A perfect storm of forces is conspiring to generate a lot of data.
[Figure: six trend charts, each plotted against time, arranged around a central "Big Data" label: a large volume of data of rich variety at various speeds.]
• Data storage costs are falling ($/TB)…
• …data creating devices are growing (# of hosts)…
• …data processing costs are falling ($/GFLOPS)…
• …connectivity is growing (degree of connectivity)…
• …data moving costs are falling ($/Mbps)…while…
• …along with performance expectations (speed of response).
Please note that the slopes of the various lines are different, but they are directionally correct.
4. Almost everything is instrumented, which means data is being generated in various formats, at various speeds and in various volumes.
• Structured data (tables, records)
• Semi-structured data (XML and similar standards)
• Complex data (hierarchical or legacy sources)
• Event data (messages)
• Unstructured data (human language, audio, video)
• Social media data (blogs, tweets, social networks)
• Web logs and click streams
• Spatial data (long/lat, GPS)
• Machine generated data (sensors, RFID, devices, server logs)
• Scientific data (genomes, proteomics, astronomy)
[Figure: the three dimensions of data growth: Volume, Velocity, Variety]
5. Now all this data is pure cost unless it is transformed into information from which insights can be drawn and the right action taken to create or protect value.
• The information value chain depicts the various stages in the journey of data from its creation to its use:
Data → Information → Insights → Decisions → Action → Value
• At each stage of the value chain, the right mix of business processes, human skills and technology capabilities is needed.
• Relational database management systems (RDBMS) date back to the early 70s. RDBMS have worked well for transactional and structured data because this type of data can be stored in table format with relationships between and amongst the tables. The query language used to manage RDBMS was developed at IBM (in San Jose) and was initially called SEQUEL (Structured English Query Language). It is now called SQL.
• As more of the data generated shifts from structured to other formats, the traditional methods of managing data are no longer practical.
• So here is what has happened in the management of data over time.
– Vertical scaling…bigger RDBMS machines…more disk space, more horsepower, big data centers.
– New methods, called horizontal scaling, arrived as vertical scaling reached its limit from a data volume standpoint…so came Massively Parallel Processing (MPP) machines.
– But then came unstructured data (variety) and streaming data (velocity), so what was needed was a whole new way to manage data…Big Data (BD).
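To make the horizontal scaling idea concrete, here is a minimal sketch of spreading records over several machines by hashing their keys. It is an illustration only, not how any particular MPP product works; the node count, the key format and the function names (shard_for, partition) are assumptions made up for this example.

```python
import hashlib


def shard_for(key: str, n_nodes: int) -> int:
    """Pick a node for a record key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def partition(keys, n_nodes):
    """Group keys by the node that would store them."""
    nodes = {i: [] for i in range(n_nodes)}
    for key in keys:
        nodes[shard_for(key, n_nodes)].append(key)
    return nodes


# Hypothetical data: 1,000 customer records spread over 4 machines.
keys = [f"customer-{i}" for i in range(1000)]
layout = partition(keys, 4)
sizes = {node: len(stored) for node, stored in layout.items()}
```

Vertical scaling would keep all 1,000 keys on one bigger machine; here each machine holds roughly a quarter of them, and adding machines is how capacity grows. Note that with plain modulo hashing, changing the node count moves most keys, which is one reason real systems use more careful schemes.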
6. How do RDBMS really work (for the most part)?
• Multiple interfaces
• Slow…disk drives need time to read and write
• Sequential
• Indexing is a big challenge
• Schema is not flexible
[Figure: Data is generated in multiple channels → Data is stored in databases → Data is aggregated in data warehouses → Data is analyzed in analytical applications → Information is reported]
• So the solution is to remove all these boxes (no pun intended) and get analytics as close as possible to the data. Hence, you hear terms like in-database analytics (analytics moving into the database) or in-memory analytics (the database moving into memory).
[Figure: Data is generated via multiple channels → Data is stored, aggregated and analyzed on a single platform → Information is reported]
7. RDBMS cannot scale because their intrinsic constraints run up against a humbling rule: you cannot have everything in life and you have to choose.
• RDBMS rely on the ACID principle
– Atomicity: All or nothing
– Consistency: All transactions take the database from one valid state to another without impairing referential integrity
– Isolation: Other operations cannot access the data a transaction is changing while that transaction is midstream
– Durability: Committed data can be recovered after a system failure
• Vertically scaled RDBMS do honor the ACID principle, but horizontally scaled RDBMS (MPP machines) cannot always do so. The reason is the CAP Theorem. It says that a distributed RDBMS can have any two of the following three:
– Consistency, which means every node sees the same data at the same time.
– Availability, which means a node failure does not prevent surviving nodes from completing the task.
– Partition tolerance (the distributed part), which means that the system continues to operate despite arbitrary message loss.
• The two bullets above mean that as you scale an RDBMS you run into a wall…actually a cap!
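The "all or nothing" part of ACID listed above is easy to see on a single machine. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the accounts table, the names and the amounts are invented for illustration. A transfer is two updates, and if the second one fails, the first one is rolled back too.

```python
import sqlite3

# In-memory database with a rule that balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
```

In the failed transfer, bob is credited first and the debit on alice then violates the CHECK rule, so the whole transaction is undone and bob's credit disappears with it. Getting this same guarantee across many machines that can lose messages is where the CAP trade-off starts to bite.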
8. Therefore, RDBMS are not good at performing all types of analysis.
• We need scalable database models that are not dependent on fixed data schemas.
[Figure: the need for a new data architecture. Three arrangements of applications (App) and databases (Db) are shown: vertical scaling, horizontal scaling and schema agnostic scaling, set against volume growth, velocity growth and variety growth.]
9. The rich variety of data intruded to make data management a painnus posteriorus*.
• While the volume and velocity of the data are growing rapidly, it is the growing variety of data that is a complexity multiplier in the management of all these bits.
• RDBMS and MPP approaches exhausted the ability of current architectures to process the torrent of bits flowing.
• Hence arrived what I call Big Data Architecture (BDA).
• BDA does not replace existing investments in data management; BDA complements them, so there is no need to rip-and-replace; it is more insert-and-augment.
• BDA started in companies that had BD, essentially internet companies like Yahoo, Google, Facebook, Amazon, Twitter and LinkedIn, that needed web-scale solutions to their data problems. They built this from scratch because there was nothing commercially available.
• This revolution was called NOSQL (Not Only SQL).
• The “NO” means that it is a technology that works in addition to SQL, not instead of it.
• NOSQL databases were organically developed…these are essentially schema agnostic…meaning that some of the constraints of SQL databases are negotiated well.
[Figure: three vectors. Volume vector…bad. Velocity vector…badder. Variety vector…baddest.]
*: painnus posteriorus is a contemporary acute discomfort of the lower thoracic region induced by unrelenting bit storms
10. NOSQL solves the complexity, volume and speed constraints of an SQL design
by using four different data models.
• Key value stores are a schema-less model of storing data.
• Big table clones are compressed, high-performance database systems based on the Google File System.
• Document databases are a method of storing semi-structured data.
• Graph databases use graph structures (nodes, edges etc.) that provide index-free lookups.
NOSQL model   Key value stores   Big table clones   Document databases   Graph databases
Based on      Amazon Dynamo      Google BigTable    Amazon Dynamo        Graph Theory
Examples      Memcached          HBase              Lotus Domino         AllegroGraph
              Dynamo             Cassandra          CouchDB              VertexDB
              Voldemort          HyperTable         MongoDB              Neo4J
              Tokyo Cabinet      AzureTS            Riak                 Active RDF
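A rough way to picture the four data models on this slide is to store the same small piece of information in each shape. The sketch below uses plain Python structures only, not the API of any of the products in the table; the records, keys and the friends_of helper are invented for illustration.

```python
import json

# Key value store: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Big table clone (wide column): row key -> column family -> column -> value.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada"},
        "location": {"city": "London"},
    }
}

# Document database: self-describing records whose fields may differ.
documents = [
    {"_id": 42, "name": "Ada", "city": "London"},
    {"_id": 43, "name": "Grace", "languages": ["COBOL"]},  # no "city" field
]

# Graph database: nodes plus labelled edges between them.
nodes = {42: {"name": "Ada"}, 43: {"name": "Grace"}}
edges = {42: [("KNOWS", 43)], 43: []}


def friends_of(node_id):
    """Follow KNOWS edges directly from a node instead of scanning a table."""
    return [nodes[dst]["name"] for label, dst in edges[node_id] if label == "KNOWS"]


with_city = [doc["name"] for doc in documents if "city" in doc]
```

Notice that none of these shapes needs a table definition up front: the second document simply has different fields from the first, which is the schema agnostic point made earlier.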
11. BDA is actually very effective.
• Yahoo tested BDA by calculating the 2,000,000,000,000,000th digit of Pi.
• It used 1,000 computers and the calculation took 23 days. This means 23,000 computing days.
• Using RDBMS, it would have taken about 500 years on one PC, which is essentially ~182,621 computing days. Now that is a ~87% reduction in computing days (using a very rough back of the envelope calculation).
• So yes, BDA works.
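The back of the envelope numbers on this slide can be checked in a few lines. The only assumption added here is the length of a year (365.2425 days, the mean Gregorian year), chosen because it reproduces the ~182,621 figure; everything else comes from the bullets above.

```python
DAYS_PER_YEAR = 365.2425  # assumption: mean Gregorian year

cluster_days = 1_000 * 23               # 1,000 computers for 23 days
single_pc_days = 500 * DAYS_PER_YEAR    # one PC for about 500 years

# Share of total computing days saved by the cluster.
reduction = 1 - cluster_days / single_pc_days

# Wall-clock view: 23 days of waiting instead of about 500 years.
wall_clock_speedup = single_pc_days / 23

print(f"single PC: {single_pc_days:,.0f} computing days")
print(f"cluster:   {cluster_days:,} computing days")
print(f"reduction: {reduction:.0%}")
```

The ~87% figure is the saving in total computing days. Measured by the calendar, the gain is far larger: the answer arrives in 23 days rather than in roughly five centuries.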
12. BDA works by breaking a problem into pieces, analyzing each piece separately and then aggregating the results into a single response.
• Hadoop, an open-source framework from the NOSQL movement, has two main parts: MapReduce and HDFS.
• MapReduce means mapping a problem to worker nodes and then aggregating (reducing) the results.
• HDFS (the Hadoop Distributed File System) is the file management system that makes MapReduce work.
• Google searches
• Amazon recommendations
• PayPal real time fraud detection
• Credit card unauthorized charges
• Loopt
• Directions from office to bar/pub…nearest vs. cheapest
• Genomics searching (needle-in-a-haystack)
• Zynga gaming
• Facebook Friends
• LinkedIn People-you-may-know (PYMK)
• GPS directions (as you drive)
• …
[Figure: Map phase: a master node splits the Problem into Piece 1, Piece 2, Piece 3, Piece 4, … Piece n and hands them to worker nodes. Reduce phase: a master node combines the workers' output into the Result.]
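The map and reduce phases on this slide can be sketched with the classic word count. This is a toy, single-machine illustration and not Hadoop's API: threads stand in for worker nodes, and the function names and sample pieces are made up for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(piece: str):
    """Worker: emit a (word, 1) pair for every word in one piece."""
    return [(word.lower(), 1) for word in piece.split()]


def reduce_phase(mapped):
    """Master: group the emitted pairs by word and add up the counts."""
    totals = defaultdict(int)
    for pairs in mapped:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(pieces, workers=4):
    """Send each piece to a worker, then aggregate into a single result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, pieces))
    return reduce_phase(mapped)


# The "problem" already broken into pieces.
pieces = ["big data big", "data moves fast", "big bits"]
result = word_count(pieces)
```

Each worker only ever sees its own piece, which is what lets the map phase spread over as many machines as there are pieces; only the much smaller lists of pairs have to travel back for the reduce phase.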
13. What does the BDA landscape look like?
• It depends on what the need is, but here is a simple graphic that shows the various elements. This is only illustrative.
Layer                            Examples
Data presentation                Visualization/Mobile/R
Displaying and monitoring logs   Chukwa
Data processing                  Hadoop (batch); S4, Storm (streaming)
Coordination                     Zookeeper
Data query                       Pig, Hive
Processing scheduler             Azkaban, Oozie
Database                         Voldemort, Cassandra, HBase
Data collection                  Kafka, Flume, Scribe
The graphic also labels the Job tracker and Task tracker.
14. BDA does not mean you need to throw away your investments in traditional data analytics infrastructure.
• BDA works alongside existing investments made by companies…not rip-and-replace!
[Figure: traditional BI infrastructure (Reporting & Distribution) with BDA alongside it]
15. Even NOSQL is getting challenged, but for now we got-to-dance-with-them-what-brung-you.
• Zynga needs an additional 1,000 servers every week for its data needs.
• Every search string you send to Google is divided and sent to 700-1,000 servers so that you can get your response back in micro-seconds and thus not waste a few seconds in which you could have destroyed civilization.
• YouTube serves 1 billion videos every day.
• 2.5 billion photos are uploaded to Facebook each month.
• ~150,000 zombie computers are created every day (used in botnets for sending spam).
• At the beginning of 2009 there were 187 million web sites. At the end of 2009 there were 234 million web sites. That is 25% growth.
16. And what is next.
Big Data + Context + Interactivity =
20. New skills you should consider in the world of Big Data
– Cultivate expertise but be a strong generalist
– Develop and grow relationships and networks
– Develop communication skills
– Refine presentation skills
– Read up, a lot
– Monitor competition
– Understand business, I mean really understand it
– Love the edge
– Step outside your comfort zone, frequently
– If you have the appetite, read up a book or two on statistics
– Think laterally, this just means do not be afraid to connect the dots
[Figure: Embrace ambiguity*]
* At a minimum, learn to accept ambiguity