This document provides an overview of big data and discusses key concepts. It begins by defining big data and noting the increasing volume, velocity, and variety of data being created. It then covers the big data landscape, including storage models and technologies like Hadoop, analytics techniques like machine learning, and visualization. Finally, it discusses business use cases and how big data is impacting industries and creating new business models through insights gained from data.
From Business Intelligence to Big Data - hack/reduce Dec 2014 (Adam Ferrari)
Talk given on Dec. 3, 2014 at MIT, sponsored by Hack/Reduce. This talk looks at the history of Business Intelligence, from first-generation OLAP tools through modern data discovery and visualization tools, and asks, looking forward, what we can learn from that evolution as numerous new tools and architectures for analytics emerge in the Big Data era.
Big Data, Machine Learning and the Auditor (Bharath Rao)
An insight into how an auditor can leverage analytics, machine learning, and technology to improve assurance and effectively control the fraud risk present in the enterprise.
Every business needs data analytics to get a detailed picture of costs and profits; this presentation studies that importance in detail.
About
Evolution of Data, Data Science, Business Analytics, Applications, AI, ML, DL, Data Science relationships, Tools for Data Science, Life cycle of data science with a case study, Algorithms for Data Science, Data Science Research Areas, Future of Data Science.
Big Data Analytics and a Chartered Accountant (Bharath Rao)
Big Data Analytics is a growing field currently being capitalized on by many businesses. Businesses leverage Big Data to gain a keen understanding of consumer behavior and the market. Additionally, Big Data can be used in fields such as financial audit, control assurance, and forensics.
This presentation provides insight into the opportunities available to a Chartered Accountant to create value with Big Data Analytics.
This presentation was made during my GMCS 2 course at the Mangalore branch of SIRC of ICAI and hence has a limited number of slides.
This was the first part of the presentation "Road Map for Careers in Big Data", given in conjunction with Hortonworks/Aengus Rooney on 17th August 2016 in London, for those contemplating moving to Big Data from an often relational background.
Business intelligence, data analytics, visualization (Muthu Natarajan)
Business Intelligence, Cloud Computing, Data Analytics, Data Scrubbing, Data Mining, Big Data & Intelligence, turning data into information, decision making, methods for Business Intelligence, Advanced Analytics, OLAP, Multidimensional Data, Data Visualization.
In today’s competitive world, every business faces huge competition to achieve success, so every business organization needs to collect large amounts of information such as employee data, sales data, customer information, and market analysis reports.
You can view the full presentation of this webinar here: http://info.datameer.com/Slideshare-Fighting-Fraud-this-Holiday-Season.html
In 2012, retailers lost $3.5 billion in revenue to online fraud, and these losses spike by an estimated 20% during the holiday season.
Join Datameer and Hortonworks in this webinar to learn how Big Data Analytics can be used to identify new fraud schemes during peak fraud season.
In this webinar, you will learn about:
current challenges in identifying fraud
what to look for in a big data solution addressing fraud
how big data analytics can identify credit card fraud
best practices
Analytics, machine and deep learning, data/event streaming
Big data streaming: enabling the time machine
Real-time event streaming and new conceptual paradigms:
- Distributed transactions
- Eventual consistency
- Materialized projections
Real-time event streaming and new architectural paradigms:
- Enterprise service bus
- Event store
- Projection database
Notes on Domain-Driven Design: a strategic view of modeling your own business domain in the Big Data era.
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world), will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
Data and Analytics Career Paths, Presented at IEEE LYC'19.
About Speaker:
Ahmed Amr is a Data/Analytics Engineer at Rubikal, where he leads, develops, and runs daily data/analytics operations, including data ingestion, data streaming, data warehousing, and analytical dashboards. Ahmed graduated from the Computer Engineering Department, Alexandria University, and is currently pursuing his MSc in Computer Science at AAST. Professionally, Ahmed has worked with Egyptian/US startups such as Badr, Incorta, and WhoKnows to develop their data/analytics projects. Academically, Ahmed worked as a Teaching Assistant in the CS department at AAST. Ahmed helps software companies develop robust data engineering infrastructure and powerful analytical insights.
References:
1) https://www.datacamp.com/community/tutorials/data-science-industry-infographic
2) Analytics: The real-world use of big data, IBM, Executive Report
Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system downtime annually; spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online bank operations being inaccessible so frequently is disturbing. As a business owner, when systems go down, all processes come to a stop: work in progress is destroyed, and failure to meet SLAs and contractual obligations can result in expensive fees, adverse publicity, and the loss of current and potential future customers. Ultimately, the inability to provide a reliable and stable system results in lost revenue. While the failure of these systems is inevitable, the ability to predict failures in time and intercept them before they occur is now a requirement.
A possible solution to the problem can be found in the huge volumes of diagnostic big data generated at the hardware, firmware, middleware, application, storage, and management layers indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis, and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention, and other important use cases.
A great many IT experts know of Big Data, or at least have some idea of it; in practice, however, only a few in Germany currently work with it. Yet Big Data brings entirely new momentum to modern software solutions and is indispensable in the context of the mobile, cloud, and social shifts. Big Data makes software intelligent and thereby lets users experience it in an entirely new way. With Big Data, new software architectures emerge, because information is processed completely differently: faster, in a more differentiated way, and often with the goal of drawing conclusions and making predictions.
This talk explains how modern software architectures are designed so that Big Data paradigms can be implemented successfully, and what advantages result for increasingly mobile software solutions. We also take a look at the potential and options in industries such as banking, insurance, and retail.
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 2013 (Edgar Alejandro Villegas)
Presentation slides of:
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 2013 - PDF
Scott Mackenzie - Sr. Director, Platform & Analytics CoE
Michael Golz - CIO for SAP Americas
Ken Demma - VP, Insight Driven Marketing
20 Aug 2013 - Webcast - http://goo.gl/T74WAL
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative for addressing these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc
How Data Virtualization Puts Enterprise Machine Learning Programs into Production - Denodo
Watch full webinar here: https://bit.ly/3offv7G
Presented at AI Live APAC
Advanced data science techniques, like machine learning, have proven to be an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative for addressing these issues in a more efficient and agile way.
Watch this on-demand session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Management, Visualization (Experfy)
Gartner, IBM, Accenture, and many others have asserted that 80% or more of the world’s information is unstructured, and inherently hard to analyze. What does that mean, and what is required to extract insight from unstructured data?
Unstructured data is infinitely variable in quality and format, because it is produced by humans, who can be fastidious, unpredictable, ill-informed, or even cynical, but are always unique and not standard in any way. Recent advances in natural language processing make it possible to include unstructured content in data analysis.
Companies serious about growth and value are committed to data. The exponential growth of Big Data has posed major challenges in data governance and data analysis, and good data governance is pivotal for business growth.
It is therefore of paramount importance to slice and dice Big Data in a way that addresses data governance and data analysis issues. To support high-quality business decision making and fully harness the potential of Big Data, it is important to implement proper data migration, data ingestion, data management, data analysis, data visualization, and data virtualization tools.
Check it out: https://www.experfy.com/training/courses/march-towards-big-data-big-data-implementation-migration-ingestion-management-visualization
A Technical Introduction to Big Data Analytics (Pethuru Raj PhD)
This presentation details the sources of big data, the value of big data, what to do with big data, and the platforms, infrastructures, and architectures for big data analytics.
Business Intelligence: Components, Tools, Need and Applications (raj)
As part of a research project for the course Technical Foundations of Information Systems at the University of Illinois, our team worked on the topic of Business Intelligence. The presentation focuses on what Business Intelligence is, its various components, the latest tools, the need for BI, and applications of the technology. The project covers the latest developments in BI technologies (hardware and software) and includes a comprehensive literature survey from journals and the Internet.
2. Course objectives
● Give you a map / big picture, and pointers so you can drill down as you need
● Cover the business side, but also the technology, because without a good technical understanding it is not possible to grasp the business side
● Go over the landscape and the possibilities, and illustrate them with a good number of use cases
3. Proposed Agenda
● What is Big Data?
● Big Data landscape (Tech heavy)
● Business / Use cases
● Discussion
4. Proposed Agenda
● What is Big Data?
● Big Data landscape
● Business / Use cases
● Discussion
6. Data and Big Data
● Data is the basis for information, and economics now allow storing virtually unlimited data
● "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner's definition)
http://www.youtube.com/watch?v=ah14LEFKe8Q

Year | Cost of 1 GB
1980 | $3,000,000
1990 | $8,000
2000 | $30
2010 | $0.08
7. Data – Information processing 1/2
● Through processing, data becomes information (knowledge); knowledge creates insight, and insight drives success
● Transaction processing: a sequence of information exchange and related work that is treated as a single unit for the purpose of satisfying a request (usually, but not exclusively, a human one). Also known as Online Transaction Processing (OLTP).
Example: you buy an item on Amazon:
- The item is placed on hold in the inventory system
- The item is placed in the shopping cart
- The system requests credit card payment authorization for the item
- If payment is approved, the card is charged and the item is removed from inventory and shipped
-> all of the above happens, or none of it (roll back)
8. Data – Information processing 2/2
● Real-time processing: perceived as "immediate" by the originator. Examples: trading, payments, online booking, "right" ad delivery, gaming, etc.
● Batch processing: delayed execution of a series of programs ("jobs") on a computer without manual intervention. Examples: billing, virus scanning, web indexing, data mining, analytics, etc.
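The contrast between the two modes can be sketched in a few lines of Python (the data and function names are illustrative, not from the talk): batch processing sees the whole dataset at once and answers at the end, while real-time processing emits a result as each record arrives.

```python
def batch_total(records):
    """Batch: the full dataset is available up front; one job, one answer."""
    return sum(records)

def stream_totals(record_iter):
    """Real time: records arrive one at a time; emit a running result per event."""
    total = 0
    for r in record_iter:
        total += r
        yield total  # an "immediate" answer after each arrival

payments = [10, 25, 5]
print(batch_total(payments))          # 40: a single answer once the batch ends
print(list(stream_totals(payments)))  # [10, 35, 40]: an answer per event
```

The arithmetic is identical in both cases; what differs is when results become available, which is exactly the dimension that separates billing-style batch jobs from trading-style real-time systems.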
9. Data – ACID Transaction
● Technical definition:
● Atomicity: each transaction is all or nothing
● Consistency: transaction will stay consistent
with data rules
● Isolation: Ensures that each transaction is
kept isolated from others
● Durability: Once a transaction has been
committed, it will remain so, even in the event
of power loss, crashes, or errors
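Atomicity and rollback can be sketched with Python's built-in SQLite. This is a toy illustration of the Amazon-style purchase above; the `inventory`/`orders` tables and the `place_order` function are invented for the example.

```python
import sqlite3

# In-memory database in autocommit mode; transactions are managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT)")
conn.execute("INSERT INTO inventory VALUES ('book', 1)")

def place_order(item, payment_approved):
    """All or nothing: decrement inventory and record the order, or roll back both."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE inventory SET qty = qty - 1 WHERE item = ?", (item,))
        conn.execute("INSERT INTO orders VALUES (?)", (item,))
        if not payment_approved:
            raise RuntimeError("payment declined")
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")  # undo both changes together
        return False

ok = place_order("book", payment_approved=False)  # declined -> rolled back
qty = conn.execute("SELECT qty FROM inventory WHERE item = 'book'").fetchone()[0]
```

After the declined payment, the inventory is unchanged and no order row exists: the two statements either both took effect or neither did.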
10. Big Data - Applications
● Find deeper insight in data about
customers, partners and the business.
All industries will be affected.
"Software is eating the world"
● Retail: buying patterns, store traffic, etc
● Logistics: track and optimize shipments, etc
● Healthcare: preventive medicine, disease
management, etc.
● Social media: optimize usage, ads, etc.
● Finance: buying patterns, portfolio optimization
http://www.youtube.com/watch?v=7D1CQ_LOizA
http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html
11. Big Data – Three dimensions
● Volume
● Amount of data
● Velocity
● Speed at which it arrives
● Variety
● Types of data
12. Big Data – Volume/Size matters
Name Value Example
kilobyte (kB) 10^3 Email (7KB), Images, web pages
megabyte (MB) 10^6 Ebooks, MP3, SD video etc.
gigabyte (GB) 10^9 HD movie
terabyte (TB) 10^12 For a single journey across the Atlantic
Ocean, a four-engine jumbo jet can
create 640 terabytes of data
petabyte (PB) 10^15 FB has over 1.5 PB of stored photos
exabyte (EB) 10^18 Seagate Technology reported selling 330
exabytes worth of hard drives during the
2011 Fiscal Year
zettabyte (ZB) 10^21 WW production and consumption of data.
According to International Data
Corporation, the total amount of global
data is expected to grow to 2.7 zettabytes
during 2012
yottabyte (YB) 10^24 Not there yet…
http://en.wikipedia.org/wiki/Zettabyte http://www.youtube.com/watch?v=CsVYID9rMGE
13. Big Data - Speed
● How fast is new data coming?
● How does this data need to be used or
correlated?
● How long is data valuable?
● How fast does data need to be
processed?
● This dimension in particular will affect the
system architecture
14. Big Data - Variety
● What type(s) / format(s)?
● Human or machine generated
● Text, location, document, picture, video, click streams,
log file, event, etc.
● Is it structured or unstructured?
● Static vs dynamic
● What are the relationships/dependencies
among data elements?
15. Proposed Agenda
● What is Big Data?
● Big Data landscape
● Business / Use cases
● Discussion
16. Big Data landscape
Big Data applications are roughly built using
three technology layers:
● Storage
● Analytics
● Visualization
19. Big Data – Storage
● Main logical data models:
● Tabular (represented by rows and
columns) - Relational model
● Tree (a set of nodes with parent-child
relationships)
● Graph structure (a set of interconnected
nodes)
● Document (free structure /
unstructured / schema less)
20. Big Data – Storage
● Physical data models:
● Relational Database Management Systems (RDBMS),
which support ACID transactions and joins, are considered
relational. They use SQL as the API.
● Key-value systems basically support get, put, and delete
operations based on a primary key.
● Column-oriented systems still use tables but have no joins
(joins must be handled within your application). Obviously,
they store data by column as opposed to traditional row-
oriented databases. This makes aggregations much easier.
● Document-oriented systems store structured "documents"
such as JSON or XML but have no joins (joins must be
handled within your application). It's very easy to map data
from object-oriented software to these systems.
http://nosql-database.org/
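The claim that column-oriented storage makes aggregations easier can be illustrated with a toy in-memory layout (pure Python, no real database; the items and prices are invented):

```python
# Same data in a row-oriented and a column-oriented layout.
rows = [
    {"item": "book", "price": 12.0, "qty": 2},
    {"item": "pen",  "price": 1.5,  "qty": 10},
    {"item": "lamp", "price": 30.0, "qty": 1},
]

columns = {
    "item":  ["book", "pen", "lamp"],
    "price": [12.0, 1.5, 30.0],
    "qty":   [2, 10, 1],
}

# Aggregating one field touches every full row in the row layout,
# but only one contiguous list in the column layout.
row_total = sum(r["price"] for r in rows)
col_total = sum(columns["price"])
```

Both totals are identical; the difference is how much data must be scanned, which is why column stores shine for analytics workloads.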
21. Big Data - Storage
● Not practical to store data on a single system,
but distributing data creates complexity:
● Consistency: means that each client always has the
same view of the data.
● Availability: means that all clients can always read
and write.
● Partition tolerance: means that the system works
well across physical network partitions.
● If the system is partitioned, it is only possible
to achieve 2 of the 3 properties (the CAP
theorem): CA, AP or CP.
22. Big data - Storage
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
25. Big Data – Analytics
● Process of examining large amounts of
data of a variety of types to uncover
hidden patterns, unknown correlations and
other useful information resulting in
business benefits, such as more effective
marketing or increased revenue.
● Can work on all forms of data as
described before
● Can involve Transactions, Real Time
and/or Batch Oriented
26. Big Data – Analytics
● "Stages" of analytics:
● Business monitoring: traditional BI,
Charting, Key Performance Indicator,
etc.
● Business insights: uses statistics, data
mining, predictive analysis to generate
actionable insights: "Intelligent
dashboards". Leverages trending,
classification, optimization, simulations.
● Business transformation based on data
28. Big Data – Analytics
● Traditional predictive analytics and data
mining are designed for relational data or
structured data so a whole set of new
technologies have evolved for
unstructured data.
● Hadoop (batch oriented): "brute force"
● Real Time processing (new trend):
optimized for specific use cases
● Machine learning: data intensive
29. Big Data – Hadoop
● Designed for large scale (100's of
terabytes of data) batch oriented
information processing: archiving,
transformation, exploration, etc.
● Reliable while using commodity HW and
open source
● Main components:
● Distributed File System (HDFS)
● Map Reduce: distributed data processing
● Associated infrastructure components, query
mechanisms and machine learning
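The MapReduce model itself fits in a few lines. This is the canonical word count run in a single process; Hadoop distributes the same map/shuffle/reduce steps across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit a (key, value) pair per word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all values by key (Hadoop does this between map and reduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine the values for one key.
    return (key, sum(values))

lines = ["Big Data big data", "data pipelines"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {"big": 2, "data": 3, "pipelines": 1}
```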
30. Big Data – Hadoop Example
● Derive meaning from logs:
● Who is using the web site?
IP, location, device, etc.
● What pages are they looking at?
How long, how often?
● Are they buying?
Adding products to cart?
Checking out?
● What are the trends?
31. Big Data – Real-Time
● Goal is to process data from highly
dynamic sources in real time
● Data typically streams into the
processing system and is stored / processed
directly in memory
● Complex Event Processing has existed
for years, but Big Data scale and distributed
processing require new architectures:
Storm and Kafka are among the frameworks
that could become the "Hadoop" of Real-Time
32. Big Data – Real-Time Example
● Derive meaning from tweets:
● How well is the brand trending?
● By time? By category?
● Compared to competitors?
● Sentiment?
● etc
http://www.filtize.com/
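One common stream-processing building block behind "trending" metrics is a sliding-window counter. Here is a plain-Python sketch (timestamps and the `brandX` term are invented; a real deployment would run this logic inside a framework like Storm):

```python
from collections import deque

class WindowCounter:
    """Count term mentions within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, term), ordered by arrival

    def add(self, timestamp, term):
        self.events.append((timestamp, term))
        # Evict events older than the window.
        while self.events and self.events[0][0] < timestamp - self.window:
            self.events.popleft()

    def count(self, term):
        return sum(1 for _, t in self.events if t == term)

c = WindowCounter(window_seconds=60)
c.add(0, "brandX")
c.add(30, "brandX")
c.add(90, "brandX")        # the mention at t=0 has now aged out
trending = c.count("brandX")  # 2 mentions in the last 60 seconds
```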
33. Big Data – Machine learning
● What is Machine learning?
34. Big Data – Machine learning
● "A branch of artificial intelligence that is
about the construction and study of
systems that can learn from data."
Supports Predictive Analytics
● Can perform tasks that are too difficult to
specify algorithmically
● Example of applications:
● Computer vision, Natural language processing,
Fraud detection, Game playing, Robot locomotion,
Sentiment analysis, Adaptive systems, scientific
applications, anomaly detection, recommendation
engine, personal assistant, etc
35. Big Data – Example
● Handwriting recognition
● Handcrafted rules would result in a large
number of rules and exceptions. Better to
have a machine learn from a large
training set.
36. Big Data – Example
● Computer vision: car detection
● First Learning
● Then Testing: Is this a car?
[Image: example photos labeled "Cars" vs "Not a car"]
37. Big Data – Machine learning
● Supervised or unsupervised learning:
whether the model is trained on labeled
examples or learns structure on its own
● Types of information processing:
● Supervised
– Classification (discrete)
– Regression (continuous)
● Unsupervised
– Clustering (discrete)
38. Big Data – Machine learning
Supervised – Classification / Regression
● First teach the model
● Then verify against the model
39. Big Data – Machine learning
Classification
● Classifier (single- or multi-class): given a set of
features with corresponding labels, learn a function to
predict the labels from the features
[Scatter plot: two classes, "x" and "o", separable in a
2D feature space with axes x1 and x2]
40. Big Data – Machine learning
Classification
Many algorithms to choose from:
● SVM
● Neural networks
● Naïve Bayes
● Bayesian network
● Logistic regression
● Randomized Forests
● Boosted Decision Trees
● K-nearest neighbor
● RBMs
● Etc.
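The simplest entry in the list, k-nearest neighbor, fits in a few lines of plain Python: label a point by majority vote among its k closest training examples (the training points and labels below are made up, echoing the "x"/"o" picture from the previous slide):

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """train: list of (features, label) pairs; point: a feature tuple."""
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "o"), ((1, 2), "o"), ((2, 1), "o"),
         ((8, 8), "x"), ((8, 9), "x"), ((9, 8), "x")]
label = knn_predict(train, (2, 2), k=3)  # its 3 nearest neighbors are all "o"
```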
41. Big Data – Machine learning
Regression
● Regression fits an equation to a dataset so that
values can be predicted for new data
Example: calculating the price of a house; in reality much
more than 1 variable: size, number of floors, # of rooms, age, location, etc
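A one-variable least-squares fit, matching the house-price example (the sizes and prices below are invented, deliberately noise-free toy data):

```python
# Fit price = slope * size + intercept by ordinary least squares.
sizes = [50.0, 80.0, 110.0, 140.0]     # m^2 (made-up data)
prices = [150.0, 240.0, 330.0, 420.0]  # k$ (made-up data)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict(size):
    """Predicted price for a house of the given size."""
    return slope * size + intercept
```

With real data the fit would not be exact, and more variables would mean multiple regression, but the idea is the same: learn the equation, then predict for new inputs.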
42. Big Data – Machine learning
Clustering
● Clustering places data elements into related
groups without advance knowledge of the group
definitions.
● Example: social networks, e.g. grouping similar profiles
● K-means is a popular algorithm for clustering
http://en.wikipedia.org/wiki/K-means_clustering
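A minimal k-means sketch in plain Python (the 2D points are invented; real implementations add smarter initialization and iteration limits):

```python
import math
import random

def kmeans(points, k, seed=0):
    """Alternate assignment and centroid updates until assignments stabilize."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k random distinct points
    while True:
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster.
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer change
            return centroids, clusters
        centroids = new

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)  # recovers the two obvious groups
```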
43. Big Data – Machine learning
● Predictive analytics techniques usage
44. Big Data – Machine learning
● Designing a high accuracy learning system
“It’s not who has the best algorithm that wins.
It’s who has the most data.”
Ex: Classify between confusable words.
{to, two, too}, {then, than}
For breakfast I ate _____ eggs.
● Algorithms:
– Perceptron (logistic regression)
– Winnow
– Memory-based
– Naïve Bayes
[Chart: accuracy vs training set size (millions); all four
algorithms keep improving as the training set grows]
46. Big Data – Visualization
● Helps overcome information overload
● Allows patterns and connections to be seen,
both instantly and over time
● Focus on specific parts of data but also in
relation to other parts: data is relative
● Many different tools and techniques can
be used based on data sets
http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
http://www.ted.com/talks/joann_kuchera_morin_tours_the_allosphere.html
47. Big Data – Visualization
● Many different types available:
● 1D, 2D, 3D
● Temporal: timeline, time series, etc
● Advanced types: tag cloud, bubble
chart, network graph, rose chart,
spider chart, heatmap, tree map,
dependency graph, etc.
● Can allow interactivity (navigate, zoom
in/out, slice and dice, etc).
http://guides.library.duke.edu/vis_types
48. Big Data – Visualization Examples
https://developers.google.com/maps/tutorials/visualizing/earthquakes
http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
49. Proposed Agenda
● What is Big Data?
● Big Data landscape
● Business / Use cases
● Discussion
50. Big Data – A Word on Privacy
● Currently mostly ignored: Big Brother?
● Everything is being stored (data retention)
– Location, calls, SMS, searches, web access,
transactions, applications used, contacts, calendar, etc.
● Data doesn't belong to you (Facebook, etc)
and may be resold (based on privacy policy)
● Apps can read your calendar, contacts, etc.
and upload data on their server
● For now users do not seem to care:
they care about service and free (as in $).
Your phone company is watching
Google's drive privacy article
Who's afraid of the bad, big data?
51. Big Data – And Social Media
An opportunity to:
● Identify trends: tweets, likes, blogs, page
views, etc
● Pinpoint problems: social media data can be
used to get sentiment / feedback on products
/ brands / events (even real-time)
● Predict behavior: what is trend over time and
how does it correlate to particular events?
52. Big Data – Not just 1 device
http://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/
53. Big Data – Mobile is growing faster
http://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/
55. Big Data - Business models
● Data is the "new oil"
● Every day, 2.5 quintillion bytes of data are created, with 90
percent of the world's data created in the past two years alone.
● Data production will be 44 times greater in 2020 than in 2009.
56. Big Data - Business models
● Data is the new business model as:
● The cost of the HW, SW and networks required
to produce and transport data continues to
approach an effective cost of zero
● Even in the physical manufacturing world,
cost will go down: robotics, 3D printing, etc.
● Data creates insight, which makes it possible
to enhance and disrupt existing business models
57. Big Data - Business models
● Opportunities for:
● Web businesses
To increase ARPU
● Enterprises
Serve their customers better and improve management of
suppliers and partners
● IoT
Internet of Things (IoT) or M2M (Machine To Machine)
will allow brand new capabilities and services
58. Big Data - Business models
● Already used by web businesses (Google,
Facebook, etc.) and moving to Enterprises
59. Big Data - Web
● More data can yield more insight, which leads
to increased ARPU
● Ex: Ad platform
Advertisers define ads and campaigns available
across web, mobile, TV, etc.
On Google properties, Google makes money each
time an ad is clicked (CPC). On network members'
and content providers' sites, Google makes money
each time an ad is clicked or displayed (CPM)
-> Increased relevance and knowledge of the user
lead to increased revenues
60. Big Data - Enterprises
● All Industries are being disrupted
62. Big Data - Enterprises
● Differentiation: satisfy customers, improve
existing services and create new service
offerings
● Improve processes: merchandising,
forecasting, and purchasing to distribution,
allocation, and transportation, etc.
● Data as a service: resell information,
analysis and insights
64. Big Data - IoT
More and more machines are connecting
and generating data
65. Big Data - IoT
http://harborresearch.com/wp-content/uploads/2012/05/HarborResearch-nPhase_Paper_March-2011.pdf
66. Big Data - IoT
http://www.slideshare.net/harborresearch/harbor-research-introduction-to-smart-business-m2-m
67. Big Data – IoT and Healthcare
Home Healthcare / Tele-Health
● Business and Technology trends
● Aging Population
● Increase in Chronic Illnesses
● Demand from patients for home environment and
independence
● Cost pressure and scarcity of hospital beds
● Affordable and available telecommunications
● Computing advances: cost, size, power,
performance, imaging, etc.
68. Proposed Agenda
● What is Big Data?
● Big Data landscape
● Business / Use cases
● Discussion