Big Data: Challenges, Practices and Technologies
NIST Big Data Public Working Group Workshop at IEEE Big Data 2014
Nancy W. Grady¹, Mark Underwood², Arnab Roy³, Wo L. Chang⁴
¹ Science Applications International Corporation, nancy.w.grady@saic.com
² Krypton Brothers, LLC, mark.underwood@kryptonbrothers.com
³ Fujitsu Laboratories of America, aroy@us.fujitsu.com
⁴ National Institute of Standards and Technology, wchang@nist.gov
Abstract—Big Data has changed both technologies and
practices for building data analytics systems. A number of
working groups have been discussing the recent changes along a
number of dimensions. The NIST Big Data Public Working
Group organized a workshop to promote communication among
working groups, technologists, and practitioners to come to an
understanding of the current state of the Big Data discipline,
collaboration best practices, future directions for this emerging
specialization, and to identify security and privacy concerns.
Keywords—Big Data; reference architecture; collaboration;
security; privacy; metadata; standards
I. INTRODUCTION
NIST has been facilitating a Big Data Public Working
Group (NBD-PWG) to form a community of interest from
industry, academia, and government to promote better
understanding of this new discipline. The aim has been the
development of consensus definitions, taxonomies, reference
architectures, and technology roadmaps based on an
understanding of use cases and requirements. The goal is to
create vendor-neutral, technology- and infrastructure-agnostic
vocabulary and descriptions. This will enable Big Data
stakeholders to better understand the emerging discipline, and
to choose the best-suited analytics tools for their processing
and visualization requirements on the most appropriate
computing platforms and clusters. By providing a framework
for communication, needs and capabilities can be better
matched between technologists and practitioners.
To further the NBD-PWG goals, a workshop was staged at
the IEEE Big Data 2014 conference to bring together
technologists and practitioners, to understand what has
changed with Big Data, assess the current state of the art,
identify lessons learned, and surface known challenges. To
span this new discipline, four panels were organized: The State
of Big Data Technology, Big Data Future Trends, Big Data
Sharing and Collaboration, and Big Data Security and Privacy.
II. THE STATE OF BIG DATA TECHNOLOGY
The term Big Data has come to mean many things, and
communication about new approaches has been hindered by
the conflicting vocabulary and definitions. To better
understand this emerging discipline, this panel discussed
frameworks for understanding the new architectures, use cases
and requirements, and benchmarks.
A. Data Consistency Issues in Big Data Systems – Jianmin
Wang, Tsinghua Big Data Research Center
Distributed storage systems are required to guarantee data
reliability, fault-tolerance and accessibility for users. Besides
hardware configuration, the design and implementation of
distributed systems is critical to reaching these goals. The
most common solution is to store multiple copies of the
same data on different storage devices. These copies are
called data replicas.
We take two popular distributed storage systems as
examples to analyze their working mechanisms and replica
consistency. The first is Cassandra, which propagates data in a
star model; the second is HDFS, which propagates data in a
chain model.
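The difference between the two propagation models can be illustrated with a small simulation. The sketch below is a toy model, not the actual Cassandra or HDFS protocol: the `Replica` class, the `write_star` and `write_chain` functions, and the `unreachable` parameter are hypothetical names introduced here, and the rule that a broken link simply stops the chain is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage node holding a copy of the data (hypothetical toy model)."""
    name: str
    store: dict = field(default_factory=dict)


def write_star(coordinator, replicas, key, value, unreachable=frozenset()):
    """Star model: one coordinator sends the write to each replica directly."""
    hops = []
    for replica in replicas:
        if replica.name in unreachable:
            continue  # the remaining replicas are still reached directly
        replica.store[key] = value
        hops.append((coordinator, replica.name))
    return hops


def write_chain(client, replicas, key, value, unreachable=frozenset()):
    """Chain model: each replica forwards the write to the next one in line."""
    hops = []
    sender = client
    for replica in replicas:
        if replica.name in unreachable:
            break  # assumption: in this toy model a broken link stops the chain
        replica.store[key] = value
        hops.append((sender, replica.name))
        sender = replica.name
    return hops


def holders(replicas, key):
    """Names of the replicas that currently hold a value for `key`."""
    return [r.name for r in replicas if key in r.store]


star_nodes = [Replica("r1"), Replica("r2"), Replica("r3")]
chain_nodes = [Replica("r1"), Replica("r2"), Replica("r3")]

# The same write, with the middle replica unreachable in both models.
star_hops = write_star("coordinator", star_nodes, "k", "v1", unreachable={"r2"})
chain_hops = write_chain("client", chain_nodes, "k", "v1", unreachable={"r2"})

print("star :", star_hops, "holders:", holders(star_nodes, "k"))
print("chain:", chain_hops, "holders:", holders(chain_nodes, "k"))
```

In this toy setting the two models leave different sets of replicas up to date after the same failure, which is the kind of replica-consistency difference the analysis is concerned with.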
B. NIST Big Data Interoperability Framework – Orit Levin,
Microsoft
The National Institute of Standards and Technology (NIST)
Big Data Interoperability Framework, Volume 6:
Reference Architecture is one of seven volumes in the
roadmap, whose overall aims are to define and prioritize Big
Data requirements, including interoperability, portability,
reusability, extensibility, data usage, analytic techniques, and
technology infrastructure in order to support secure and
effective adoption of Big Data. The Reference Architecture is
dedicated to developing a vendor-neutral, technology- and
infrastructure-agnostic conceptual model and examining
related issues. Created by the NIST Big Data Public Working
Group (NBD-PWG) Reference Architecture Subgroup, the
conceptual model is based on the analysis of public Big Data
material and inputs from the other NBD-PWG subgroups. The
NIST Big Data Reference Architecture (NBD-RA) is
applicable to a variety of business environments including
tightly-integrated enterprise systems, as well as loosely-
coupled vertical industries that rely on the cooperation of
independent stakeholders.
C. NIST Use Cases and Requirements – Geoffrey Fox,
Indiana University
The NIST Big Data Public Working Group collected an
extensive catalog of use cases, reflecting Big Data applications
in public health, epidemiology, U.S. census, cargo shipping,
geointelligence, defense, genomics, recommendation engines
and media. These applications were then examined in the
context of the NIST group’s reference architecture to identify
recurring patterns thought to be specific to Big Data
applications. These patterns were further explored in light of
current Apache stack offerings. These insights will likely be
useful to prospective system designers.
2014 IEEE International Conference on Big Data
978-1-4799-5666-1/14/$31.00 ©2014 IEEE
D. Introducing TPCx-HS – first Industry Standard for
Benchmarking Big Data Systems – Raghunath Nambiar,
Cisco
Over the past quarter century, industry standard
benchmarks have had a significant impact on the computing
industry. Vendors use benchmark standards to illustrate
performance competitiveness for their existing products, and to
improve and monitor the performance of their products under
development. Many buyers use the results as points of
comparison when purchasing new computing systems.
Continuing on the Transaction Processing Performance
Council’s commitment to bring relevant benchmarks to
industry, the TPC announced TPCx-HS – the first standard that
provides verifiable performance, price/performance and energy
consumption metrics for Big Data systems. TPCx-HS can be
used to assess a broad range of system topologies and
implementation methodologies for Hadoop, in a technically
rigorous and directly comparable, vendor-neutral manner. And
while modeling is based on a simple application, the results are
highly relevant to Big Data hardware and software systems.
III. BIG DATA FUTURE DIRECTIONS
Is volume, velocity, variety, veracity or some other facet of
Big Data most critical for planning a particular Big Data
project? Will a given deployment, even if well considered,
find itself overtaken by a superseding technology? What are
the emerging trends and technologies to be aware of? These are
questions practitioners must entertain now as new commercial
releases are transforming the capabilities of widely used Big
Data software. The Future Directions panel considers likely
Big Data trends in hardware, computing models, analytics and
measurement.
A. InfoSymbiotics/DDDAS and the Next Generation of Big
Data and Big Computing – Frederica Darema, Air Force
Office of Scientific Research
We describe DDDAS (Dynamic Data Driven
Applications Systems), a new paradigm unifying systems
modeling and systems instrumentation. DDDAS can facilitate
new capabilities for advanced modeling/simulation and
intelligent exploitation of data of engineered, natural, and
societal multi-entity systems. Results may include improved
understanding, analysis, and optimized, autonomic
management and decision support of operational conditions of
these systems.
The key underlying concept in DDDAS is the dynamic
integration between data and computation, whereby
instrumentation data and executing models of systems become
a feedback control loop. On-line data are dynamically
incorporated into executing models of the system to improve
the accuracy or speed up the simulation, and in reverse the
executing model controls the instrumentation to selectively
target the data collection process to improve accuracy and
measurability.
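The two directions of this feedback loop can be sketched in a few lines. The example below is only an illustration under stated assumptions: the three measurement sites, their values, the fixed-gain update, and the names `pick_target` and `assimilate` are hypothetical and are not taken from any DDDAS system; the sensor is assumed to be noise-free.

```python
# Hypothetical ground truth at three instrumented sites.
true_state = {"north": 20.0, "south": 35.0, "east": 27.0}
# The executing model's current estimate and its per-site uncertainty.
estimate = {"north": 25.0, "south": 25.0, "east": 25.0}
uncertainty = {"north": 1.0, "south": 1.0, "east": 1.0}


def pick_target(uncertainty):
    """Model -> instrumentation: measure where the model is least certain."""
    return max(uncertainty, key=uncertainty.get)


def assimilate(site, measurement, gain=0.5):
    """Instrumentation -> model: fold the measurement into the estimate."""
    residual = measurement - estimate[site]
    estimate[site] += gain * residual
    # Assumption: the remaining error serves as the new uncertainty score.
    uncertainty[site] = abs(residual) * (1 - gain)


targets = []
for step in range(12):
    site = pick_target(uncertainty)   # the model steers data collection
    measurement = true_state[site]    # noise-free sensor, for simplicity
    assimilate(site, measurement)     # the data corrects the model
    targets.append(site)

print(targets)
print(estimate)
```

The loop spends its measurements where the model is currently worst, so the sequence in `targets` shifts from site to site as each estimate improves.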
This paradigm, unifying modeling and instrumentation, is
timely with the advent of large-scale dynamic data and large-
scale big computing. Large-scale dynamic data is the next
wave of Big Data, namely dynamic data arising from
ubiquitous sensing and control in engineered, natural, and
societal systems. Numerous heterogeneous sensors and
controllers will instrument these systems. The opportunities
and challenges at these “large-scales” relate not only to the size
of the data but also to the heterogeneity in data, data collection
modalities, data fidelities, and timescales -- ranging from real-
time data moving in microseconds to data at rest (archive). In
tandem with this important dimension of dynamic data is an
extended view of Big Computing, which includes a new
dimension of distributed computing; that is, the range of
computing from the high-end to computing at the sensor and
controller levels, and in particular the collections of networked
assemblies of sensors and controllers.
The DDDAS paradigm, driving and exploiting notions of
large-scale dynamic data and large-scale Big Computing, is
shaping research directions and transforming a range of
application areas. Examples of advances and new capabilities
are presented. These include analysis and decision support for
structural systems, manufacturing, environmental and critical
infrastructure (such as urban and air transportation), and power
grids.
B. NIST Roadmap and Standards – David Boyd, L-3 Data
Tactics
The NIST Big Data Interoperability Framework: Volume 7,
Technology Roadmap was prepared by the NBD-PWG’s
Technology Roadmap Subgroup. It provides overarching
information and context on key questions such as:
• When is data considered “Big”?
• How did Big Data evolve?
• What will it evolve to?
• How is technology developing to deal with Big Data in
terms of storage, organization, processing, and resource
management?
• What standards are needed and evolving to deal with Big
Data?
• How might organizations address their Big Data
challenges?
This presentation will discuss organizational
readiness, technology readiness, technology features, standards
initiatives, and strategies.
C. Big Data Analytics Interest Group (BDA IG) of Research
Data Alliance (RDA) – Kwo-Sen Kuo, Bayesics
The Big Data Analytics (BDA) Interest Group was formed
to develop community-based recommendations for viable data
analytics approaches that address the scientific community’s need to
use large quantities of data efficiently. It supports the
formation of working groups to tackle specific problems.
• BDA aims to clarify some foundational terminology in
the context of data analytics, including the
differences and overlaps with terms such as data science, data
analysis, and data mining.
• BDA will develop a recommendation document with a
systematic classification of feasible combinations of
analysis algorithms, analytical tools, data and resource
characteristics, and scientific queries. This
document can serve as a best-practice
guide for scientific groups and communities interested in
investing in Big Data technologies.
• BDA works to develop a consensus amongst its members
to achieve this desired goal.
• BDA collaborates with external bodies and initiatives
such as NIST, OGC, ISO, EarthServer, and others.
D. Next-Generation Computing Systems for Big Data
Machine Learning and Graph Analytics – H. Howie
Huang, George Washington University
Big Data machine learning and graph analytics have been
widely used in industry, academia, and government.
Continued advances in this area are critical to business success,
scientific discovery, and cybersecurity. In this position
paper, we present the current state of the art, and propose that
next-generation computing systems for Big Data machine
learning and graph analytics need innovative designs in both
hardware and software that provide a good match between Big
Data algorithms and the underlying computing and storage
resources.
IV. BIG DATA SHARING AND COLLABORATION
Critical to moving Big Data forward as a discipline are the
methods needed for improving both collaboration and data
sharing. We are familiar with the cooperation for open source
technology development and in online courses, but how do we
cooperatively move forward and put these technologies into
practice? How do we better provision data frameworks to
promote technology adoption, data sharing and data reuse?
A. Public Private Collaboration – Johan Bos-Beier, ACT-IAC
The ACT-IAC Big Data Committee (BDC) seeks to enable government
agencies to make better data-driven decisions through the
analysis, management, integration, and representation of large
and complex data stores. The BDC seeks to:
• Provide a forum for information sharing and collaboration
between federal, state, and local government agencies
seeking to leverage their data for better informed decision-
making.
• Advise or recommend approaches to developing Big Data
technical frameworks and capability maturity model
assessments.
• Promote Big Data best practices through increasing
awareness of Big Data research, technologies, use cases,
and high performance computing within the Federal
Government.
B. Implementation of Big Data Applications in Government
and Science Communities – Joan L. Aron, Federal Big
Data Working Group
A conceptual overview sets the context for the uses of Big
Data for knowledge discovery and decision support and the
challenges in developing applications. The federation of use
cases, data publications, solutions & technologies provides
examples. Semantic analysis is the basis of solutions for many
applications for government and science communities. The
federal government has greater needs for aggregating data
while maintaining compliance with privacy and security
requirements. Cognitive metadata, which is the metadata
coming from enhancing machine learning with our human
perception, reasoning or intuition, can be used for
personalization purposes and conversely for protecting
personally identifiable information (PII). A new technology
for natural language understanding can be used to find high-
value information in a large body of texts, such as a collection
of agency reports, with little specialized training. Advances in
high-performance computational hardware are also important.
A semantic MEDLINE for searching biomedical research
literature uses hardware built for Resource Description
Framework (RDF) triples in a graph database and semantic
processing developed at the National Library of Medicine. A
high-performance computing cluster environment is in use for
searching public records, patent data, case law and news
articles. Use cases with a focus on environment and Earth
system science illustrate achievements and challenges for the
use of Big Data in data publishing and data access, data
discovery and decision support, and workforce development
for the scientific community and decision-makers to work with
data science.
C. Data-Intensive Science Challenges – Thomas Huang,
NASA Earth Science Data Systems Data-Intensive
Architecture Working Group
Data-Intensive Science defines three high-level activities:
capture, curation, and analysis of data. Tackling Big Science
Data requires more than just infusing Cloud Computing,
Hadoop, and NoSQL. Science data system architecture is an
orchestration of people, processes, policies, and technologies. It
requires a thorough understanding of the problem space, an
assessment of the technologies available, a process that is repeatable
and traceable, and an adaptable architecture. This session
focuses on architectural discussion and enabling technologies
for tackling data-intensive science. The discussion should be
supported by use cases as the instrument to facilitate review of
current science data systems and assessment of some of the
enabling technologies.
D. Big Data Provenance and Metadata – Rajeev Agrawal,
North Carolina A&T State University
With the progress of new technology, the volume and
complexity of data produced and processed in scientific
research is increasing remarkably. This data is growing so fast
that existing resources have difficulty analyzing it
properly. It is important to track scientific workflows
to provide context and reproducibility. Provenance addresses
this need and assists scientists by recording the lineage of
data: how it was generated, used, and modified. We
discuss a complete workflow for tracking the provenance
information of Big Data.
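As a rough illustration of what tracking lineage involves, the sketch below records which activity produced each dataset and walks the records backwards to recover a dataset's history. It is a minimal example under stated assumptions, not the workflow presented in the talk: the `ProvenanceRecord` and `ProvenanceLog` names, their fields, and the sample file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """One workflow step: which activity produced `output`, from what, by whom."""
    output: str
    activity: str
    inputs: Tuple[str, ...]
    agent: str


class ProvenanceLog:
    """In-memory store of provenance records, keyed by the dataset produced."""

    def __init__(self) -> None:
        self._by_output: Dict[str, ProvenanceRecord] = {}

    def record(self, output: str, activity: str, inputs: List[str], agent: str) -> None:
        self._by_output[output] = ProvenanceRecord(output, activity, tuple(inputs), agent)

    def lineage(self, dataset: str) -> List[str]:
        """Return the activities that led to `dataset`, oldest first."""
        steps: List[str] = []
        seen = set()

        def visit(name: str) -> None:
            rec = self._by_output.get(name)
            if rec is None or name in seen:
                return  # a source dataset, or one already visited
            seen.add(name)
            for parent in rec.inputs:
                visit(parent)
            steps.append(rec.activity)

        visit(dataset)
        return steps


log = ProvenanceLog()
log.record("clean.csv", "remove_outliers", ["raw.csv"], agent="alice")
log.record("stats.json", "summarize", ["clean.csv"], agent="bob")

history = log.lineage("stats.json")
print(history)
```

Because every derived dataset points back to its inputs, the same records support both reproducibility (re-running the listed activities in order) and context (seeing who generated or modified the data).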
V. BIG DATA SECURITY AND PRIVACY
The distribution of data across resources, and the
involvement of a number of organizations in one system open
up new concerns for security and privacy. This panel will focus
on the areas that are new and different because of the Big Data
architectures. The panel will discuss the state of the art in
security and privacy enhancing technologies, Big Data privacy
concerns and the over-arching challenge of deriving knowledge
from Big Data while preserving privacy.
A. Big Data Analytics for Security – Pratyusa Manadhata, HP
and Computer Security Alliance
Enterprises routinely collect terabytes of security-relevant
data (e.g., network events, software application events, and
people action events) for several reasons, including the need
for regulatory compliance and post-hoc forensic analysis. We
estimate that large enterprises may generate 10-100 billion
events per day depending on their size. These numbers will
grow as enterprises enable event logging in more sources, hire
more employees, deploy more devices, and run more software.
Unfortunately, this volume of data quickly becomes
overwhelming. Existing analytical techniques do not work well
at this scale and typically produce so many false positives that
their efficacy is undermined. The problem becomes worse as
enterprises move to cloud architectures and collect much more
data. We will discuss techniques to mitigate this problem.
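A short calculation makes the scale concrete. The daily event counts below come from the estimate above; the false-positive rate of one alert per 10,000 events is a purely hypothetical figure chosen for illustration and is not a measurement from the talk. The function names are likewise introduced here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day):
    """Average sustained event rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def false_alerts_per_day(events_per_day, events_per_false_alert):
    """Daily false alerts if one in every `events_per_false_alert` events misfires."""
    return events_per_day // events_per_false_alert


LOW_ESTIMATE = 10_000_000_000     # 10 billion events per day (from the text)
HIGH_ESTIMATE = 100_000_000_000   # 100 billion events per day (from the text)
ASSUMED_EVENTS_PER_FALSE_ALERT = 10_000  # hypothetical rate, for illustration only

for daily in (LOW_ESTIMATE, HIGH_ESTIMATE):
    rate = events_per_second(daily)
    alerts = false_alerts_per_day(daily, ASSUMED_EVENTS_PER_FALSE_ALERT)
    print(f"{daily:,} events/day = {rate:,.0f} events/s, {alerts:,} false alerts/day")
```

The estimate corresponds to roughly 116,000 to 1.16 million events per second on average. Under the assumed rate, even a detector that is wrong only once in 10,000 events would raise millions of false alerts per day, which illustrates why false positives undermine efficacy at this scale.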
B. Cyber Security and the Industrial Internet – Stephen
Mellor, Industrial Internet Consortium
Through its public-private partnership, the Industrial Internet
Consortium (IIC) is committed
to working with public and private partners to ensure that
security and privacy are integral parts of Industrial Internet
products and services. The IIC is working with its ecosystem to
identify the requirements for communication protocols and
create mechanisms to enhance rapid discovery, mitigation, and
remediation of vulnerabilities in near real-time. This session
will be an open discussion on how the IIC is defining future
requirements and recommendations to ensure the Industrial
Internet is private and secure.
C. NIST Big Data Security and Privacy – Mark Underwood,
Krypton Brothers
The NIST Big Data Interoperability Framework Volume 4:
Security and Privacy Requirements was prepared by the NBD-
PWG’s Security and Privacy Subgroup to identify security and
privacy issues particular to Big Data. Big Data application
domains include health care, drug discovery, finance and many
others from both the private and public sectors. Among the
scenarios within these application domains are health exchanges,
clinical trials, mergers and acquisitions, device telemetry, and
international anti-piracy. Security technology domains include
identity, authorization, audit, network and device security, and
federation across trust boundaries.
Clearly, the advent of Big Data has necessitated paradigm
shifts in the understanding and enforcement of security and
privacy (S&P) requirements. Significant changes are evolving,
notably in scaling existing solutions to meet the volume,
variety, and velocity of Big Data, and re-targeting security
solutions amid shifts in technology infrastructure, e.g.,
distributed computing systems and non-relational data storage. In
addition, as diverse datasets become ever-easier to access,
many are increasingly personal in nature. Thus, a whole new
set of emerging issues must be addressed, including balancing
privacy and utility, enabling analytics and governance on
encrypted data, and reconciling authentication and anonymity.
Working with other subgroups in the NBD-PWG, this
subgroup has begun to expand the distributed computing
concept of a Big Data security fabric.
With the key Big Data characteristics of variety, volume,
and velocity in mind, the subgroup gathered use cases from
volunteers, developed a consensus security and privacy
taxonomy and reference architecture, and validated them by mapping
the use cases to the reference architecture.
D. Education Data Privacy and State Boards of Education –
Amelia Vance, National Association of State Boards of
Education
Big Data has the potential to revolutionize education,
allowing for more efficient and effective schools. It can allow
every teacher to personalize every element of instruction, and
enable policymakers to see exactly which elements of each
educational policy are successful in helping ensure students are
college- and career-ready. However, while many technologists
believe that the benefits of Big Data in education are self-
evident and outweigh any dangers of collecting sensitive
student information, many parents, teachers, and policymakers do
not feel the same way. Only now are parents learning about the
data schools are collecting about their children. They are justly
concerned about how it is used and shared; the fact that data
collection is often outsourced to third-party vendors only adds
to their skepticism and concern for their children’s privacy. This
has led to an instinctive response by many policymakers and
others to work against the use of Big Data in education, despite
its potential benefits. In 2014, state
legislatures introduced 110 bills in 36 states regarding student
data privacy. Seventy-nine of the 2014 bills have at least some
elements that would restrict the use of data in education. For
example, New Hampshire’s bill, which was passed into law,
likely prevents predictive analytics. A bill in Missouri would
have defunded its statewide longitudinal data system. In all,
28 of the 110 bills introduced passed into law this year, and
the number of student data privacy bills is expected to double
in the 2015 legislative session.
Many of the bills introduced, and the laws passed, give
state boards of education (SBEs) a key role in the data privacy
discussion. Eighteen SBEs are tasked by statute with writing
their state’s student data management policy or have oversight
authority for the agency that is writing the policy. Thirteen
SBEs are members of their state’s data management team.
Seven SBEs are required by statute to ensure FERPA
compliance. Fifty-five bills introduced in 2014 would give SBEs
some authority in regulating student data privacy. Existing
state privacy laws give many SBEs authority over various
things to help secure data privacy, including appointing a chief
privacy officer, adopting and/or implementing state privacy
policies, and providing oversight of vendor contracts. SBEs
have also independently passed rules for their states to protect
data privacy. Unfortunately, like many other policymakers,
many SBE members are unaware of the potential benefits of
Big Data in education. Education data privacy requires
knowledge of privacy law, a basic understanding of Big Data,
and a great deal of time to learn about the ins and outs of
today’s education data privacy debate. The National
Association of State Boards of Education (NASBE) is helping
SBEs understand and pass effective policies on these issues
that will protect data privacy while supporting educational
innovation through the use of Big Data. In this panel, Amelia
Vance from NASBE will discuss the role SBEs play in
education data collection, the questions they are asking as they
put together state privacy policies (particularly those dealing
with third party use of data), and what information
policymakers need from technology providers in order to trust
the use of Big Data in education.
We consider the perspectives and recommendations from
multiple organizations and experts, including the Data Quality
Campaign, the Electronic Privacy Information Center, and the
Pioneer Institute, as well as examine the lessons learned thus
far in states from failed attempts at responsible data collection
and privacy security.
ACKNOWLEDGMENT
The authors wish to thank the panelists for their time and
efforts to share their expertise and further the dialog for
clarifying the new discipline of Big Data. The authors also
wish to acknowledge the contributions of the large group of
participants in the NBD-PWG, who have discussed at length
the emerging discipline of Big Data, and have helped form a
collective understanding of this new paradigm.
REFERENCES
[1] N. Grady and W. Chang, eds., “NIST Big Data Interoperability Framework:
Volume 1, Definitions,” NIST, unpublished.
[2] N. Grady and W. Chang, eds., “NIST Big Data Interoperability Framework:
Volume 2, Taxonomy,” NIST, unpublished.
[3] G. Fox and W. Chang, eds., “NIST Big Data Interoperability Framework:
Volume 3, Use Cases and Requirements,” NIST, unpublished.
[4] A. Roy, M. Underwood, and W. Chang, eds., “NIST Big Data
Interoperability Framework: Volume 4, Security and Privacy
Requirements,” NIST, unpublished.
[5] S. Mishra and W. Chang, eds., “NIST Big Data Interoperability Framework:
Volume 5, Architectures White Paper Survey,” NIST, unpublished.
[6] O. Levin and W. Chang, eds., “NIST Big Data Interoperability Framework:
Volume 6, Reference Architecture,” NIST, unpublished.
[7] D. Boyd, C. Buffington, and W. Chang, eds., “NIST Big Data
Interoperability Framework: Volume 7, Technology Roadmap,” NIST, unpublished.
15

More Related Content

What's hot

Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
Kathirvel Ayyaswamy
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Data
chennaijp
 
Big data issues and challenges
Big data issues and challengesBig data issues and challenges
Big data issues and challenges
Dilpreet kaur Virk
 
An Comprehensive Study of Big Data Environment and its Challenges.
An Comprehensive Study of Big Data Environment and its Challenges.An Comprehensive Study of Big Data Environment and its Challenges.
An Comprehensive Study of Big Data Environment and its Challenges.
ijceronline
 
Big data
Big dataBig data
Big data
Sakshi Chawla
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
Akuhuruf
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
kk1718
 
Data mining & big data presentation 01
Data mining & big data presentation 01Data mining & big data presentation 01
Data mining & big data presentation 01
Aseem Chakrabarthy
 
Addressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop WayAddressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop Way
Xoriant Corporation
 
Big Data Presentation
Big Data PresentationBig Data Presentation
Big Data Presentation
AbhijeetPandey71
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
Poonam Kshirsagar
 
Big data storage
Big data storageBig data storage
Big data storage
Vikram Nandini
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
Tiago Knoch
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
Harsh Kishore Mishra
 
Big data
Big dataBig data
Big data
hsn99
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data Challenge
Stefan Kühn
 
Big data
Big dataBig data
Big data
Claire Choong
 
Challenges of Big Data Research
Challenges of Big Data ResearchChallenges of Big Data Research
Challenges of Big Data Research
Regional Science Academy
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
Sandip Tipayle Patil
 

What's hot (19)

Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Data
 
Big data issues and challenges
Big data issues and challengesBig data issues and challenges
Big data issues and challenges
 
An Comprehensive Study of Big Data Environment and its Challenges.
An Comprehensive Study of Big Data Environment and its Challenges.An Comprehensive Study of Big Data Environment and its Challenges.
An Comprehensive Study of Big Data Environment and its Challenges.
 
Big data
Big dataBig data
Big data
 
elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Data mining & big data presentation 01
Data mining & big data presentation 01Data mining & big data presentation 01
Data mining & big data presentation 01
 
Addressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop WayAddressing Big Data Challenges - The Hadoop Way
Addressing Big Data Challenges - The Hadoop Way
 
Big Data Presentation
Big Data PresentationBig Data Presentation
Big Data Presentation
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Big data storage
Big data storageBig data storage
Big data storage
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Big data
Big dataBig data
Big data
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data Challenge
 
Big data
Big dataBig data
Big data
 
Challenges of Big Data Research
Challenges of Big Data ResearchChallenges of Big Data Research
Challenges of Big Data Research
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 

Big data: Challenges, Practices and Technologies

Big Data: Challenges, Practices and Technologies
NIST Big Data Public Working Group Workshop at IEEE Big Data 2014

Nancy W. Grady1, Mark Underwood2, Arnab Roy3, Wo L. Chang4
1 Science Applications International Corporation, nancy.w.grady@saic.com
2 Krypton Brothers, LLC, mark.underwood@kryptonbrothers.com
3 Fujitsu Laboratories of America, aroy@us.fujitsu.com
4 National Institute of Standards and Technology, wchang@nist.gov

Abstract—Big Data has changed both technologies and practices for building data analytics systems. A number of working groups have been discussing the recent changes along a number of dimensions. The NIST Big Data Public Working Group organized a workshop to promote communication among working groups, technologists, and practitioners to come to an understanding of the current state of the Big Data discipline, collaboration best practices, future directions for this emerging specialization, and to identify security and privacy concerns.

Keywords—Big Data; reference architecture; collaboration; security; privacy; metadata; standards

I. INTRODUCTION
NIST has been facilitating a Big Data Public Working Group (NBD-PWG) to form a community of interest from industry, academia, and government to promote better understanding of this new discipline. The aim has been the development of consensus definitions, taxonomies, reference architectures, and technology roadmaps based on an understanding of use cases and requirements. The goal is to create vendor-neutral, technology and infrastructure agnostic vocabulary and descriptions. This will enable Big Data stakeholders to better understand the emerging discipline, and to choose the best-suited analytics tools for their processing and visualization requirements on the most appropriate computing platforms and clusters. By providing a framework for communication, needs and capabilities can be better matched between technologists and practitioners.
To further the NBD-PWG goals, a workshop was staged at the IEEE Big Data 2014 conference to bring together technologists and practitioners, to understand what has changed with Big Data, assess the current state of the art, identify lessons learned, and surface known challenges. To span this new discipline, four panels were organized: The State of Big Data Technology, Big Data Future Trends, Big Data Sharing and Collaboration, and Big Data Security and Privacy.

II. THE STATE OF BIG DATA TECHNOLOGY
The term Big Data has come to mean many things, and communication about new approaches has been hindered by conflicting vocabulary and definitions. To better understand this emerging discipline, this panel discussed frameworks for understanding the new architectures, use cases and requirements, and benchmarks.

A. Data Consistency Issues in Big Data Systems – Jianmin Wang, Tsinghua Big Data Research Center
Distributed storage systems are required to guarantee data reliability, fault-tolerance and accessibility for users. Besides hardware configuration, the design and implementation of distributed systems is very important to reach these goals. The most common solution is to store multiple copies of the same data on different storage devices. These copies are called data replicas. We take two popular distributed storage systems as examples to analyze the working mechanism and replica consistency. The first is Cassandra, which propagates data in a star model, and the second is HDFS, which propagates data in a chain model.

B. NIST Big Data Interoperability Framework – Orit Levin, Microsoft
The National Institute of Standards and Technology (NIST) Big Data Interoperability Framework, Volume 6: Reference Architecture is one of seven volumes in the roadmap, whose overall aims are to define and prioritize Big Data requirements, including interoperability, portability, reusability, extensibility, data usage, analytic techniques, and technology infrastructure in order to support secure and effective adoption of Big Data. The Reference Architecture is dedicated to developing a vendor-neutral, technology- and infrastructure-agnostic conceptual model and examining related issues. Created by the NIST Big Data Public Working Group (NBD-PWG) Reference Architecture Subgroup, the conceptual model is based on the analysis of public Big Data material and inputs from the other NBD-PWG subgroups. The NIST Big Data Reference Architecture (NBD-RA) is applicable to a variety of business environments including tightly-integrated enterprise systems, as well as loosely-coupled vertical industries that rely on the cooperation of independent stakeholders.

C. NIST Use Cases and Requirements – Geoffrey Fox, Indiana University
The NIST Big Data Public Working Group collected an extensive catalog of use cases, reflecting Big Data applications in public health, epidemiology, U.S. census, cargo shipping, geointelligence, defense, genomics, recommendation engines and media. These applications were then examined in the context of the NIST group’s reference architecture to identify recurring patterns thought to be specific to Big Data applications. These patterns were further explored in light of current Apache stack offerings. These insights will likely be useful to prospective system designers.

2014 IEEE International Conference on Big Data, 978-1-4799-5666-1/14/$31.00 ©2014 IEEE

D. Introducing TPCx-HS – first Industry Standard for Benchmarking Big Data Systems – Raghunath Nambiar, Cisco
Over the past quarter century, industry standard benchmarks have had a significant impact on the computing industry. Vendors use benchmark standards to illustrate performance competitiveness for their existing products, and to improve and monitor the performance of their products under development. Many buyers use the results as points of comparison when purchasing new computing systems. Continuing the Transaction Processing Performance Council’s commitment to bring relevant benchmarks to industry, the TPC announced TPCx-HS, the first standard that provides verifiable performance, price/performance and energy consumption metrics for Big Data systems. TPCx-HS can be used to assess a broad range of system topologies and implementation methodologies for Hadoop, in a technically rigorous and directly comparable, vendor-neutral manner. While the modeling is based on a simple application, the results are highly relevant to Big Data hardware and software systems.

III. BIG DATA FUTURE DIRECTIONS
Is volume, velocity, variety, veracity or some other facet of Big Data most critical for planning a particular Big Data project? Will a given deployment, even if well considered, find itself overtaken by a superseding technology? What are the emerging trends and technologies to be aware of? These are questions practitioners must entertain now as new commercial releases are transforming the capabilities of widely used Big Data software. The Future Directions panel considers likely Big Data trends in hardware, computing models, analytics and measurement.

A. InfoSymbiotics/DDDAS and the Next Generation of Big Data and Big Computing – Frederica Darema, Air Force Office of Scientific Research
We describe DDDAS (Dynamic Data Driven Applications Systems), a new paradigm unifying systems modeling and systems instrumentation. DDDAS can facilitate new capabilities for advanced modeling/simulation and intelligent exploitation of data of engineered, natural, and societal multi-entity systems. Results may include improved understanding, analysis, and optimized, autonomic management and decision support of operational conditions of these systems. The key underlying concept in DDDAS is the dynamic integration between data and computation, whereby instrumentation data and executing models of systems become a feedback control loop. On-line data are dynamically incorporated into executing models of the system to improve the accuracy or speed up the simulation, and in reverse the executing model controls the instrumentation to selectively target the data collection process to improve accuracy and measurability.

This paradigm, unifying modeling and instrumentation, is timely with the advent of large-scale dynamic data and large-scale big computing. Large-scale dynamic data is the next wave of Big Data, namely dynamic data arising from ubiquitous sensing and control in engineered, natural, and societal systems. Numerous heterogeneous sensors and controllers will instrument these systems. The opportunities and challenges at these “large scales” relate not only to the size of the data but also to the heterogeneity in data, data collection modalities, data fidelities, and timescales, ranging from real-time data moving in microseconds to data at rest (archive).
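
The two-way coupling described above (data correcting the executing model, and the model steering data collection) can be illustrated with a toy loop. The sketch below is not taken from the DDDAS work itself; the names ToyModel, should_sample and run_loop, the scalar "uncertainty" measure, the gain and the threshold are all hypothetical choices made only to show the shape of the feedback.

```python
# Illustrative sketch only: a toy DDDAS-style feedback loop.
# All names and parameters here are hypothetical, not part of any DDDAS API.
from dataclasses import dataclass


@dataclass
class ToyModel:
    estimate: float      # current model estimate of the system state
    uncertainty: float   # crude scalar confidence measure (larger = less sure)

    def step(self, drift: float) -> None:
        """Advance the executing model one time step; uncertainty grows."""
        self.estimate += drift
        self.uncertainty += 1.0

    def assimilate(self, measurement: float, gain: float) -> None:
        """Data -> model: blend a new measurement into the running estimate."""
        self.estimate += gain * (measurement - self.estimate)
        self.uncertainty *= (1.0 - gain)


def should_sample(model: ToyModel, threshold: float) -> bool:
    """Model -> instrumentation: request sensor data only when unsure."""
    return model.uncertainty > threshold


def run_loop(truth, drift, gain=0.5, threshold=2.0):
    """Run the model against a sequence of true states, sampling selectively."""
    model = ToyModel(estimate=truth[0], uncertainty=0.0)
    samples_taken = 0
    for true_value in truth[1:]:
        model.step(drift)
        if should_sample(model, threshold):
            # A noise-free sensor is assumed to keep the example short.
            model.assimilate(true_value, gain)
            samples_taken += 1
    return model, samples_taken


if __name__ == "__main__":
    truth = [float(t) for t in range(11)]     # the real system moves 1.0 per step
    model, used = run_loop(truth, drift=0.8)  # the model's dynamics are slightly off
    print(f"final estimate={model.estimate:.3f}, sensor reads={used} of {len(truth) - 1}")
```

In this sketch the model's drift is deliberately wrong, so without measurements its error would keep growing; the selective sampling keeps the error bounded while reading the sensor on only some of the steps. A real system would replace the scalar uncertainty with a proper error estimate and the fixed gain with a principled data-assimilation scheme.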
In tandem with this important dimension of dynamic data is an extended view of Big Computing, which includes a new dimension of distributed computing; that is, the range of computing from the high-end to computing at the sensor and controller levels, and in particular the collections of networked assemblies of sensors and controllers. The DDDAS paradigm, driving and exploiting notions of large-scale dynamic data and large-scale Big Computing, is shaping research directions and transforming a range of application areas. Examples of advances and new capabilities are presented. These include analysis and decision support for structural systems, manufacturing, environmental and critical infrastructure (such as urban and air transportation), and power grids.

B. NIST Roadmap and Standards – David Boyd, L-3 Data Tactics
The NIST Big Data Interoperability Framework: Volume 7, Technology Roadmap was prepared by the NBD-PWG’s Technology Roadmap Subgroup. It addresses the overarching information and context about key questions such as:
• When is data considered “Big”?
• How did Big Data evolve?
• What will it evolve to?
• How is technology developing to deal with Big Data in terms of storage, organization, processing, and resource management?
• What standards are needed and evolving to deal with Big Data?
• How might organizations address their Big Data challenges?
This presentation will discuss the issues of organizational readiness, technology readiness, technology features, standards initiatives and strategies.

C. Big Data Analytics Interest Group (BDA IG) of Research Data Alliance (RDA) – Kwo-Sen Kuo, Bayesics
The Big Data Analytics (BDA) Interest Group was formed to develop community-based recommendations for viable data analytics approaches that address the scientific community's need to use large quantities of data efficiently. It supports the formation of working groups to tackle specific problems.
• BDA aims to clarify some foundational terminology in the context of data analytics, including the differences and overlaps with terms like data science, data analysis, and data mining.
• BDA will develop a recommendation document with a systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries. These recommendation documents can serve as a best practice guide for scientific groups and communities interested in investing in Big Data technologies.
• BDA works to develop a consensus amongst its members to achieve this desired goal.
• BDA collaborates with external bodies and initiatives such as NIST, OGC, ISO, EarthServer and others.

D. Next-Generation Computing Systems for Big Data Machine Learning and Graph Analytics – H. Howie Huang, George Washington University
Big Data machine learning and graph analytics have been widely used in industry, academia and government. Continuous advance in this area is critical to business success, scientific discovery, and cybersecurity. In this position paper, we present the current state of the art, and propose that next-generation computing systems for Big Data machine learning and graph analytics need innovative designs in both hardware and software that provide a good match between Big Data algorithms and the underlying computing and storage resources.

IV. BIG DATA SHARING AND COLLABORATION
Critical to moving Big Data forward as a discipline are the methods needed for improving both collaboration and data sharing. We are familiar with cooperation in open source technology development and in online courses, but how do we cooperatively move forward and put these technologies into practice? How do we better provision data frameworks to promote technology adoption, data sharing and data reuse?

A. Public Private Collaboration – Johan Bos-Beier, ACT/IAC
The ACT-IAC Big Data Committee (BDC) seeks to enable government agencies to make better data-driven decisions through the analysis, management, integration, and representation of large and complex data stores. The BDC seeks to:
• Provide a forum for information sharing and collaboration between federal, state, and local government agencies seeking to leverage their data for better informed decision-making.
• Advise or recommend approaches to developing Big Data technical frameworks and capability maturity model assessments.
• Promote Big Data best practices through increasing awareness of Big Data research, technologies, use cases, and high performance computing within the Federal Government.

B. Implementation of Big Data Applications in Government and Science Communities – Joan L. Aron, Federal Big Data Working Group
A conceptual overview sets the context for the uses of Big Data for knowledge discovery and decision support and the challenges in developing applications. The federation of use cases, data publications, solutions and technologies provides examples. Semantic analysis is the basis of solutions for many applications for government and science communities. The federal government has greater needs for aggregating data while maintaining compliance with privacy and security requirements. Cognitive metadata, which is the metadata that comes from enhancing machine learning with human perception, reasoning or intuition, can be used for personalization purposes and conversely for protecting personally identifiable information (PII). A new technology for natural language understanding can be used to find high-value information in a large body of texts, such as a collection of agency reports, with little specialized training. Advances in high-performance computational hardware are also important.
A semantic MEDLINE for searching biomedical research literature uses hardware built for Resource Description Framework (RDF) triples in a graph database and semantic processing developed at the National Library of Medicine. A high-performance computing cluster environment is in use for searching public records, patent data, case law, and news articles. Use cases with a focus on environment and Earth system science illustrate achievements and challenges for the use of Big Data in data publishing and data access, data discovery and decision support, and workforce development for the scientific community and decision-makers to work with data science.
C. Data-Intensive Science Challenges – Thomas Huang, NASA Earth Science Data Systems Data-Intensive Architecture Working Group
Data-intensive science comprises three high-level activities: capture, curation, and analysis of data. Tackling Big Science Data requires more than just infusing Cloud Computing, Hadoop, and NoSQL. Science data system architecture is an orchestration of people, process, policies, and technologies. It requires a thorough understanding of the problem space, an assessment of the available technologies, a process that is repeatable and traceable, and an adaptable architecture. This session focuses on architectural discussion and enabling technologies for tackling data-intensive science. The discussion should be supported by use cases as the instrument to facilitate review of current science data systems and assessment of some of the enabling technologies.
D. Big Data Provenance and Metadata – Rajeev Agrawal, North Carolina A&T State University
With the progress of new technology, the volume and complexity of data produced and processed in scientific research are increasing remarkably. This data is growing so fast that existing resources are struggling to analyze the data
properly. It is important to track scientific workflows properly to provide context and reproducibility. Provenance addresses this need by giving scientists the lineage, or history, of how data was generated, used, and modified. We discuss a complete workflow for tracking provenance information for Big Data.
V. BIG DATA SECURITY AND PRIVACY
The distribution of data across resources and the involvement of a number of organizations in one system open up new concerns for security and privacy. This panel will focus on the areas that are new and different because of Big Data architectures. The panel will discuss the state of the art in security and privacy enhancing technologies, Big Data privacy concerns, and the overarching challenge of deriving knowledge from Big Data while preserving privacy.
A. Big Data Analytics for Security – Pratyusa Manadhata, HP and Computer Security Alliance
Enterprises routinely collect terabytes of security-relevant data (e.g., network events, software application events, and people action events) for several reasons, including the need for regulatory compliance and post-hoc forensic analysis. We estimate that large enterprises may generate 10 to 100 billion events per day depending on their size. These numbers will grow as enterprises enable event logging in more sources, hire more employees, deploy more devices, and run more software. Unfortunately, this volume of data quickly becomes overwhelming. Existing analytical techniques do not work well at this scale and typically produce so many false positives that their efficacy is undermined. The problem becomes worse as enterprises move to cloud architectures and collect much more data. We will discuss techniques to mitigate this problem.
B. Cyber Security and the Industrial Internet – Stephen Mellor, Industrial Internet Consortium
Through its public-private partnership, the Industrial Internet Consortium (IIC) is committed to working with public and private partners to ensure that security and privacy are integral parts of Industrial Internet products and services. The IIC is working with its ecosystem to identify the requirements for communication protocols and create mechanisms to enhance rapid discovery, mitigation, and remediation of vulnerabilities in near real-time. This session will be an open discussion on how the IIC is defining future requirements and recommendations to ensure the Industrial Internet is private and secure.
C. NIST Big Data Security and Privacy – Mark Underwood, Krypton Brothers
The NIST Big Data Interoperability Framework: Volume 4, Security and Privacy Requirements was prepared by the NBD-PWG’s Security and Privacy Subgroup to identify security and privacy issues particular to Big Data. Big Data application domains include health care, drug discovery, finance, and many others from both the private and public sectors. Among the scenarios within these application domains are health exchanges, clinical trials, mergers and acquisitions, device telemetry, and international anti-piracy. Security technology domains include identity, authorization, audit, network and device security, and federation across trust boundaries. Clearly, the advent of Big Data has necessitated paradigm shifts in the understanding and enforcement of security and privacy (S&P) requirements. Significant changes are evolving, notably in scaling existing solutions to meet the volume, variety, and velocity of Big Data, and re-targeting security solutions amid shifts in technology infrastructure, e.g., distributed computing systems and non-relational data storage. In addition, as diverse datasets become ever easier to access, many are increasingly personal in nature.
Thus, a whole new set of emerging issues must be addressed, including balancing privacy and utility, enabling analytics and governance on encrypted data, and reconciling authentication and anonymity. Working with other subgroups in the NBD-PWG, this subgroup has begun to expand the distributed computing concept of a Big Data security fabric. With the key Big Data characteristics of variety, volume, and velocity in mind, the subgroup gathered use cases from volunteers, developed a consensus security and privacy taxonomy and reference architecture, and validated it by mapping the use cases to the reference architecture.
D. Education Data Privacy and State Boards of Education – Amelia Vance, National Association of State Boards of Education
Big Data has the potential to revolutionize education, allowing for more efficient and effective schools. It can allow every teacher to personalize every element of instruction, and enable policymakers to see exactly which elements of each educational policy are successful in helping ensure students are college- and career-ready. However, while many technologists believe that the benefits of Big Data in education are self-evident and outweigh any dangers of collecting sensitive student information, many parents, teachers, and policymakers do not feel the same way. Only now are parents learning about the data schools are collecting about their children. They are justly concerned about how it is used and shared; the fact that data collection is often outsourced to third-party vendors only adds to their skepticism and concern for their children's privacy. This has led to an instinctual response by many policymakers and others to work against the use of Big Data in education, despite its potential benefits. In 2014, state legislatures introduced 110 bills in 36 states regarding student data privacy. Seventy-nine of the 2014 bills have at least some elements that would restrict the use of data in education.
For example, New Hampshire's bill, which was passed into law, likely prevents predictive analytics. A bill in Missouri would have defunded its statewide longitudinal data system. In all, 28 of the 110 bills introduced passed into law this year. The number of student data privacy bills is expected to double in the 2015 legislative session. Many of the bills introduced, and the laws passed, give state boards of education (SBEs) a key role in the data privacy discussion. Eighteen SBEs are tasked by statute with writing their state's student data management policy or have oversight authority for the agency that is writing the policy. Thirteen SBEs are members of their state's data management team.
Seven SBEs are required by statute to ensure FERPA compliance. Fifty-five bills introduced in 2014 would give SBEs some authority in regulating student data privacy. Existing state privacy laws give many SBEs authority over various measures to help secure data privacy, including appointing a chief privacy officer, adopting and/or implementing state privacy policies, and providing oversight of vendor contracts. SBEs have also independently passed rules for their states to protect data privacy. Unfortunately, like many other policymakers, many SBE members are unaware of the potential benefits of Big Data in education. Education data privacy requires knowledge of privacy law, a basic understanding of Big Data, and a great deal of time to learn about the ins and outs of today's education data privacy debate. The National Association of State Boards of Education (NASBE) is helping SBEs understand and pass effective policies on these issues that will protect data privacy while supporting educational innovation through the use of Big Data. In this panel, Amelia Vance from NASBE will discuss the role SBEs play in education data collection, the questions they are asking as they put together state privacy policies (particularly those dealing with third-party use of data), and what information policymakers need from technology providers in order to trust the use of Big Data in education. We consider the perspectives and recommendations from multiple organizations and experts, including the Data Quality Campaign, the Electronic Privacy Information Center, and the Pioneer Institute, as well as examine the lessons learned thus far in states from failed attempts at responsible data collection and privacy security.
ACKNOWLEDGMENT
The authors wish to thank the panelists for their time and efforts to share their expertise and further the dialog for clarifying the new discipline of Big Data.
The authors also wish to acknowledge the contributions of the large group of participants in the NBD-PWG, who have discussed at length the emerging discipline of Big Data, and have helped form a collective understanding of this new paradigm.
REFERENCES
[1] N. Grady and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 1, Definitions,” NIST, unpublished.
[2] N. Grady and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 2, Taxonomy,” NIST, unpublished.
[3] G. Fox and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 3, Use Cases and Requirements,” NIST, unpublished.
[4] A. Roy, M. Underwood, and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 4, Security and Privacy Requirements,” NIST, unpublished.
[5] S. Mishra and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey,” NIST, unpublished.
[6] O. Levin and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 6, Reference Architecture,” NIST, unpublished.
[7] D. Boyd, C. Buffington, and W. Chang, eds., “NIST Big Data Interoperability Framework: Volume 7, Technology Roadmap,” NIST, unpublished.