CITA’15 Workshop, August 2015
Semantic Enrichment of
Unstructured Datasets
Bebo White
SLAC National Accelerator Laboratory/
Stanford University
bebo@slac.stanford.edu
Workshop Schedule
• 08:30-10:30 - Session 1
• 10:30-11:00 - Morning Tea Break
• 11:30-12:30 - Session 2
• 12:30-14:00 - Lunch Break
• 14:00-15:30 - Session 3
• 15:30-16:00 - Afternoon Tea Break
• 16:00-17:00 - Session 4
Workshop Agenda (1/3)
• Overview of “Big Data Analytics”
• Goals
• Common challenges
• Examples and applications
• What is missing
• Big Data and Open Data
• Characteristics of open (and semantic) data
• Usage
• Challenges
• Processes
Workshop Agenda (2/3)
• Semantically describing data
• Ontologies and namespaces
• Data triples
• Triplification
• Introduction to RDF(S)
• Case Study - FOAF
• Merging RDF data
• RDF tools
Workshop Agenda (3/3)
• PingER as a triplification case study
• Introduction to project
• PingER LOD
• Data model and process
• PingER LOD “data bloating”
• How PingER LOD extends PingER
• Summary and lessons learned
Workshop Format
• A workshop, not a tutorial
• Goal is to introduce concepts and
terminology and provoke future research
• Must be very interactive - questions/
discussion at any time
• Individual and group exercises
• Length of workshop depends on involvement
“High-volume, -velocity, and -variety information assets
that demand cost-effective innovative forms of information
processing for enhanced insight and decision making”
• Volume?
• ~ data volume worldwide in 2013 = 3.5 ZB
(including 400 billion feature length HD movies)
• Velocity?
• Every 60 sec. on Facebook - 510K posted
comments; 293K status updates; 136K uploaded
photos
• 30 billion shares
• 20 million apps installed
• Variety?
• Any type of data both meaningful and
meaningless
• Veracity?
• How is trust established?
• What does “like” really mean?
Evaluating “the V’s”
• A recent survey conducted by Paradigm4 indicates
• variety, not volume, is the bigger challenge of
analyzing Big Data - 71% of respondents
• Data Scientists aren’t terribly concerned with
the “size” of the data being currently analyzed -
tools and systems are in place to work with
large datasets
• storing large amounts of structured (or semi-
structured) data is not the problem, analysis is
Common Challenges of
Harnessing Big Data
• Mining huge (?) datasets
• Shortages of Big Data experts
• Privacy, legal, and social issues
• Strategies for acquiring Big Data - a new
form of currency
• BUT
“The theory is that you pump Big Data into the
‘black box’ of an analytics engine - most likely
hidden on some unknown server in the cloud -
and you get back a continuous stream of
insights”
“When you have large amounts of data your
appetite for hypotheses tends to get larger. And
if it’s growing faster than the statistical strength
of the data, then many of your inferences are
likely to be false. They are likely to be ‘white
noise.’ We have to have error bars around our
predictions.”
-Michael Jordan
Why is “Bigger Data”
Better?
• Outliers or small
clusters
• Rare discrete values or
classes
• Missing values
• Rare events or objects
Big Data Analytics and
Data Science
Unstructured Data
• Does not have a pre-defined data model or
is not organized in a pre-defined manner
• Typically text-heavy, but may contain data
such as dates, numbers, and facts
• May result in irregularities and ambiguities
that make it difficult to understand using
traditional programs
(Ref: Wikipedia)
Typical Big Data Problem
• Iterate over a large number of records
• Extract something of interest from each
(MAP)
• Shuffle and sort intermediate results
• Aggregate intermediate results (REDUCE)
• Generate final output
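The five steps above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the programming model, assuming a word-count task; the function names (map_phase, shuffle, reduce_phase) and the sample records are made up for the example and are not part of Hadoop or any other implementation.

```python
from collections import defaultdict

def map_phase(records):
    """MAP: extract something of interest, here (word, 1) per word."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def shuffle(pairs):
    """SHUFFLE/SORT: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """REDUCE: aggregate each key's intermediate values."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input records.
records = ["big data", "linked data", "big linked data"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```

In a real framework the map and reduce calls run on many machines and the shuffle moves data between them; the structure of the computation stays the same.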
MapReduce Can Refer
to…
• The programming model
• The execution framework (aka “runtime”)
• The specific implementation
MapReduce
Implementations
• Google has a proprietary implementation in C++
• Bindings in Java, Python
• Hadoop is an open-source implementation in Java
• Development led by Yahoo!, now an Apache project
• Used in production at Yahoo!, Facebook, Twitter,
LinkedIn, Netflix, etc.
• The de facto Big Data processing platform
• Lots of custom research implementations
An Interesting Example -
“Sentiment Analysis”
• Goal - gauging mood on social network
data
• Not a traditional survey or focus group
• Social sites operate 24/7
• Timeliness - not subject to time lags
• Useful to marketers, IT, customers, etc. - a
limited (not general) sector
Difficult Comment
Analysis (1/2)
• False negatives - “crying” & “crap” (negative) vs.
“crying with joy” & “holy crap!” (positive)
• Relative sentiment - “I bought a Honda Accord” -
great for Honda, bad for Toyota
• Compound sentiment - “I love the phone but hate
the network”
• Conditional sentiment - “If someone doesn’t call
me back, I’m never doing business with them
again!”
Difficult Comment
Analysis (2/2)
• Scoring sentiment - “I like it” vs. “I really like it” vs.
“I love it”
• Sentiment modifiers - “I bought an iPhone
today :-)” “Gotta love the telephone company ;-<”
• International/cultural sentiments
• Japanese - unique emoticons for crying - (;_;)
• Italians - effusive, grandiose
• British - drier, less effusive
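A small sketch shows why these comments are hard. It assumes a naive word-counting scorer with a tiny, made-up lexicon (LEXICON and naive_score are hypothetical names, not a real sentiment library) and runs it on three of the example comments from the slides.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "like": 1, "joy": 1, "hate": -1, "crying": -1, "crap": -1}

def naive_score(comment):
    """Sum the lexicon values of the words in a comment, ignoring context."""
    words = re.findall(r"[a-z]+", comment.lower())
    return sum(LEXICON.get(word, 0) for word in words)

examples = [
    "crying with joy",
    "holy crap!",
    "I love the phone but hate the network",
]
scores = {text: naive_score(text) for text in examples}
for text, score in scores.items():
    print(score, text)
```

The two positive comments score zero or below, and the compound comment collapses to zero even though it carries two distinct opinions. Word counting alone cannot capture any of these cases.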
Analyzing Significant Correlations
Between Social Media Measures and
Sales
Why is this a good
analytical model/process
for MapReduce?
However
MapReduce is NOT the only way to do “Big
Data” analytics
There is a 5th “V”
VALUE
Despite its volume, veracity, etc., what does
it really give us?
How can we extract insight/knowledge?
Interesting stuff, but
• Who is it benefiting?
• Is it making us smarter or safer?
What I’m really interested in
is the intersection between
Big Data and Linked Data…
Looking back…
• One of the great (IMHO) insights in Web
2.0 was developing mashups
• Supported the process of converting data
to knowledge/insight
• Usually done in an ad hoc manner, e.g.,
“screen scraping”
• Sometimes done with APIs
Is it possible to do “Data
Programming?”
• Can processes extract from data pools the
same insights that humans do?
• How do humans process collections of
data?
• Big Data (and even “not so Big Data”) tends
to be unstructured data (e.g., lists, e-mails,
tweets, etc.)
• Therefore it tends to be “thin” rather than
“thick”
• “Thin” means very little (if any) context -
just data, little knowledge
• What can be added to change data from
“thin” to “thick?”
9 Steps to Extract Insight
from Unstructured Data (1/2)
1. Make sense of the disparate data sources*
2. Sign off on the method of analytics and find a
clear way to present the results
3. Decide the technology stack for data ingestion
and storage
4. Keep information in a data lake until it has to be
stored in a data warehouse
5. Prepare the data for storage
9 Steps to Extract Insight
from Unstructured Data (2/2)
6. Retrieve useful information
7. Ontology evaluation*
8. Statistical modeling and execution
9. Obtain insight from the analysis and
visualize it*
Linked Data is similar to Metadata
but provides Context and Meaning
Linked Data
• Provides access to the semantics of data
items
• Based upon Semantic Web technologies and
ontologies
• Designed for machines first and humans later
• Degree of structure in descriptions of things
is high
Linked Data Pros
• Far more “parseable” and “machine
processable” than raw unstructured data
• Enhances data descriptions for complex
analyses
• Can contribute to the VERACITY of our data
• Wide variety of discipline/data ontologies
available
Linked Data Cons
• Much harder to do than adding keyword
metadata
• Building efficient processing applications
and parsers
• Implementing effective linked data stores
Linked Open Data
• LOD refers to data stores of Linked Data that are
published (made available online and accessed via URLs)
and free to use
• Open data means it must be available to all without
copyright or ownership
• There is an increasing trend towards “opening”
government data (US and UK, San Francisco and more)
and scientific results
• Provides unprecedented ability to build “mashup”
applications
A deliberate, conscious, structured
attempt to turn data into
knowledge
How do we do this?
• By defining unambiguously the relationships
between data items
• By using a shared definition and meaning
mechanism
• By expressing the semantics and syntax
inherent in the data
(Ref: Hlomani & Stacey)
A lot of new data from
(Diagrams: SSB (solar system body) shown with three Planet terms from different namespaces: astro:Planet, horo:Planet, IAU:Planet)
(Graph: Pluto signifies Rebirth; Pluto signifies Regeneration; Pluto prefSymbol, symbol not reproduced)
(Graph: Pluto madeOf Methane; Pluto madeOf Methane Ice; Pluto prefSymbol, symbol not reproduced)
(Merged graph: a single Pluto node with signifies Rebirth, signifies Regeneration, madeOf Methane, madeOf Methane Ice, and two prefSymbol edges, symbols not reproduced)
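Merging the two Pluto graphs above amounts to taking the union of their triples. The sketch below assumes each graph is a Python set of (subject, predicate, object) tuples; the prefSymbol edges are left out because their values are not recoverable from the slide text, and the variable names are illustrative.

```python
# Two small graphs that both talk about a node called "Pluto".
horo = {
    ("Pluto", "signifies", "Rebirth"),
    ("Pluto", "signifies", "Regeneration"),
}
astro = {
    ("Pluto", "madeOf", "Methane"),
    ("Pluto", "madeOf", "Methane Ice"),
}

# Merging is set union: shared node names become one node.
merged = horo | astro
about_pluto = sorted((p, o) for s, p, o in merged if s == "Pluto")
print(about_pluto)

# With namespace-qualified names the two Plutos stay separate nodes.
merged_qualified = (
    {("horo:" + s, p, o) for s, p, o in horo}
    | {("astro:" + s, p, o) for s, p, o in astro}
)
subjects = {s for s, _, _ in merged_qualified}
print(sorted(subjects))
```

Whether the astrological and astronomical Pluto should be one node or two is a modeling decision. The unqualified merge makes that decision silently, which is one motivation for namespaces.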
Exercise - What about
this Pluto?
Fundamental Concepts
(1/2)
• Modeling - making sense of unorganized
information/data
• Formality/Informality - the degree to which
the meaning of a modeling language is given
independent of the particular speaker or
audience
Fundamental Concepts
(2/2)
• Commonality and Variability - how to
manage things in common and some with
important differences
• Expressivity - the ability of a modeling
language to express maximum variety in
the model
Tabular Data About Elizabethan
Literature and Music
ID  Title           Author       Medium  Year
1   As You Like It  Shakespeare  Play    1599
2   Hamlet          Shakespeare  Play    1604
3   Othello         Shakespeare  Play    1603
4   “Sonnet 78”     Shakespeare  Poem    1609
Concept of a data triple
Resource (subject) --Property (predicate)--> Value (object)
Subject has a property with value “object” (s,p,o)
Ontology/Vocabulary
(1/2)
• Provides a common background and
understanding of a particular domain or field
of study, and ensures a common ground
among those who study the information
• A way of organizing concepts, information,
and ideas that is meant to be universal
within the field and allows for a common
language to be spoken
Ontology/Vocabulary
(2/2)
• A structural framework that allows
concepts to be laid out in a way that makes
sense
• Shows the connections and relationships
between concepts in a manner that is
generally accepted by the field
Sample Triples
Subject  Predicate  Object
Row 2    Title      Hamlet
Row 2    Year       1604
Row 4    Medium     Poem
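The sample triples above follow mechanically from the table of Elizabethan works: each row becomes a subject, each column name a predicate, and each cell an object. A minimal sketch, assuming the table is held as a list of dictionaries and using an illustrative function name (triplify):

```python
rows = [
    {"ID": 1, "Title": "As You Like It", "Author": "Shakespeare", "Medium": "Play", "Year": 1599},
    {"ID": 2, "Title": "Hamlet", "Author": "Shakespeare", "Medium": "Play", "Year": 1604},
    {"ID": 3, "Title": "Othello", "Author": "Shakespeare", "Medium": "Play", "Year": 1603},
    {"ID": 4, "Title": "Sonnet 78", "Author": "Shakespeare", "Medium": "Poem", "Year": 1609},
]

def triplify(rows, key="ID"):
    """Turn each non-key cell into a (row, column, value) triple."""
    triples = []
    for row in rows:
        subject = f"Row {row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples

triples = triplify(rows)
for triple in triples[4:8]:  # the four triples produced from row 2
    print(triple)
```

Four rows with four non-key columns give sixteen triples, so one table cell maps to exactly one triple.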
Sample Triples
Shakespeare    wrote    King Lear
Shakespeare    wrote    Macbeth
Anne Hathaway  married  Shakespeare
Shakespeare    livedIn  Stratford
Stratford      isIn     England
Macbeth        setIn    Scotland
England        partOf   UK
Scotland       partOf   UK
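Once facts are stored as triples, simple questions become graph traversals. The sketch below loads the eight sample triples and answers "what did Shakespeare write?" and "which larger places contain his home town?". The helper names (objects, containing_places) are illustrative, and treating isIn and partOf as containment is an assumption made for the example.

```python
TRIPLES = [
    ("Shakespeare", "wrote", "King Lear"),
    ("Shakespeare", "wrote", "Macbeth"),
    ("Anne Hathaway", "married", "Shakespeare"),
    ("Shakespeare", "livedIn", "Stratford"),
    ("Stratford", "isIn", "England"),
    ("Macbeth", "setIn", "Scotland"),
    ("England", "partOf", "UK"),
    ("Scotland", "partOf", "UK"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def containing_places(place):
    """Follow isIn/partOf edges upward, collecting every enclosing place."""
    found = []
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for predicate in ("isIn", "partOf"):
            for parent in objects(current, predicate):
                if parent not in found:
                    found.append(parent)
                    frontier.append(parent)
    return found

works = objects("Shakespeare", "wrote")
home = objects("Shakespeare", "livedIn")[0]
print(works, home, containing_places(home))
```

The fact that Shakespeare lived in the UK is never stated directly. It emerges from chaining three separate triples.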
(Graph of the sample triples: Anne Hathaway married Shakespeare; Shakespeare wrote King Lear and Macbeth; Shakespeare livedIn Stratford; Stratford isIn England; Macbeth setIn Scotland; England and Scotland partOf UK)
Linked Data Technology
Stack
• URIs - Uniform Resource Identifiers
(generalization of URL)
• HTTP - Hypertext Transfer Protocol
• RDF - Resource Description Framework/
Format
• RDFS/OWL - RDF Schema/Web Ontology
Language
Linked Data Principles
(1/2)
• Use URIs as names of things
• Anything, not just documents
• Information resources and non-information
resources
• Use HTTP URIs
• Globally unique names, distributed ownership
• Allows people to look up those names
Linked Data Principles
(2/2)
• Provide useful information in RDF
• When someone (or something) looks up
a URI
• Include RDF links to other URIs
• To enable discovery of related
information
Plays of Shakespeare
with Qnames
Subject Predicate Object
lit:Shakespeare lit:wrote lit:Hamlet
lit:Shakespeare lit:wrote lit:Othello
lit:Shakespeare lit:wrote lit:WintersTale
… … …
Geographical
Information as Qnames
Subject Predicate Object
geo:Scotland geo:partOf geo:UK
geo:England geo:partOf geo:UK
geo:Wales geo:partOf geo:UK
… … …
Triples Referring to URIs
with a Variety of Namespaces
Subject Predicate Object
lit:Shakespeare lit:wrote lit:Hamlet
bio:AnneHathaway bio:married bio:Shakespeare
geo:Stratford geo:isIn geo:England
geo:England geo:partOf geo:UK
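A Qname such as lit:Hamlet is shorthand for a full URI: the prefix stands for a namespace URI and the local name is appended to it. The sketch below expands the Qnames used above. The namespace URIs under example.org are hypothetical, since the slides do not say what lit, bio, and geo resolve to.

```python
# Hypothetical namespace URIs, chosen only for illustration.
NAMESPACES = {
    "lit": "http://example.org/literature#",
    "bio": "http://example.org/biography#",
    "geo": "http://example.org/geography#",
}

def expand(qname):
    """Replace the prefix of a Qname with its namespace URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local

triple = ("geo:Stratford", "geo:isIn", "geo:England")
expanded = tuple(expand(term) for term in triple)
print(expanded)

# The same local name in two namespaces yields two different URIs.
print(expand("lit:Shakespeare"))
print(expand("bio:Shakespeare"))
```

Because lit:Shakespeare and bio:Shakespeare expand to different URIs, a merge will not treat them as the same resource unless a further statement links them.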
Reengineering process - data
to data triples (‘triplification’)
(Diagram: Define/acquire data source -> Define/acquire mapping description -> Apply reengineering; artifacts shown: Meta-Model, Data Source, Data Source + Mapping, RDF Dataset)
RDF and the Semantic
Web
• Supports the goal of the Semantic Web
• Web information/data should have exact
and unambiguous meaning
• Web information/data can be understood
and processed by computers
• Computers can integrate information/
data from multiple sources on the Web
What is RDF?
• Resource Description Framework
• Provides a model for data and a syntax so that
independent parties can exchange and use it
• Designed mainly to be read and understood by
computer processors, not humans
• Commonly written in XML (RDF/XML), though other syntaxes such as Turtle exist
• A W3C Recommendation
• Any XML processor or parser can read the RDF/XML form
Basic Ideas Behind RDF
• RDF uses Web identifiers (URIs) to identify
resources
• RDF describes resources with properties
and property values
• Everything is represented as triples
• The essence of RDF is the (s,p,o) triple
RDF Data Model
• Any expression in RDF is a collection of triples (subject, predicate,
object)
• A set of triples is called an RDF graph
• The nodes of an RDF graph are its subjects and objects
• Direction is important - always points to object
• An assertion of an RDF triple says the relationship (as indicated by
the predicate) holds between subject and object
• The meaning of an RDF graph is conjunction (AND) of the
statements corresponding to all the triples it contains
• RDF does not provide means to express negation (NOT) or
disjunction (OR)
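The data model above maps naturally onto a set of tuples. This sketch assumes a graph is a Python set of (s, p, o) triples and uses illustrative helper names (nodes, match, holds); it is not the API of any RDF library.

```python
graph = {
    ("lit:Shakespeare", "lit:wrote", "lit:Hamlet"),
    ("lit:Shakespeare", "lit:wrote", "lit:Othello"),
    ("geo:England", "geo:partOf", "geo:UK"),
}

def nodes(graph):
    """The nodes of a graph are its subjects and objects."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def holds(graph, statements):
    """Conjunction: the graph asserts every one of the given statements."""
    return all(t in graph for t in statements)

print(sorted(nodes(graph)))
print(match(graph, p="lit:wrote"))
print(holds(graph, [("geo:England", "geo:partOf", "geo:UK")]))
```

Because the graph is a set, asserting the same triple twice changes nothing, and there is no way to store "not" or "or". That matches the last two bullets.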
RDF Design Goals
• Having a simple data model
• Having formal semantics and provable inference
• Using an extensible URI-based vocabulary
• Using an XML-based syntax
• Supporting use of XML Schema datatypes
• Allowing anyone to make statements about any
resource
Case Study - FOAF
• Friend-of-a-Friend
• A linked data description
of a person
• More than just a blog or
personal Web page
(Graph: FOAF description, one subject with five properties)
Subject: http://www.slac.stanford.edu/~bebo/contact.rdf#bebowhite
http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> http://xmlns.com/foaf/0.1/Person
http://xmlns.com/foaf/0.1/homepage -> http://www.slac.stanford.edu/~bebo/
http://xmlns.com/foaf/0.1/mbox -> mailto:bebo@slac.stanford.edu
http://xmlns.com/foaf/0.1/givenname -> Bebo
http://xmlns.com/foaf/0.1/family_name -> White
FOAF Generator
http://linkeddatadeveloper.com/Projects/Linked-Data/Sample-Apps/FOAF-Generator/index.xhtml?view
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<Me> a foaf:Person
; foaf:title "Prof"
; foaf:givenName "Bebo"
; foaf:familyName "White"
; foaf:name "Prof Bebo White"
; foaf:mbox_sha1sum "5f6f88bb7e8d15058006a3b03206d356f26eea18"
; foaf:homepage <http://www.bebowhite.com>
; foaf:weblog <>
; foaf:tipJar <>
; foaf:account <>
; foaf:account <>
; foaf:account <>
; foaf:workplaceHomepage <>
; foaf:schoolHomepage <>
; foaf:currentProject <>
; foaf:img <>
; foaf:publications <>
; foaf:made <>
.
Semantic Mashups
• A mashup application using Semantic Web
technologies inside
• Supplements Web 2.0 mashups by adding
access to semantic data sources
• Can be either client-side or server-side
Data integration architecture
for Linked Data mashups
Case Study: PingER
• PingER (Ping End-to-end Reporting)
• Uses the Internet ping facility to monitor performance of
Internet links worldwide
• Measures
• Short and long term RTT
• Packet loss percentages
• Jitter
• Lack of reachability (no response to ping)
• Throughput and quality of IP telephony (VoIP)
PingER Monitor Node
Format (original)
Monitor Host Name        Monitor Address  Remote Name  Remote Address  Bytes  Time       Xmt  Rcv  Min  Avg  Max
minos.slac.stanford.edu  134.79.196.100   www.lbl.gov  128.3.7.14      100    870393602  10   10   6    18   125
• Bytes - can be 100 or 1000 (min 100); number of bytes in
each ping packet
• Time - Unix epoch time, in GMT (UTC)
• Xmt - number of ping packets sent
• Rcv - number of ping packets received
• Min - minimum response time for packets sent (in
milliseconds)
• Avg - average response time for packets sent (in milliseconds)
• Max - maximum response time for packets sent (in
milliseconds)
PingER Monitor Node
Format (revised)
• Same as original plus
• for each ping response the Sequence
number is recorded
• for each ping the RTT (round trip time) is
recorded
PingER Rules
• There should always be >7 tokens in the
line
• If <=7 tokens, site considered unreachable
• If no response to the pings is received,
the line has only 8 tokens and Rcv (the 8th token)
is 0
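The field list and rules above are enough to sketch a parser for one monitor-node line. The field names, the parse_line function, and the loss_pct calculation are assumptions for illustration, not PingER's own code; the 8-token line is a made-up example of a host that answered no pings.

```python
# Column order taken from the monitor node format table; names are illustrative.
FIELDS = ["monitor_name", "monitor_addr", "remote_name", "remote_addr",
          "bytes", "time", "xmt", "rcv", "min", "avg", "max"]

def parse_line(line):
    """Parse one whitespace-separated PingER record into a dict."""
    tokens = line.split()
    if len(tokens) <= 7:
        # Rule: 7 or fewer tokens means the site is considered unreachable.
        return {"reachable": False}
    record = dict(zip(FIELDS, tokens))  # extra tokens (revised format) are ignored
    for name in ("bytes", "time", "xmt", "rcv"):
        record[name] = int(record[name])
    for name in ("min", "avg", "max"):
        if name in record:  # absent when no ping was answered
            record[name] = float(record[name])
    record["reachable"] = record["rcv"] > 0
    # Packet loss percentage, derived here as (sent - received) / sent.
    sent = record["xmt"]
    record["loss_pct"] = 100.0 * (sent - record["rcv"]) / sent if sent else None
    return record

sample = ("minos.slac.stanford.edu 134.79.196.100 www.lbl.gov 128.3.7.14 "
          "100 870393602 10 10 6 18 125")
ok = parse_line(sample)
# Hypothetical 8-token line for a host that answered no pings.
silent = parse_line("minos.slac.stanford.edu 134.79.196.100 www.lbl.gov "
                    "128.3.7.14 100 870393602 10 0")
print(ok["avg"], ok["loss_pct"], silent["reachable"])
```

The sample line from the format slide has 11 tokens with Rcv equal to Xmt, so it parses as reachable with no packet loss.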
Possible Uses of PingER
Data
• Technical
• Economical
• Troubleshooting
• Collaboration
• Quantifying the impact of events
• Routing
Workshop Exercise
• Given a table of (unstructured) data
• Produce an RDF graph that reflects the
content in such a way that the information
intent is preserved but the data is now
available for RDF operations such as merging
with other linked datasets and RDF query
• Think of new applications that this
“triplification” might add to the use of PingER
data and what parties might be interested
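One possible starting point for the exercise is sketched below. It turns a single measurement into N-Triples lines. The example.org namespace, the property names (monitorNode, remoteNode, timestamp, averageRtt), and the record layout are all hypothetical; the PingER LOD project's actual ontology may differ.

```python
from datetime import datetime, timezone

EX = "http://example.org/pinger#"           # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"   # standard XML Schema datatypes

def measurement_triples(rec):
    """Return N-Triples lines describing one ping measurement."""
    subject = f"<{EX}m-{rec['monitor']}-{rec['remote']}-{rec['time']}>"
    # Unix epoch seconds converted to an ISO 8601 timestamp in UTC.
    when = datetime.fromtimestamp(rec["time"], tz=timezone.utc).isoformat()
    return [
        f'{subject} <{EX}monitorNode> <{EX}node-{rec["monitor"]}> .',
        f'{subject} <{EX}remoteNode> <{EX}node-{rec["remote"]}> .',
        f'{subject} <{EX}timestamp> "{when}"^^<{XSD}dateTime> .',
        f'{subject} <{EX}averageRtt> "{rec["avg"]}"^^<{XSD}decimal> .',
    ]

# Values taken from the sample line in the monitor node format table.
record = {"monitor": "minos.slac.stanford.edu", "remote": "www.lbl.gov",
          "time": 870393602, "avg": 18}
lines = measurement_triples(record)
for line in lines:
    print(line)
```

Each measurement gets its own URI, and the two hosts become nodes that other datasets (for example geographic or institutional data) could link to.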
(Ref: Measurement Ontology for IP traffic (MOI), European Telecommunications Standards Institute (ETSI))
PingER LOD dataset
sizing
If 1 measurement = 235 bytes of data
Total triplified datastore = (approx.) 235 * #Measurements
= 736,064,229,585 bytes = (approx.) 685.5 GB
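The sizing arithmetic can be checked directly. The number of measurements is not stated on the slide; it is derived here by dividing the total by the 235 bytes per measurement. The 685.5 figure corresponds to binary gigabytes (2**30 bytes).

```python
BYTES_PER_MEASUREMENT = 235
TOTAL_BYTES = 736_064_229_585

# Derived, not stated on the slide: how many measurements the total implies.
measurements = TOTAL_BYTES // BYTES_PER_MEASUREMENT
remainder = TOTAL_BYTES % BYTES_PER_MEASUREMENT

# Size in binary gigabytes (1 GB taken as 2**30 bytes).
total_gb = TOTAL_BYTES / 2**30

print(measurements, remainder, round(total_gb, 1))
```

The total divides evenly, which implies roughly 3.1 billion measurements in the triplified store.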
What Did We Do?
(Diagram: data in various formats is mapped and exposed as data represented in an abstract format, which applications can then manipulate, query, …)
Thank You!
Questions? Comments?
bebo@slac.stanford.edu

More Related Content

What's hot

Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!
DataKitchen
 
From Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data EngineeringFrom Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data Engineering
Ry Walker
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science team
Lars Albertsson
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!
DataKitchen
 
Bigowl aitech
Bigowl aitechBigowl aitech
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
Rob Winters
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
Caserta
 
Overcoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in ProductionOvercoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in Production
Sandeep Uttamchandani
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
DataKitchen
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Rehgan Avon
 
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on ReadBig Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
Think Big, a Teradata Company
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
Rob Winters
 
Data democratised
Data democratisedData democratised
Data democratised
Lars Albertsson
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
DataKitchen
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
Sri Ambati
 
Data Modeling for Big Data & NoSQL Technologies with Karen Lopez
Data Modeling for Big Data & NoSQL Technologies with Karen LopezData Modeling for Big Data & NoSQL Technologies with Karen Lopez
Data Modeling for Big Data & NoSQL Technologies with Karen Lopez
Embarcadero Technologies
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
DevOps.com
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
Inside Analysis
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
Pierre Gutierrez
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!
 
From Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data EngineeringFrom Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data Engineering
 
Don't build a data science team
Don't build a data science teamDon't build a data science team
Don't build a data science team
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!
 
Bigowl aitech
Bigowl aitechBigowl aitech
Bigowl aitech
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Overcoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in ProductionOvercoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in Production
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
 
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on ReadBig Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 
Data democratised
Data democratisedData democratised
Data democratised
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
 
Data Modeling for Big Data & NoSQL Technologies with Karen Lopez
Data Modeling for Big Data & NoSQL Technologies with Karen LopezData Modeling for Big Data & NoSQL Technologies with Karen Lopez
Data Modeling for Big Data & NoSQL Technologies with Karen Lopez
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
 

Similar to Workshop_CITA2015

Text Analytics & Linked Data Management As-a-Service
Text Analytics & Linked Data Management As-a-ServiceText Analytics & Linked Data Management As-a-Service
Text Analytics & Linked Data Management As-a-Service
Marin Dimitrov
 
ALLDATA 2015 - RDF Based Linked Data Management as a DaaS Platform
ALLDATA 2015 - RDF Based Linked Data Management as a DaaS PlatformALLDATA 2015 - RDF Based Linked Data Management as a DaaS Platform
ALLDATA 2015 - RDF Based Linked Data Management as a DaaS Platform
Seonho Kim
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo
 
Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)
Benjamin Bengfort
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
Denodo
 
Unit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big DataUnit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big Data
RajendraKankrale1
 
Store, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged ConferenceStore, Extract, Transform, Load, Visualize. Untagged Conference
Store, Extract, Transform, Load, Visualize. Untagged Conference
Ani Lopez
 
Advanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryAdvanced Project Data Analytics for Improved Project Delivery

Workshop_CITA2015