Personal Information
Organization / Workplace
Ithaca, New York Area United States
Occupation
Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes
Industry
Technology / Software / Internet
Website
http://gen5.info/q/
About
For six years, my company has been focused on building commercial consumer-facing systems based on Linked data sources such as Freebase, DBpedia and Wikidata. I created :BaseKB, the first correct conversion of Freebase to RDF, which ensures that Freebase data will live on after Google shutters the service.
From this experience, we have developed methods for matching (i) syntax and schemas and (ii) instance data (specific things such as people, places, and legal entities). These methods use expressive business rules running inside a scalable fabric such as Spark or Hadoop to rapidly understand and clean up data from "data lakes" and other large collections. This technology also applies to communication...
Tags
rdf
semantic web
linked data
programming
c#
rich internet application
silverlight
freebase
asynchronous
javascript
software engineering
php
physics
java
search
ir
deterministic chaos
classical chaos
solid state physics
big data
dbpedia
software development
data lakes
reference data
lei
master data management
artificial intelligence
enterprise search
.net
dba
ajax
flex
sql
sales management
information retrieval
patents
tools
mysql
commonspot
oracle database
quantum mechanics
quantum chaos
spin systems
open access
amazon web services
jena
apache hadoop
mapreduce
product management
visual
user experience
ibm
value
blockchain
lei reference data bitemporal analysis big data
big data smart data cisco fog iot
reasoning
business rules
inference
neo4j
graph databases
schemas
hadoop ibm watson big data matching semantics
corporate entities
data quality
legal entity identifier
owl
finance
fibo
aws
software
rdfeasy
aws marketplace
superman
taxonomy
ontology
comic books
lucene
relevance
enterprise software
ce
statistics
wikipedia
hashtables
maps
dictionaries
extension methods
sql server
stored procedures
constraints
microsoft sql server
microsoft
casting
microsoft sql
stored procedure
basekb
resume
hadoop
paul houle
callbacks
flash
dynamic
ria
gwt
error handling
exceptions
namespaces
methods
object orientation
nested sets
sets
trees
star schema
java server pages
jsp
xml
gis
sales force alignment
deep learning
neural network
work
neural networks
business strategy
speed
business process
non-functional requirements
eclipse
software management
user management
web applications
digital library
postgresql
metadata
glopad
text classification, time series
arxiv
documentation
apache httpd
web server
cfmx
http
proxy web applications
catalog
tutoria
opac
instruction
accessibility
ltef
www
usability
black swan
power law
nonlinearity
acoustic emission
fractals
anharmonic localization
support vector machine
academic publishing
time series analysis
machine learning
data science
resource description framework
image
creative commons
Presentations (21)
Documents (19)
Likes (5)
On Beyond OWL: challenges for ontologies on the Web
James Hendler • 8 years ago
The Global Performing Arts Database
Paul Houle • 10 years ago
2013 10-03-semantics-meetup-s buxton-mark_logic_pub
Stephen Buxton • 10 years ago
Parallel Data Processing with MapReduce: A Survey
Kyong-Ha Lee • 12 years ago
Mapreduce Algorithms
Amund Tveit • 11 years ago