The document discusses using Elasticsearch, D3.js, Angular.js, and Google Refine to create a full stack data visualization of open data from Bordeaux, France. It focuses on data from the CAPC contemporary art museum, importing the data into Elasticsearch for scalable search and then using D3.js, Angular.js, and Yeoman to build the front-end visualization with JavaScript. The goal is to make the data more accessible and understandable through interactive visualization.
Data visualization: d3.js + sinatra + elasticsearch - Mathieu Elie
Live screencast on my tech blog (in French):
http://www.mathieu-elie.net/screencast-video-d3-js-sinatra-elasticsearch-capucine/
Other tech slides on my blog: http://www.mathieu-elie.net
A starter guide to web scraping with Scrapy, one of the best Python modules for web scraping; with Scrapy everything becomes easier.
This presentation covers the key concepts of Scrapy and the process of creating spiders.
This is a first draft; further versions will follow. If you see something you would like improved, give feedback and I will take it into consideration.
I also discuss some alternatives to Scrapy, such as lxml, newspaper, and others.
At the end I give you access to the code used in this presentation, so you can quickly and easily test the concepts covered here.
I hope you like it :D
A lightweight app with d3.js, sinatra, elasticsearch, and capucine - Yann ARMAND
Presented at the Apéro Ruby Bordelais on March 6, 2012,
by Mathieu Elie.
twitter: @mathieuel
http://www.mathieu-elie.net
Apéro Ruby Bordeaux
==============
twitter: @rubybdx
http://rubybdx.org
ElasticSearch - index server used as a document database - Robert Lujo
Presentation held on 5.10.2014 at http://2014.webcampzg.org/talks/.
Although ElasticSearch's (ES) primary purpose is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database, or more precisely, a document database.
Why could this be interesting, and how can it be used effectively?
Talk overview:
- ES - history, background, philosophy, feature-set overview, with a focus on indexing/search features
- a short presentation on how to get started - installation, indexing, and search/retrieval
- a database should provide the following functions: store, search, retrieve -> differences between relational, document, and search databases
- it is not unusual to use ES additionally as a document database (store and retrieve)
- a use case will be presented where ES serves as the single database in the system (benefits and drawbacks)
- what happens if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)
ES is a nice, genuinely ready-to-use example that can change your perspective on developing certain types of software systems.
On Again; Off Again - Benjamin Young - ebookcraft 2017 - BookNet Canada
Over the past year, the world’s leading browsers have added features enabling web applications and publications to “phase shift” between online and offline states. Surrounding these new features is a growing set of communities broadly coalescing around the term “offline-first.” In this talk, we’ll take a look at some of the key bits of technology being used by these new phase-shifting applications, as well as how the W3C’s Digital Publishing Interest Group is exploring these (and other ideas) for possible use in Portable Web Publications.
March 23, 2017
Elasticsearch is a powerful, distributed, open source search technology. By integrating Elasticsearch into your application, you instantly provide a way to search a lot of data very quickly. Elasticsearch has a RESTful API, it scales, it's super fast, you can use plugins to customize it, and much more. In this talk I go over the basics of setting up Elasticsearch, creating a search index, importing your data, and doing some basic searching. I also touch on a few advanced topics that show the flexibility of this awesome service.
A global introduction to Elasticsearch, presented at a BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Spiders, Chatbots, and the Future of Metadata: A look inside the BNC BiblioSh... - BookNet Canada
BookNet’s BiblioShare database now holds over 2 million public records and counting – so what are we doing with all that bibliographic data? Or better yet: what aren’t we doing? Join Tim as he demonstrates a few in-progress tools and blue-sky possibilities that put all that data to good use.
Web History 101, or How the Future is Unwritten - BookNet Canada
In 1989 computer scientist Tim Berners-Lee wrote “Information Management: A Proposal” to persuade CERN management that a global hypertext system was in their interests. That proposal gradually grew into what we now call the World Wide Web. This originating document contains not only the bits that would later become the Web, but also features for a future we’ve yet to realize. In this talk, we’ll take a look at some of those highlights and focus them on the world of publishing, proposing solutions to problems we’re still attempting to solve and fostering ideas for further daydreaming.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
FIFA fails, Guy Kawasaki and real estate in SF - find out about all three by ... - Elżbieta Bednarek
How to use ObjectPath, the agile query language, to effectively extract relevant data from JSON documents of complex or even unknown structure, and how to quickly build a web app using the insights you discover with ObjectPath.
Big Data Analysis: Deciphering the Haystack - Srinath Perera
A primary outcome of big data is deriving useful and actionable insights from large or challenging data collections. The goal is to run the transformations from data to information, to knowledge, and finally to insights. This ranges from calculating simple analytics like mean, max, and median, to deriving an overall understanding of the data by building models, and finally to deriving predictions from the data. In some cases we can afford to wait while data is collected and processed, while in other cases we need the outputs right away. MapReduce has been the de facto standard for data processing, and we will start our discussion there. However, that is only one side of the problem. Other technologies like Apache Spark and Apache Drill are gaining ground, along with realtime processing technologies like stream processing and complex event processing. Finally, there is a lot of work on porting decision technologies like machine learning into the big data landscape. This talk discusses big data processing in general and looks at each of these technologies, comparing and contrasting them.
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An... - Benjamin Nussbaum
We live in an era where the world is more connected than ever before and the trajectory is such that data relationships will only continue to increase with no signs of slowing down.
Connected data is the key to your business succeeding and growing in today’s connected world.
Leading enterprises will be the ones that utilize relationship-centric technologies to leverage connections from their internal operations and supply chain to their customer and user interactions. This ability to utilize connected data to understand all the nuanced relationships within their organization will propel them forward as they act on more holistic insights.
Every organization needs a knowledge graph because connected data is an essential foundation to advancing business. Knowledge graphs provide:
- Increased visibility between internal groups
- Efficiency gains
- Cross-functional data collaboration
- More complete and reliable business insights
- Better customer engagement
The live presentation and discussion can be found here: https://youtu.be/7vBdlXzhs_4
Additional reading on why connected data is beneficial: https://www.graphgrid.com/why-connected-data-is-more-useful/
Connected data solutions available by Benjamin and his team via GraphGrid and AtomRain: https://www.graphgrid.com and https://www.atomrain.com
Due to recent advances in technology, humanity is collecting vast amounts of data at an unprecedented rate, making the skills necessary to mine insights from this data increasingly valuable. So what does it take for a Developer to enter the world of data science?
Join me on a journey into the world of big data and machine learning where we will explore what the work actually looks like, identify which skills are most important, and design a road map for how you too can join this exciting and profitable industry.
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich - Patrick Baumgartner
Once you begin developing with NoSQL technologies you will quickly realize that accessing data stores or services often requires in-depth knowledge of proprietary APIs that are typically not designed for use in enterprise Java applications. Sooner or later you might find yourself wanting to write an abstraction layer to encapsulate those APIs and simplify your application code. Luckily such an abstraction layer already exists: Spring Data.
Pre-Aggregated Analytics And Social Feeds Using MongoDB - Rackspace
Jon Hyman, co-founder and CIO of Appboy, an engagement platform for mobile apps, highlights how it solved issues around pre-aggregated analytics and used statistical formulas on top of the aggregation framework to return results in real time as its data grew. And Greg Avola, co-founder and developer at Untappd, a social network for beer lovers, discusses how MongoDB and ObjectRocket helped Untappd address problems with serving its social feed, how it sustained high performance at 5,000 to 6,000 queries per second, and how it used location indexes to enable geo-location search.
Data Science at Scale - The DevOps Approach - Mihai Criveti
DevOps Practices for Data Scientists and Engineers
1 Data Science Landscape
2 Process and Flow
3 The Data
4 Data Science Toolkit
5 Cloud Computing Solutions
6 The rise of DevOps
7 Reusable Assets and Practices
8 Skills Development
This talk evaluates some easy ways to extract useful trending and capacity-planning data out of your existing monitoring investment. Using Nagios performance data, we examine simple behaviors with PNP4Nagios and graduate to more insightful analytics with Graphite. With metrics in hand, we look at the questions that IT /should/ be asking, such as:
* What sort of data should I trend?
* Why do I need to trend it?
* How do Operational or Engineering trends relate to Business or Transactional monitoring?
* How does this data impact our customer relationship and/or their bottom-line?
Finally, we look at creative ways to get profiling data out of your production systems with a minimum amount of effort from your development team.
1. dataviz on
bordeaux open data
+ elasticsearch
+ d3js
+ angular.js + google refine
mardi 19 mars 13
2. goal
• full stack dataviz
• front js centric with angular.js and sass
• d3.js -> best lib for dataviz (a bit complex)
• elasticsearch: a scalable search engine, accessed from a js ajax client
• grab open data and reformat it with google refine
3. philosophy
• smarter and smarter browsers will take over a huge part of the stack
• scalable NoSQL solutions talk REST: super easy access
• you can always enrich, refine, store, and model data from any source: yes you can!
4. go open data !
• go to http://opendata.bordeaux.fr/
• we focus on the capc contemporary art museum of bordeaux
• http://opendata.bordeaux.fr/content/collections-du-capc-musee-dart-contemporain
• curl https://bdxconfigogdi.blob.core.windows.net/converteddata/capc.csv -o capc.csv
5. google refine
• data often isn't perfect: human input, unsuitable models...
• with google refine you can tidy data, enrich it with web service calls, and so on...
• i think you should never assume you don't have the right data for the job
• be smart and be creative: you have everything you need, all the time!
6. import data into elasticsearch
• the bulk api import for es is handy
• http://www.elasticsearch.org/guide/reference/api/bulk.html
• we use templating in google refine to export the data in the correct format
• { "index" : { "_index" : "musees", "_type" : "capc" } } [snip]
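The bulk payload hinted at on the slide is NDJSON: an action line followed by the document itself, one pair per document. A minimal sketch of building such a payload in JavaScript, matching the `musees`/`capc` index and type from the slide (the sample documents are invented for illustration):

```javascript
// Sketch: build an Elasticsearch Bulk API body (NDJSON).
// Each document is preceded by an action line naming the index/type.
// The documents below are made-up examples, not real CAPC records.
const docs = [
  { titre: "Sans titre", artiste: "Richard Serra", annee: 1983 },
  { titre: "La Nuit", artiste: "Keith Haring", annee: 1985 },
];

const bulkBody = docs
  .map(
    (doc) =>
      JSON.stringify({ index: { _index: "musees", _type: "capc" } }) +
      "\n" +
      JSON.stringify(doc)
  )
  .join("\n") + "\n"; // the Bulk API requires a trailing newline

console.log(bulkBody);
```

This body would then be POSTed to the `_bulk` endpoint. Google Refine's export templating can emit exactly this action-line/document-line pairing.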
7. elasticsearch
• scalable search engine
• adding more power == adding more nodes
• sharding
• replication
• fault tolerant
8. elasticsearch
• stores unstructured documents (json) in indexes (the NoSQL way)
• talks REST (api)
• advanced query language
• multiple analyzers (tokens, languages, etc.)
• blazing fast!
• no alternative solutions (in my opinion) (and in kimchy's opinion too ;) )
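Because ES talks REST and JSON, querying it from the browser is just an ajax POST with a query-DSL body. A hedged sketch of such a body (the index name `musees` comes from the import slide; the field name `artiste` and the localhost endpoint are assumptions for illustration):

```javascript
// Sketch: a query-DSL body one might POST to /musees/_search.
// The field name "artiste" is assumed, not taken from the slides.
const searchBody = {
  query: {
    match: { artiste: "serra" }, // analyzed full-text match
  },
  size: 10,
};

// With a modern runtime it could be sent like this (not executed here):
// fetch("http://localhost:9200/musees/_search", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(searchBody),
// }).then((r) => r.json()).then((res) => console.log(res.hits));

console.log(JSON.stringify(searchBody));
```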
9. yeoman
• the perfect tool for the new browser-focused stack that's coming
• yeoman init angular
• yeoman init angular:route capc
• yeoman install d3
• yeoman install jquery
• yeoman server -> yeah !!!
10. angular.js
• mvc framework for the browser (js)
• by google
• relies more directly on the html document
• cleaner controller handling than backbone (in my opinion)
11. d3.js
• the best javascript lib for dataviz (in my opinion)
• steep learning curve
• based on svg browser markup
• data and dom oriented
• generic (go low level with svg easily)
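d3's data-oriented approach boils down to mapping data values onto visual attributes through scales. A plain-JavaScript sketch of the affine math behind a linear scale (in d3 v3 of that era this is `d3.scale.linear().domain(...).range(...)`; the visitor-count numbers here are invented):

```javascript
// Plain-JS sketch of what a d3 linear scale does: affine-map a data
// domain onto a pixel range. d3 itself is not required to run this.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Map values in 0..500 onto a 300px-tall SVG chart.
// The range is inverted because SVG's y axis grows downward.
const y = linearScale([0, 500], [300, 0]);
console.log(y(0));   // 300 (baseline)
console.log(y(500)); // 0 (top)
console.log(y(250)); // 150 (middle)
```

In real d3 code, such a scale would feed the `y` and `height` attributes of a data-joined SVG selection.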
12. dataviz
• makes information emerge from raw data
• you should explore, analyse, and be creative to extract as much value as possible from the data: go as far as you can
• tables (the excel way) -> forget them
• visualization reveals your data: data is meant to be known, so you can decide, manage, and understand
13. open data
• today, a wide range of data is entering the public domain
• again, without computing and visualization, data has little value
• open data without data scientists and data visualization has no future
14. future
• with low bandwidth, storage, and computing costs, you can grab open data from many sources
• you can put it in a big data store, analyze it, and find relations within it, using open source technologies
• you can then share the results with the world through data visualizations on your website, blogs...
• amazing, isn't it?!