Alter Way Big Data Seminar - Elasticsearch - October 2014

Published in: Data & Analytics

  1. 1. Agenda & Speakers
  2. 2. Introduction
  3. 3. Alter Way in 2 slides
  4. 4. Alter Way in 2 slides
  5. 5. Elasticsearch in 1 slide
     • More than 11 million downloads
     • 650,000 new downloads per month
     • 1000s of mission-critical implementations
     • Top investors: Benchmark Capital, Index Ventures
     • Seasoned executive team
       – Founded by the creator of Elasticsearch
       – Seasoned executives from SpringSource
  6. 6. The challenges of search in the Big Data era
  7. 7. Big Data in Today's Business and Technology Environment: some significant figures
     • 2.7 zettabytes of data exist in the digital universe today (1 zettabyte = 1 billion terabytes).
     • 235 terabytes of data had been collected by the U.S. Library of Congress as of April 2011.
     • Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data.
     • Akamai analyzes 75 million events per day to better target advertisements.
     • Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
     • The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T's extensive calling records.
     • Hadoop:
       – 94% of Hadoop users perform analytics on large volumes of data that was not possible before;
       – 88% analyze data in greater detail;
       – 82% can now retain more of their data.
  8. 8. The Rapid Growth of Unstructured Data
     • YouTube users upload 48 hours of new video every minute of the day.
     • 500+ new websites are created every minute of the day.
     • Brands and organizations on Facebook receive 34,722 Likes every minute of the day.
     • 100 terabytes of data are uploaded daily to Facebook.
     • According to Twitter's own research in early 2012, it sees roughly 175 million tweets every day, and has more than 465 million accounts.
     • 30 billion pieces of content are shared on Facebook every month.
     Data production will be 44 times greater in 2020 than it was in 2009.
  9. 9. Big Data & Real Business Issues
     • More than 25% of decision-makers surveyed predict that data volumes in their companies will rise by more than 60% by the end of 2014, with the average of all respondents anticipating growth of no less than 42%.
     • 40% projected growth in global data generated per year vs. 5% growth in global IT spending.
     • According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.
       – Poor data can cost businesses 20%–35% of their operating revenue.
       – Bad data or poor data quality costs US businesses $600 billion annually.
     • More than 75% of decision-makers surveyed anticipate significant impacts on storage systems as a result of the "Big Data" phenomenon.
     • We anticipate a new challenge: being able to search and analyze all that data … in real time!
  10. 10. Elasticsearch: a solution already in production with significant French implementations. Revolutionizing Data Search and Analytics. Richard Maurer, SEMEA Territory Manager
  11. 11. Purpose of Elasticsearch
      • Organize data and make it easily accessible
        – Through powerful search and analytics
        – Easily consumable (even for non-data scientists)
        – Elegantly handles extremely large data volumes
        – Delivers results in real time
      • Technology stack agnostic
      • Used across all market verticals
  12. 12. Features of Elasticsearch • Structured & unstructured search • Advanced analytics capabilities • Unmatched performance • Real-time results • Highly scalable • User friendly installation and maintenance
  13. 13. Elasticsearch 1.4: a production-ready solution
      • Real-time data indexing
      • Distributed
      • High availability
      • Schema free
      • Real-time data analytics
      • Multi-tenancy
      • Much more…
  14. 14. Unprecedented Uptake: Elasticsearch has more than 11 million downloads … and 650,000 more each month (chart: cumulative downloads)
  15. 15. French Users
  16. 16. French Use Cases
      • Bouygues Telecom: uses Elasticsearch in their Big Data platform. Cut their web resolution time by 10X.
      • Dailymotion: indexing their 20 million videos on Elasticsearch. In production for over 2 years.
      • Voyages SNCF: recently announced that ES is live in their "Usine Logicielle" (software factory).
      • Fotolia: search engine built on Elasticsearch to access 24 million images, after moving over to ES.
      • Orange: with over 1.2 billion docs, looking for a better solution and cost reduction.
  17. 17. Product Offerings: Support Throughout Your Project
      1. Core Elasticsearch Training (2 days)
      2. ELK Workshop (1 day)
      3. Development and Production Support
      4. Marvel: monitoring of your ES clusters
  18. 18. 2: Support
  19. 19. Resources
      • www.elasticsearch.com
      • www.elasticsearch.org
      • User Groups: http://www.elasticsearch.org/community/forum/
      • Contact: Richard Maurer, Territory Manager, Richard.maurer@elasticsearch.com
  20. 20. MAKE SENSE OF YOUR (BIG) DATA! David Pilato, Technical advocate, elasticsearch. @dadoonet
  21. 21. StartUp data ?
  22. 22. StartUp
  23. 23. StartUp
  24. 24. StartUp
  25. 25. StartUp
  26. 26. StartUp
  27. 27. StartUp BIG data ?
  28. 28. StartUp BIG data ?
  29. 29. 35.000.000.000.000.000 mb Source: http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode StartUp
  30. 30. StartUp Source: http://www.domo.com/learn/data-never-sleeps-2
  31. 31. StartUp search = like % ?
        SELECT
          doc.*, country.*
        FROM
          doc, country
        WHERE
          doc.country_code = country.code AND
          doc.date_doc > to_date('2011-12', 'yyyy-mm') AND
          doc.date_doc < to_date('2012-01', 'yyyy-mm') AND
          lower(country.name) = 'france' AND
          lower(doc.comment) LIKE '%product%' AND
          lower(doc.comment) LIKE '%david%';
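To illustrate why a search engine is a better fit than chained `LIKE '%…%'` scans, here is a minimal sketch of an inverted index, the kind of structure Lucene-based engines rely on. The sample rows and the function names are hypothetical and only stand in for the slide's `doc` table; real analyzers also handle stemming, scoring and much more.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for the comments of the slide's `doc` table.
DOCS = {
    1: "David reviewed the new product line",
    2: "Product launch postponed",
    3: "Meeting notes from David",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents containing every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


INDEX = build_index(DOCS)
print(sorted(search_all(INDEX, "product david")))
```

Each lookup touches only the postings of the query terms instead of scanning every row. Note that this sketch matches whole tokens, whereas `LIKE '%product%'` also matches substrings such as "products".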
  32. 32. StartUp Search engine ?
  33. 33. elasticsearch ? StartUp plug & play REST/JSON scalable Apache 2 license Lucene elasticsearch
  34. 34. Start…
        $ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.tar.gz
        $ tar -xf elasticsearch-1.1.1.tar.gz
        $ ./elasticsearch-1.1.1/bin/elasticsearch
        [INFO ][node ][Ghost Maker] {1.1.1}[5645]: initializing
  35. 35. … and play!
        $ curl -XPUT localhost:9200/sessions/session/1 -d '{
          "title" : "Elasticsearch",
          "subtitle" : "Make sense of your (BIG) data !",
          "date" : "2014-05-20T10:30:00",
          "tags" : [ "elasticsearch", "alterway", "bigdata" ],
          "speakers" : [{
            "first_name" : "David",
            "last_name" : "Pilato"
          }]
        }'
  36. 36. Search!
        $ curl http://localhost:9200/sessions/session/_search -d '{
          "query": {
            "multi_match": {
              "query": "elasticsearch alterway david",
              "fields": [ "title^3", "tags^2", "speakers.first_name" ]
            }
          },
          "post_filter": {
            "range": {
              "date": { "from": "2014-05-01", "to": "2014-06" }
            }
          }
        }'
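The same request body can be assembled programmatically before being handed to any HTTP client. The sketch below only builds and serializes the JSON shown on the slide; the helper name `build_search_body` is hypothetical, and nothing is sent to a cluster.

```python
import json


def build_search_body(text, date_from, date_to):
    """Build the slide's multi_match query with a date range post_filter."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^3" and "^2" boost matches found in title and tags.
                "fields": ["title^3", "tags^2", "speakers.first_name"],
            }
        },
        "post_filter": {
            "range": {"date": {"from": date_from, "to": date_to}}
        },
    }


body = build_search_body("elasticsearch alterway david", "2014-05-01", "2014-06")
payload = json.dumps(body, indent=2)
print(payload)
```

The resulting `payload` string is what the slide passes to `curl -d`.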
  37. 37. StartUp Compute?
  38. 38. Compute!
        $ curl http://localhost:9200/sessions/session/_search -d '{
          "query": { ... },
          "aggs": {
            "by_date": {
              "date_histogram": {
                "field": "date",
                "interval": "day",
                "format" : "dd/MM/yyyy"
              }
            }
          }
        }'

        "by_date": [
          { "key_as_string": "03/04/2014", "doc_count": 1 },
          { "key_as_string": "12/04/2014", "doc_count": 2 },
          { "key_as_string": "16/04/2014", "doc_count": 3 }
        ]
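Conceptually, a day-interval date histogram groups documents by calendar day and counts them. The following sketch reproduces that bucketing locally; the timestamps are hypothetical values chosen to match the slide's sample response, and only non-empty days are emitted, as in that response.

```python
from collections import Counter
from datetime import datetime

# Hypothetical session timestamps chosen to reproduce the slide's buckets.
DATES = [
    "2014-04-03T09:00:00",
    "2014-04-12T10:30:00",
    "2014-04-12T14:00:00",
    "2014-04-16T09:00:00",
    "2014-04-16T11:00:00",
    "2014-04-16T16:30:00",
]


def date_histogram_by_day(timestamps):
    """Count documents per calendar day and format keys as dd/MM/yyyy."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").date() for ts in timestamps
    )
    return [
        {"key_as_string": day.strftime("%d/%m/%Y"), "doc_count": count}
        for day, count in sorted(counts.items())
    ]


BUCKETS = date_histogram_by_day(DATES)
for bucket in BUCKETS:
    print(bucket)
```

In Elasticsearch this counting happens on the cluster, shard by shard, so only the buckets travel back to the client.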
  39. 39. Let’s make sense of …
      • logs
      • twitter
      • github
      • marketing data
      • ...
      • your data
      • your big data
      #mstechdays #elasticsearch StartUp
  40. 40. Let’s make sense of …
      • logs
      • twitter
      • github
      • marketing data
      • ...
      • your data
      • your big data

        {
          "name": "Pilato David",
          "dateOfBirth": "1971-12-26",
          "gender": "male",
          "children": 3,
          "marketing": {
            "fashion": 334,
            "music": 3363,
            "hifi": 2351
          },
          "address": {
            "country": "France",
            "city": "Paris",
            "location": [2.332395, 48.861871]
          }
        }
  41. 41. Demo: MAKE SENSE OF YOUR (BIG) DATA! Let’s inject some marketing documents…
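One common way to inject many documents at once is the `_bulk` endpoint, which expects newline-delimited JSON: an action line followed by the document source, with a trailing newline. The sketch below only builds such a body. The index name `marketing`, the type `person`, the second sample person and the helper name are assumptions for illustration, not details taken from the demo.

```python
import json

# First document is the one shown on the slide; the second is hypothetical.
PEOPLE = [
    {
        "name": "Pilato David",
        "dateOfBirth": "1971-12-26",
        "gender": "male",
        "children": 3,
        "marketing": {"fashion": 334, "music": 3363, "hifi": 2351},
        "address": {
            "country": "France",
            "city": "Paris",
            "location": [2.332395, 48.861871],
        },
    },
    {"name": "Doe Jane", "gender": "female", "children": 1},
]


def build_bulk_body(docs, index, doc_type):
    """Build a newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


BULK_BODY = build_bulk_body(PEOPLE, "marketing", "person")
print(BULK_BODY)
```

The body could then be posted to `localhost:9200/_bulk` with curl or any HTTP client.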
  42. 42. elasticsearch. elasticsearch kibana logstash Marvel
  43. 43. thanks @dadoonet
  44. 44. How to integrate Elasticsearch into your information system and get the most out of it
  45. 45. Elasticsearch to do what?
  46. 46. STORE
  47. 47. SEARCH
  48. 48. ANALYZE
  49. 49. Are you ready to use ElasticSearch in your IT?
  50. 50. What you need to run it • Java 8 update 20 or later, or Java 7 update 55 or later • Only Oracle’s Java and the OpenJDK are supported.
  51. 51. GitHub projects • Many projects • High activity • Many languages (6 months!)
  52. 52. Clients
  53. 53. Scripting Plugins Language
  54. 54. Why it's easy
  55. 55. • One to many • ~ Zero conf • Cloud oriented • Scalability DNA • Replication • Sharding • Distributed • Resilience • Snapshot • Restore Start Small Grow Big
  56. 56. • One to many • ~ Zero conf • Cloud oriented • Scalability DNA • Replication • Sharding • Distributed • Resilience • Snapshot • Restore Start Small Grow Big
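Sharding can be pictured as a deterministic mapping from a document's routing value (by default its id) to one of the primary shards, in the spirit of `shard = hash(routing) % number_of_primary_shards`. The sketch below uses `zlib.crc32` purely as a stand-in hash; it is not the hash function Elasticsearch uses, and the function name is hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Pick a primary shard for a document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# Hypothetical setup: 5 primary shards and a handful of document ids.
NUMBER_OF_SHARDS = 5
ASSIGNMENTS = {doc_id: shard_for(doc_id, NUMBER_OF_SHARDS) for doc_id in ["1", "2", "3", "42"]}
for doc_id, shard in ASSIGNMENTS.items():
    print(f"document {doc_id} -> shard {shard}")
```

Because the mapping depends on the number of primary shards, changing that number would send existing ids to different shards, which is why it is normally chosen when the index is created, while replicas can be added later.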
  57. 57. Where / How can you use ElasticSearch?
  58. 58. Centralized Log Storage 1/2 VIA
  59. 59. Centralized Log Storage 2/2
  60. 60. … CMS Search Engine
  61. 61. E-commerce Enhanced Search Engine
      • Faceting
      • Fuzzy Search
      • Speed
      • Auto Completion
      • Geo Search
      • Log Analysis
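Fuzzy search tolerates typos by accepting terms within a small edit distance of the query term. As a rough illustration, here is a plain Levenshtein distance and a filter over a hypothetical product vocabulary; it is a simplification and does not reproduce how Elasticsearch or Lucene implement fuzzy matching internally.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_edits=1):
    """Return vocabulary words within max_edits of the term."""
    return [word for word in vocabulary if levenshtein(term, word) <= max_edits]


# Hypothetical catalogue terms and a misspelled query.
VOCABULARY = ["iphone", "ipad", "phone"]
print(fuzzy_matches("iphne", VOCABULARY))
```

A shopper typing "iphne" would still reach "iphone" with one allowed edit, while raising `max_edits` widens the match set at the cost of precision.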
  62. 62. Combining Hadoop & Elasticsearch
      • REST based
      • Memory and I/O efficient
      • Adaptive I/O
      • Map/Reduce API support
      • Pig support
      • Hive support
      → elasticsearch-hadoop
  63. 63. What Else ?
  64. 64. It’s up to you to decide what to build with ES
  65. 65. Analysis / Dashboards: Some Examples
  66. 66. Kibana examples : IRC Activity
  67. 67. Kibana examples : pfSense Monitoring
  68. 68. Kibana examples : Windows Events
  69. 69. Kibana examples : Inventory
  70. 70. Kibana examples : Syslog
  71. 71. Kibana examples : Web Activity
  72. 72. ES = No Limits
  73. 73. Conclusion
  74. 74. Conclusion
      • It is time to revolutionize the way you get value from your data: give Elasticsearch to your applications!
      • The ELK stack (Elasticsearch, Logstash, Kibana) is already widely used in production!
      • Get expert guidance to benefit from best practices and support at every stage of your project: design, development, production.
  75. 75. Questions / Answers
