The document discusses Hortonworks and its strategy for supporting Apache Hadoop. Hortonworks aims to make Hadoop easy to use and deployable at enterprise scale, offering the Hortonworks Data Platform, training, support subscriptions, and consulting services to help organizations adopt Hadoop. Its goal is to establish Hadoop as the next-generation data platform so that more of the world's data is processed with Apache Hadoop.
Razorfish Multi-Channel Marketing: Better Customer Segmentation and Targeting (Teradata Aster)
Matt Comstock, Vice President Business Intelligence Office, Razorfish, presents at the Big Analytics 2012 Roadshow.
From search to email to social, customers interact with your brand across a variety of channels. But what do people do once they view an advertisement or receive an email? What common behaviors do they display once they're on your site? By combining media exposure and behavior, site-side media, and in-store purchase data, you can better understand the impact media has on driving value to your business. Come to this session to learn how data-driven multi-channel analysis lets you see what consumers do before they become customers and understand which content influences which segments of users by media audience. Discover new segmentation and targeting strategies to improve engagement with your brand and increase advertising lift. See how a leader in digital marketing uses a combination of technologies, including Teradata Aster, Hadoop, and Amazon Web Services, to handle big data and deliver big analytics that improve business value.
Introduction to SQL Server Master Data Services (Eduardo Castro)
In this presentation we give an introduction to SQL Server 2008 R2 Master Data Services.
Regards,
Ing. Eduardo Castro Martínez, PhD – Microsoft SQL Server MVP
http://mswindowscr.org
http://comunidadwindows.org
Costa Rica
This session gives a brief introduction to Fusion Applications and then discusses some highlights of the Fusion MDM for Customer application.
To Each Their Own: How to Solve Analytic Complexity (Inside Analysis)
The Briefing Room with Shawn Rogers and Noetix
Slides from the Live Webcast on Aug. 14, 2012
One size will never fit all in the complex world of information management. In fact, the variety of information systems in use continues to expand. That includes all kinds of systems: data-producing applications, data-processing apps, and the downstream tools used for reporting and analytics. How can data-savvy organizations stay ahead of the curve?
Check out this episode of The Briefing Room to learn from Analyst Shawn Rogers of Enterprise Management Associates, who will explain how effective use of standard data models can solve the complexity of increasingly heterogeneous information architectures. Rogers will be briefed by Daryl Orts of Noetix who will tout his company’s wide range of industry and application-specific data models which can be used to satisfy the particular needs of today’s diverse user community.
For more information, visit: http://www.insideanalysis.com
The Business Data Catalog (BDC) provides a method of integrating business data from back-end server applications, such as SAP, Siebel, or other line-of-business applications, into Microsoft Office SharePoint Server 2007 without writing any code. Business intelligence with Office SharePoint Server 2007 provides a framework for accessing that data and a toolset that lets business decision makers turn raw data into critical business information.
This presentation is designed to provide the audience with an overview of how BI can be used to create visual dashboards that assemble and display business information from multiple sources (e.g. Excel Services, SQL Reporting) using built-in web parts.
This is a business session and does not cover the technical implementation of BDC.
Scaling MySQL: Benefits of Automatic Data Distribution (ScaleBase)
In this webinar we cover how ScaleBase provides transparent data distribution to its clients: overcoming the usual caveats, hiding the complexity involved in distributing data, and keeping the distribution invisible to the application.
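ScaleBase's internals aren't described here; as a general illustration of how a routing layer can make hash-based data distribution transparent to the application, here is a minimal Python sketch (the shard names and key scheme are hypothetical):

```python
import hashlib

# Hypothetical shard endpoints; a real router would hold live DB connections.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(key: str) -> str:
    """Map a shard key to a shard deterministically via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The router hides the distribution: the application issues the same call
# for any key and never needs to know which shard holds the row.
target = shard_for("user:42")
```

Because the hash is stable, every query for the same key lands on the same shard, which is what lets the layer sit transparently between the application and the distributed data.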
Envision IT Seminar Presentation - Microsoft Office 365 (Envision IT)
On May 5, 2011, Envision IT, Leaders in SharePoint Solutions, presented an introductory seminar on BPOS and Microsoft Office 365 at Microsoft Canada headquarters. Visit our website at www.envisionit.com for more details.
Operational BI addresses the decision-making challenges organizations face today by enabling the exchange of information between decision makers and alerting the right people at the right time, transcending organizational boundaries.
Database Architechs has been a database-focused consulting company for 17 years, bringing you the most skilled and experienced data and database experts with a wide variety of service offerings covering all database- and data-related aspects.
Talend Open Studio and Hortonworks Data Platform (Hortonworks)
Data integration is a key step in a Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running: OK, I have a cluster…now what? Complex scripts? For wide-scale adoption of Apache Hadoop, an intuitive set of tools that abstracts away the complexity of integration is necessary.
The Next Generation of Big Data Analytics (Hortonworks)
Apache Hadoop has evolved rapidly to become a leading platform for managing and processing big data. If your organization is examining how you can use Hadoop to store, transform, and refine large volumes of multi-structured data, please join us for this session, where we will discuss: the emergence of "big data" and opportunities for deriving business value; the evolution of Apache Hadoop and future directions; essential components required in a Hadoop-powered platform; and solution architectures that integrate Hadoop with existing data discovery and data warehouse platforms.
Introduction to Hortonworks Data Platform for Windows (Hortonworks)
According to IDC, Windows Server runs on more than 50% of the servers in the enterprise data center. Hortonworks has worked closely with Microsoft to port Apache Hadoop to Windows, enabling organizations to take advantage of this emerging Big Data technology. Join us in this informative webinar to hear about the new Hortonworks Data Platform for Windows.
In less than an hour, you’ll learn:
-Key capabilities available in Hortonworks Data Platform for Windows
-How HDP for Windows integrates with Microsoft tools
-Key workloads and use cases driving Hadoop today
Big Data, Hadoop, Hortonworks and Microsoft HDInsight (Hortonworks)
Big Data is everywhere. And at the center of the big data discussion is Apache Hadoop, a next-generation enterprise data platform that allows you to capture, process and share the enormous amounts of new, multi-structured data that doesn’t fit into traditional systems.
With Microsoft HDInsight, powered by Hortonworks Data Platform, you can bridge this new world of unstructured content with the structured data we manage today. Together, we bring Hadoop to the masses as an addition to your current enterprise data architectures so that you can amass net new insight without net new headache.
Break Through the Traditional Advertisement Services with Big Data and Apache... (Hortonworks)
Entravision Communications Corporation (NYSE: EVC) is a diversified Spanish-language media company with a unique group of media assets including television stations, radio stations and digital platforms. In 2011, it made the strategic decision to build a data analytics, modeling and insights division to expand the value of its traditional advertisement services. Join us in this session with Franklin Rios, President of Luminar (an Entravision company), and Oscar Padilla, VP of Strategy, Luminar, along with Impetus and Hortonworks, as we discuss key implementations, results and lessons learned from their big data services operations.
The Briefing Room with Mark Madsen and Hortonworks
Slides from the Live Webcast on Oct. 16, 2012
The power of Hadoop cannot be denied, as evidenced by the fact that all the biggest closed-source vendors in the world of data management have embraced this open-source project with virtually open arms. But Hadoop is not a data warehouse, nor is it ever likely to be. Rather, its ideal role for now is to augment traditional data warehousing and business intelligence. As an adjunct, Hadoop provides an amazing mechanism for storing and analyzing Big Data. The key is to manage expectations and move forward carefully.
Check out this episode of The Briefing Room to hear veteran Analyst Mark Madsen of Third Nature, who will explain how, where, when and why to leverage the open-source elephant in the enterprise. He'll be briefed by Jim Walker of Hortonworks who will tout his company's vision for the future of Big Data management. He'll provide details on their data platform and how it can be used to complete the picture of information management. He'll also discuss how the Hortonworks partner network can help companies get big value from Big Data.
Visit: http://www.insideanalysis.com
“Apache Hadoop, Now and Beyond”, Jim Walker, Director of Product Marketing, Hortonworks
Hadoop is an open source project that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. It is shifting the way many traditional organizations think about analytics and business models. While it is designed to take advantage of cheap commodity hardware, it is also a natural fit for the cloud, as it is built to scale up or down without system interruption. In this presentation, Jim Walker will provide an overview of Apache Hadoop and its current state of adoption in and out of the cloud.
The Comprehensive Approach: A Unified Information Architecture (Inside Analysis)
The Briefing Room with Richard Hackathorn and Teradata
Slides from the Live Webcast on May 29, 2012
The worlds of Business Intelligence (BI) and Big Data Analytics can seem at odds, but only because we have yet to fully experience a comprehensive approach to managing big data: a Unified Big Data Architecture. The dynamics continue to change as vendors begin to emphasize the importance of leveraging SQL, engineering and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing.
Register for this episode of The Briefing Room to learn the value of taking a strategic approach for managing big data from veteran BI and data warehouse consultant Richard Hackathorn. He'll be briefed by Chris Twogood of Teradata, who will outline his company's recent advances in bridging the gap between Hadoop and SQL to unlock deeper insights and explain the role of Teradata Aster and SQL-MapReduce as a Discovery Platform for Hadoop environments.
For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E
Trending use cases have pointed out the complementary nature of Hadoop and existing data management systems—emphasizing the importance of leveraging SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing. Many vendors have provided interfaces between SQL systems and Hadoop but have not been able to semantically integrate these technologies while Hive, Pig and SQL processing islands proliferate. This session will discuss how Teradata is working with Hortonworks to optimize the use of Hadoop within the Teradata Analytical Ecosystem to ingest, store, and refine new data types, as well as exciting new developments to bridge the gap between Hadoop and SQL to unlock deeper insights from data in Hadoop. The use of Teradata Aster as a tightly integrated SQL-MapReduce® Discovery Platform for Hadoop environments will also be discussed.
Karya develops mobile application services that fit the unique needs of your business. Our mobile application services help users better utilize the power of mobile technology.
Introducing the Big Data Ecosystem with Caserta Concepts & Talend (Caserta)
In this one-hour webinar, Caserta Concepts and Talend described an approach to building an architectural framework and roadmap for extending a traditional enterprise data warehouse environment into a Big Data ecosystem.
They illustrated the architectural components involved in collecting, analyzing and delivering Big Data, with a focus on the importance of Hadoop, Data Integration, Machine Learning, NoSQL, Business Intelligence and Analytics.
Attendees learned:
Which Big Data technologies can’t be ignored
Considerations when extending the data ecosystem
What happens to your existing investment
What are the points of integration
Does Big Data = better data?
To access the recorded webinar or to learn more, visit http://www.casertaconcepts.com/.
FinOps Data (FR), by Matthieu Rousseau & Ismael Goulani
Matthieu Rousseau, CEO & Data Engineer, Modeo.
Ismael Goulani, CTO & Data Engineer, Modeo.
A look back at the first prize in the "Innovative Solution" category of the #LaNuitdelaData challenge, won with their solution Stach, a platform that helps data teams better understand how their data is used by consumers, what it costs, and its carbon footprint.
Dremio: a simple, high-performance architecture for your data lakehouse.
In the world of data, Dremio defies classification! It is at once a data-delivery platform, a powerful SQL engine built on Apache Arrow, Apache Calcite and Apache Parquet, an active data catalog, and an open data lakehouse! After getting acquainted with the platform, we will look at how Dremio helps organizations meet their data management and governance challenges, making it easier to run their analytics in the cloud (and/or on-premises) without the cost, complexity and lock-in of data warehouses.
Tomer Shiran is the founder and Chief Product Officer (CPO) of Dremio. Tomer was the fourth employee and VP of Product at MapR, a pioneer of big data analytics. He has also held numerous product management and engineering positions at IBM Research and Microsoft, and founded several websites that served millions of users. He holds a Master's in computer engineering from Carnegie Mellon University and a Bachelor of Science in computer science from the Technion - Israel Institute of Technology.
The Modern Data Stack meetup is delighted to welcome Tomer Shiran. From Apache Drill and Apache Arrow to, now, Apache Iceberg, he and his teams anchor Dremio's choices in a vision of an "open" data platform built on open-source technologies. Beyond these values, which spare customers lock-in to proprietary formats, he is also mindful of the costs such platforms incur. He also champions features that transform data management, through initiatives such as Nessie, which opens the road to Data as Code and multi-process transactions.
The Modern Data Stack Meetup gives Tomer Shiran carte blanche to share his experience and his vision of the Open Data Lakehouse.
Hadoop meetup: HUG FR - Building the fastest cluster for data analysis... (Modern Data Stack France)
Building the fastest cluster for data analysis: benchmarks on a regressor, by Christopher Bourez (Axa Global Direct)
The latest parallel-computing technologies make it possible to compute prediction models over big data in record time. The cloud eases access to modern hardware configurations, with the option of ephemeral scalability during the computation. Benchmarks were run on several hardware configurations, ranging from a single instance to a cluster of 100 instances.
Christopher Bourez, developer & manager, expert in modern information systems at Axa Global Direct. Alien thinker. Blog: http://christopher5106.github.io/
HUG France Feb 2016 - Migrating structured data between Hadoop and RDBMS... (Modern Data Stack France)
Migrating structured data between Hadoop and RDBMS, by Louis Rabiet (Squid Solution)
By extracting data stored in a relational database with an advanced BI tool and sending it via Kafka to Tachyon, several Spark sessions can work on the same dataset while limiting duplication. This yields cost-controlled communication between the source database and Spark, which makes it possible to dynamically reintroduce modified data with MLlib while working on up-to-date data. Preliminary results will be shared during this presentation.
A product recommendation system for an e-commerce site, by Koby KARP, Data Scientist (Equancy) & Hervé MIGNOT, Partner at Equancy
Recommendation remains a key tool for personalizing e-commerce sites, and the subject is far from exhausted. Taking the particularities of a market into account may require adapting the processing and the algorithms used. After a review of recommendation techniques, we will present the specific approach we adopted. The system was developed on Spark, both for data preparation and for computing the recommendation models. A simple API and its service were developed to deliver the recommendations to client applications.
The Model as Code approach, by Benoit Grossin (EDF-R&D) and Matthieu Vautrot (Quantmetry)
Putting models into production is a pivotal stage in the life cycle of a data science project carried out within a company.
We observe that this part is still rarely industrialized, even though it is essential for the continuous exploitation of model results.
When a finalized model shows satisfactory predictive power in the development phase, industrializing its deployment makes it possible to run it continuously and automatically while minimizing the workload.
Our talk will present our experience, in the EDF context, of setting up an approach capable of shortening, or even eliminating, the time to production in a Hadoop environment, and more specifically with Hive.
Benoit Grossin is a Research Engineer at EDF-R&D ICAM
Matthieu Vautrot is an Analytics & Big Data Consultant at Quantmetry
Industrializing Big Data processes at CANAL+, by Pascal PERISSEAU and Stephen CLAIRVILLE (CanalPlus)
Integrating the Big Data technical building block into an existing business-intelligence architecture. Feedback on the developments carried out to ease the integration, supervision and operation of Hadoop flows in our BI ecosystem, and a presentation of the preparatory phase of making the data available to data analysts and data scientists.
Pascal PERISSEAU, technical lead of the BI and Big Data division, at CANAL+ for 10 years
Stephen CLAIRVILLE, project manager and Big Data tech lead, at CANAL+ for 2 years
Presentation given at the Hadoop User Group France meetup of January 14, 2016.
Real-time analytics with Riak and Spark, by Michael Carney (Basho) and Olivier Girardot of Lateral Thoughts
According to a Salesforce report, the number of data sources analyzed by companies will grow by 83% over the next five years, and organizations now want to deliver real-time insights, even on mobile devices. Real-time processing is therefore the future of big data analytics.
This talk will present what's new in real-time analytics around the Riak database family and Spark.
Michael Carney is Basho's Sales Director for Southern Europe. Founder of MySQL France and of MariaDB, Michael joined Basho in January 2015 to explore the world of data without tables!
Olivier Girardot is the CTO of Lateral Thoughts; he is a developer and trainer on Spark, and a Java/Python specialist in the capital-markets finance domain.
HUG France: HBase in Financial Industry, by Pierre Bittner (Scaled Risk CTO) (Modern Data Stack France)
HBase in Financial Industry, by Pierre Bittner (Scaled Risk CTO)
Processing and analyzing large volumes of data are at the heart of banks' activities. Many financial-market players have already adopted Hadoop for numerous use cases: risk management, identification of business opportunities, fraud detection, market surveillance…
An incredible diversity of formats must be handled. From this point of view, HBase is a natural choice of distributed database thanks to its dynamic data model.
After a general presentation of HBase's characteristics, this talk shows how to model the information being processed to fit different usage contexts.
Pierre Bittner is the CTO of Scaled Risk, publisher of a Big Data platform dedicated to financial institutions. Scaled Risk is built on HBase. Pierre has worked on banking information systems for 10 years.
Getting started quickly with Apache Flink, by Bilal Baltagi
- Overview of the Apache Flink ecosystem
- Quick hands-on introduction
Bilal Baltagi holds a master's degree in data analysis from Université Paris Nord - Paris 13. He is currently a business-intelligence consultant at Sarenza in Paris, working on every phase of BI and Big Data projects: requirements gathering, design, implementation and user support. Bilal is increasingly interested in the intersection of Big Data and Business Intelligence, and enjoys playing with Apache Flink!
Datalab 101 (Hadoop, Spark, ElasticSearch) by Jonathan Winandy - Paris Spark... (Modern Data Stack France)
Datalab 101 (Hadoop, Spark, ElasticSearch), by Jonathan Winandy
Feedback on setting up a Datalab with Hadoop, Spark and ElasticSearch in a constrained environment. We will present the methods that allowed us to improve the design, development, performance and acceptance testing of a complex Spark application.
Jonathan Winandy is an implementation lead and Java/Scala developer specialized in data pipelines.
Record linkage, a real use case with Spark ML - Paris Spark meetup Dec 2015 (Modern Data Stack France)
Record Linkage, a use case in Spark ML, by Alexis Seigneurin
Record linkage is the process of finding, within a data set, the records that represent the same entity. This operation is particularly complicated when, as in our case, you are working with anonymized data. That is where machine learning comes to the rescue! We implemented a record-linkage algorithm in Spark SQL (DataFrames) and Spark ML rather than relying on static rules. We will cover the feature-engineering process, why we had to extend Spark DataFrames to preserve metadata through the processing pipeline, and how we used machine learning to reconcile the records. Finally, we will see how we industrialized this application.
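The pairwise-similarity features at the heart of this kind of approach can be sketched outside Spark; here is a minimal pure-Python illustration (the field names and the fixed threshold are hypothetical stand-ins — the talk's system learns the decision with Spark ML rather than using a static rule):

```python
from difflib import SequenceMatcher

def features(a: dict, b: dict) -> list:
    """String-similarity features for a candidate record pair."""
    return [SequenceMatcher(None, a[f], b[f]).ratio()
            for f in ("name", "city")]

def same_entity(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Stand-in for a learned classifier: mean similarity vs. a threshold."""
    feats = features(a, b)
    return sum(feats) / len(feats) >= threshold

r1 = {"name": "Jon Smith",  "city": "Paris"}
r2 = {"name": "John Smith", "city": "Paris"}
r3 = {"name": "Ana Gomez",  "city": "Madrid"}
print(same_entity(r1, r2))  # True: near-identical fields
print(same_entity(r1, r3))  # False: dissimilar fields
```

In the Spark version described above, each feature would be a DataFrame column and the threshold rule would be replaced by a trained Spark ML classifier.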
Alexis Seigneurin: a developer for 15 years, I care deeply about data processing, analysis and storage. At Ippon, I mainly work on consulting and architecture engagements around big data technologies. I also run the Spark training course at Ippon.
Spark meetup www.meetup.com/Paris-Spark-Meetup/events/222607538/
The latest version of Spark brings a new API inspired by statistical-analysis libraries and languages. We will see how Spark DataFrames let us easily manipulate and explore data while retaining the scalability of Spark RDDs.
Full-text search and recommendation: two separate worlds? We will see that it is possible to marry Lucene (Elasticsearch/Solr) with collaborative filtering to produce a flexible and scalable recommendation system. Along the way we will take a look at the latest releases: the Confluent platform (Kafka) and Mahout 0.10 (with Samsara).
Matthieu Blanc will present spark.ml. Spark 1.2 introduced this new package, which provides a high-level API for building machine-learning pipelines. We will walk through the basic concepts of this API with an example.
http://hugfrance.fr/spark-meetup-a-la-sg-avec-cloudera-xebia-et-influans-le-jeudi-11-juin/
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
This keynote covers the key trends across hardware, cloud and open source, explores how these areas are likely to mature and develop over the short and long term, and considers how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
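Under the hood, the JMeter-to-InfluxDB integration described above works by writing metrics in InfluxDB's line protocol, the text format Grafana then queries and charts. The sketch below builds such a record in plain Python; the measurement, tag, and field names are illustrative, not the exact schema JMeter's Backend Listener emits.

```python
# Hedged sketch: formatting a JMeter-style sample into InfluxDB's line
# protocol (measurement,tag=v,... field=v,... timestamp). Names are
# illustrative, not the exact Backend Listener schema.

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one InfluxDB line-protocol record."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "jmeter",                                         # measurement
    {"application": "demo", "transaction": "login"},  # tags (indexed)
    {"count": 42, "avg": 123.5},                      # fields (values)
    1717000000000000000,                              # nanosecond timestamp
)
print(line)
```

Each record is one line; the listener batches them into HTTP writes against the InfluxDB endpoint, and Grafana dashboards are just queries over the resulting time series.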
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure OpenAI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. Fostering a culture of innovation, however, takes work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
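To make "power flow" concrete: a power flow study takes the power injected or consumed at each bus and solves for the voltage angles and line flows across the network. The toy below is not PowSyBl (whose real Python entry point is the pypowsybl binding); it is a dependency-free DC power flow on an invented 3-bus network, just to illustrate what such a simulation computes.

```python
# Toy DC power flow on a 3-bus network. Bus 0 is the slack bus (angle 0);
# we solve the reduced system B * theta = P for the other two buses.
# Network topology and per-unit values are invented for illustration.

def dc_power_flow(b, injections):
    """Solve for bus voltage angles given line susceptances and injections.

    b: dict mapping line (i, j) -> susceptance (1/reactance, per unit)
    injections: net injected power at buses 1 and 2 (per unit)
    """
    # Reduced susceptance matrix for the two non-slack buses.
    b11 = b[(0, 1)] + b[(1, 2)]
    b22 = b[(0, 2)] + b[(1, 2)]
    b12 = -b[(1, 2)]
    det = b11 * b22 - b12 * b12
    p1, p2 = injections
    theta1 = (b22 * p1 - b12 * p2) / det   # Cramer's rule on the 2x2 system
    theta2 = (b11 * p2 - b12 * p1) / det
    return [0.0, theta1, theta2]

susceptance = {(0, 1): 10.0, (0, 2): 10.0, (1, 2): 10.0}
theta = dc_power_flow(susceptance, [-1.0, 0.5])  # 1 p.u. load, 0.5 p.u. gen
flow_0_1 = susceptance[(0, 1)] * (theta[0] - theta[1])  # flow on line 0-1
print(theta, flow_0_1)
```

PowSyBl's load-flow, security, and sensitivity analyses solve far richer AC formulations of this same problem, which is why the webinar can demonstrate them interactively from a Python notebook.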
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an early stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Life used to be simple and very transactional in nature:
- Early 90's, ERP: transactions count your sales by customer, by location
- Late 90's: the age of segmentation and targeted offers; merge customer operations with marketing
Now, life is more complex, connected, and interactional in nature!
- Digital marketing enables measurement of interactions across channels
- Social networks, mobile commerce, and user-generated content increase the TYPES and VOLUMES of data generated by system-to-system communication and by data exhaust from customer behavior, like click-streams
- And big data is just beginning: we haven't even listed all the sensors, telematics, and other machine-generated data, which is predicted to eclipse even that generated by social networks
Facts that bolster this vision include:
- 80% to 90% of the world's data is unstructured or semi-structured (Forrester, IDC, and Gartner all agree)
- Data volumes have increased exponentially over the past decade and continue to do so (IDC, McKinsey reports)
- Hadoop is uniquely designed to store and process this type of data, at scale, across commodity systems
- The major server and storage platform vendors are all creating Hadoop-focused strategies
Apache Hadoop Leadership
- Sanjay Radia: HDFS Core Lead Architect, 4+ years on Hadoop. Major projects include Append v2, Capacity Scheduler, Federation, and HA.
- Owen O'Malley: The leading committer of code to Hadoop, 5+ years on Hadoop. Original Hadoop architect at Yahoo!. Drove the implementation of security throughout the project.
- Arun Murthy: Original MapReduce lead, 5+ years on Hadoop. Currently lead architect and release manager of Apache Hadoop 0.23.
- Matt Foley: Release manager of Apache Hadoop 0.20.205. Former Director of Engineering for Yahoo! Mail, now running Hortonworks' quality and release efforts.
- Devaraj Das: Built the original MapReduce development team at Yahoo!, 5+ years on Hadoop. Now leading the Apache Ambari (Hadoop management) project.
- Alan Gates: Lead of Pig and HCatalog, 3+ years on Hadoop.
- Infrastructure Platform (Servers, Storage, Network, Operating System, Virtualization, Cloud)
- Systems Management (Installation, Configuration, Administration, Monitoring, Performance, Security Mgmt, Capacity Mgmt, Quality of Service)
- Data Management Systems (SQL, NoSQL, NewSQL, EDW, Datamarts, MPP DBs, Search, Indexing, MDM, etc.)
- Data Movement & Integration (ETL, Data Quality, Integration Middleware, Event Processing)
- Tools & Languages (IDEs, Programming Languages, other tools)
- Business Intelligence & Analytics (Analytics, Reporting, Visualization, and Dashboards)
- Applications & Solutions (SaaS offerings, bundled solutions, etc.)
In the graphic above, Apache Hadoop acts as the Big Data Refinery. It's great at storing, aggregating, and transforming multi-structured data into more useful and valuable formats.

Apache Hive is a Hadoop-related component that fits within the Business Intelligence & Analytics category, since it is commonly used for querying and analyzing data within Hadoop in a SQL-like manner. Apache Hadoop can also be integrated with other EDW, MPP, and NewSQL components such as Teradata, Aster Data, HP Vertica, IBM Netezza, EMC Greenplum, SAP HANA, Microsoft SQL Server PDW, and many others.

Apache HBase is a Hadoop-related NoSQL key/value store that is commonly used for building highly responsive next-generation applications. Apache Hadoop can also be integrated with other SQL, NoSQL, and NewSQL technologies such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2, MongoDB, DynamoDB, MarkLogic, Riak, Redis, Neo4j, Terracotta, GemFire, SQLFire, VoltDB, and many others.

Finally, data movement and integration technologies help ensure data flows seamlessly between the systems in the diagrams above; the lines in the graphic are powered by technologies such as WebHDFS, Apache HCatalog, Apache Sqoop, Talend Open Studio for Big Data, Informatica, Pentaho, SnapLogic, Splunk, Attunity, and many others.
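WebHDFS, one of the data-movement technologies named here, exposes HDFS over plain HTTP: every operation is a REST call against `/webhdfs/v1/<path>?op=...`. The sketch below only builds such URLs (it makes no network calls); the host, port, and file path are placeholders, not real endpoints.

```python
# Sketch of the WebHDFS REST URL scheme: /webhdfs/v1/<path>?op=<OPERATION>.
# Host, port, and path below are illustrative placeholders.

from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL (e.g. op=OPEN to read, op=LISTSTATUS to list)."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = webhdfs_url("namenode.example.com", 50070, "/data/raw/clicks.log", "OPEN")
print(url)
# http://namenode.example.com:50070/webhdfs/v1/data/raw/clicks.log?op=OPEN
```

Because the interface is just HTTP, any tool or language with an HTTP client can move data in and out of Hadoop without a native Hadoop library, which is what makes it useful as integration glue between the systems in the diagram.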
At the highest level, I describe three broad areas of data processing and outline how these areas interconnect. The three areas are:
1. Business Transactions & Interactions
2. Business Intelligence & Analytics
3. Big Data Refinery

The graphic illustrates a vision for how these three types of systems can interconnect in ways aimed at deriving maximum value from all forms of data.

Enterprise IT has been connecting systems via classic ETL processing, as illustrated in Step 1 above, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask, and IT collects and structures the data needed to answer those questions.

The "Big Data Refinery", as highlighted in Step 2, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. The Big Data Refinery provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with this data. A popular example of big data refining is processing Web logs, clickstreams, social interactions, social feeds, and other user-generated data sources into more accurate assessments of customer churn or more effective creation of personalized offers.

More interestingly, there are businesses deriving value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to help them better understand how customers navigate the aisles as they find and purchase products. Retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue for the business.
In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value. The Big Data Refinery platform provides fertile ground for new types of tools and data processing workloads to emerge in support of rich multi-level data refinement solutions.

With that as backdrop, Step 3 takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions & Interactions and Business Intelligence & Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360° view of customers, for example.

By directly integrating the Big Data Refinery with existing Business Intelligence & Analytics solutions that contain much of the transactional information for the business, companies can enhance their ability to more accurately understand the customer behaviors that lead to the transactions.

Moreover, systems focused on Business Transactions & Interactions can also benefit from connecting with the Big Data Refinery. Complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications, with the goal of more accurately targeting customers with the best and most relevant offers, for example.

Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in Steps 4 and 5. Retaining the past 10 years of historical "Black Friday" retail data, for example, can benefit the business, especially if it's blended with other data sources such as 10 years of weather data accessed from a third-party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost-effectively and at scale.
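The "refining" step described here can be made concrete with a toy example: reducing raw clickstream lines (big, multi-structured input) into a small per-user summary that a downstream BI system could consume. The log format and field names below are invented for illustration.

```python
# Toy "Big Data Refinery" step: aggregate raw clickstream events into a
# compact per-user summary. Log format and fields are invented.

from collections import defaultdict

RAW_LOG = """\
u1 2012-11-23 view /product/42
u1 2012-11-23 add-to-cart /product/42
u2 2012-11-23 view /product/7
u1 2012-11-24 purchase /product/42
"""

def refine(log_text):
    """Count events per user, per event type, from raw log lines."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in log_text.strip().splitlines():
        user, _date, event, _path = line.split()
        summary[user][event] += 1
    return {user: dict(events) for user, events in summary.items()}

print(refine(RAW_LOG))
# {'u1': {'view': 1, 'add-to-cart': 1, 'purchase': 1}, 'u2': {'view': 1}}
```

At scale this is exactly the shape of work Hadoop MapReduce or Hive performs: the raw input is large, but the refined output is small and ready to feed churn models or personalized offers downstream.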
- "Node" means a Server or Virtual Machine capable of running the Software.
- "Server" means a single hardware system capable of running the Software. A hardware partition or blade is considered a separate hardware system.
- "Virtual Machine" means a software container that can run its own operating system and execute applications like a physical machine.
- "Cluster" means two or more Nodes that are interconnected for the purposes of executing application programs and sharing data.
- "Storage" means the total available storage space, also known as raw capacity, within the cluster.
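One practical consequence of defining "Storage" as raw capacity: what applications can actually use is smaller, because HDFS replicates each block (3x by default). The sketch below illustrates the arithmetic; the node count and disk sizes are made up.

```python
# Raw vs. usable cluster capacity under HDFS-style n-way replication.
# Node count and disk sizes are illustrative, not from the license text.

def cluster_capacity_tb(nodes, disks_per_node, tb_per_disk, replication=3):
    """Return (raw, usable) capacity in TB; usable = raw / replication."""
    raw = nodes * disks_per_node * tb_per_disk
    return raw, raw / replication

raw, usable = cluster_capacity_tb(nodes=10, disks_per_node=12, tb_per_disk=2)
print(raw, usable)  # 240 TB raw, 80.0 TB usable at 3x replication
```

When sizing or licensing a cluster against a raw-capacity definition like the one above, this replication factor (plus headroom for intermediate data) is why usable space is typically a third or less of the quoted number.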
I want to be careful with how we present services….they do want people to come onsite for extended engagements