Treasure Data provides a big data analytics platform that runs on Hadoop in the cloud. It aims to simplify big data and make it accessible for more users ("Big Data for the Rest of Us"). Treasure Data collects and stores data from various sources in its cloud-based columnar datastore and allows querying and analysis of data through SQL, REST APIs and other tools. It handles all the operational complexities of Hadoop and provides a simple interface for users.
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
Take a peek inside the minds of Salesforce admins, their habits, and their data. We discovered the biggest struggles admins face, the most critical issues, and best practices.
Denodo Data Virtualization Platform architecture: Data Discovery and Data Gov... (Denodo)
Being able to discover and manage your data are essential features of a modern Data Virtualization platform. The Denodo Platform offers a wealth of capabilities for discovering and managing data and metadata and provides advanced data governance functionality, such as data lineage and change impact analysis. This webinar will examine these capabilities.
More information and free registration for this webinar: http://goo.gl/8U4ynC
To learn more, visit: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
Introspection and Metadata Management
Global Search
Data Governance
Data-Ed Online: Data Architecture Requirements (DATAVERSITY)
Data architecture is foundational to an information-based operational environment. It is your data architecture that organizes your data assets so they can be leveraged in your business strategy to create real business value. Even though this is important, not all data architectures are used effectively. This webinar describes the use of data architecture as a basic analysis method. Various uses of data architecture to inform, clarify, understand, and resolve aspects of a variety of business problems will be demonstrated. As opposed to showing how to architect data, your presenter Dr. Peter Aiken will show how to use data architecting to solve business problems. The goal is for you to be able to envision a number of uses for data architectures that will raise the perceived utility of this analysis method in the eyes of the business.
Takeaways:
Understanding how to contribute to organizational challenges beyond traditional data architecting
How to utilize data architectures in support of business strategy
Understanding foundational data architecture concepts based on the DAMA DMBOK
Data architecture guiding principles & best practices
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson... (Felix Gessert)
In this talk we share the lessons learned while building out the Baqend Cloud platform on AWS and Docker. Baqend’s AWS-hosted architecture consists of a caching CDN layer, global and local load balancing, a group of REST and Node.js servers, and a database cluster with Redis and MongoDB. As customers have their own set of containerized REST and Node servers, we needed a cluster that on the one hand is horizontally scalable and on the other hand easily manageable and fault-tolerant from an operational perspective. Today there are at least four popular systems that claim to support this:
- Kubernetes
- Apache Mesos
- Docker Swarm
- AWS Elastic Container Service (ECS)
Thinking that ECS would certainly be the easiest option on AWS, we started building our cluster on it. We quickly came to realize that while ECS was astoundingly stable and easy to use, there were inherent limitations that could not be worked around. An old Docker version, missing network isolation, no means of parameterizing tasks, and forced memory constraints are major limitations of ECS that we will talk about. Seeing the daunting operational overhead of running Kubernetes or Mesos in practice, we turned to Docker’s native clustering solution, Swarm. We will present how Swarm works with both Docker and AWS and highlight the advantages and downsides compared to Amazon’s ECS.
At the StampedeCon 2015 Big Data Conference: YARN enables Hadoop to move beyond pure batch processing. With that, multiple workloads and tenants must now be able to share a single infrastructure for data processing. Features of the Capacity Scheduler enable resource sharing among multiple tenants in a fair manner, with elastic queues to maximize utilization. This talk will focus on the Capacity Scheduler features that enable multi-tenancy and how resource sharing can be rebalanced using features like preemption.
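A minimal sketch of how elastic queues might be configured in `capacity-scheduler.xml` (queue names and percentages are illustrative, not taken from the talk):

```xml
<configuration>
  <!-- Two tenant queues sharing one cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>analytics,etl</value>
  </property>
  <!-- Guaranteed shares: 60% / 40% of cluster resources -->
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>40</value>
  </property>
  <!-- Elasticity: analytics may grow beyond its guarantee when etl is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.analytics.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```

With preemption enabled via the scheduler monitor settings in `yarn-site.xml`, YARN can reclaim borrowed capacity for the `etl` queue when its demand returns.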
IoT Meets Big Data: The Opportunities and Challenges by Syed Hoda of ParStream (gogo6)
Download our special report, IoT Tech for the Manager: http://bit.ly/report1-slideshare
IoT Meets Big Data: The Opportunities and Challenges as presented at the IoT Inc Business' Eighth Meetup. See: http://www.iot-inc.com/iot-meets-big-data-the-opportunities-and-challenges/
In our eighth Meetup we have Syed Hoda, Chief Marketing Officer of ParStream presenting “IoT Meets Big Data: The Opportunities and Challenges”. Come meet other business leaders in the IoT ecosystem and discuss the business issues you face in the Internet of Things.
Presentation Abstract
The Internet of Things (IoT) and Big Data have each made press headlines and continue to be board-level priorities. The intersection of IoT and Big Data is a fascinating area of innovation with tremendous scope for business impact. From industrial sensors to vehicles to health monitors, a huge variety of devices connect to the Internet and share information. At the same time, the cost to store data has dropped dramatically while capabilities for analysis have made huge leaps forward. How can analytics drive business benefits from IoT projects? What are the challenges in storing and analyzing huge amounts of real-world information? How can companies generate more value from their data? We will address these questions and also share our perspectives on innovative technologies enabling new IoT use cases.
Data Lake for the Cloud: Extending your Hadoop Implementation (Hortonworks)
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
Effective data governance is imperative to the success of Data Lake initiatives. Without governance policies and processes, information discovery and analysis is severely impaired. In this session we will provide an in-depth look into the Data Governance Initiative launched collaboratively between Hortonworks and partners from across industries. We will cover the objectives of Data Governance Initiatives and demonstrate key governance capabilities of the Hortonworks Data Platform.
This presentation elaborates on design decisions and design options when it comes to designing the master data architecture.
The presentation was given at the 16th Americas Conference on Information Systems (AMCIS 2010) in Lima, Peru.
Implementing a Data Lake with Enterprise Grade Data Governance (Hortonworks)
Hadoop provides a powerful platform for data science and analytics, where data engineers and data scientists can leverage myriad data from external and internal sources to uncover new insights. Such power also presents a few new challenges: on the one hand, the business wants more and more self-service; on the other, IT is trying to keep up with the demand for data while maintaining architecture and data governance standards.
In this webinar, Andrew Ahn, Data Governance Initiative Product Manager at Hortonworks, will address the gaps and offer best practices in providing end-to-end data governance in HDP. Andrew Ahn will be followed by Oliver Claude of Waterline Data, who will share a case study of how Waterline Data Inventory works with HDP in the Modern Data Architecture to automate the discovery of business and compliance metadata, data lineage, as well as data quality metrics.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, and to convert all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
This is the presentation for the talk I gave at JavaDay Kiev 2015. It covers the evolution of data processing systems, from simple ones with a single DWH to more complex approaches like Data Lake, Lambda Architecture, and Pipeline architecture.
Tech talk on what Azure Databricks is, why you should learn it, and how to get started. We'll use PySpark and talk about some real-life examples from the trenches, including the pitfalls of accidentally leaving your clusters running and receiving a huge bill ;)
After this you will hopefully switch to Spark-as-a-service and get rid of your HDInsight/Hadoop clusters.
This is part 1 of an 8 part Data Science for Dummies series:
Databricks for dummies
Titanic survival prediction with Databricks + Python + Spark ML
Titanic with Azure Machine Learning Studio
Titanic with Databricks + Azure Machine Learning Service
Titanic with Databricks + MLS + AutoML
Titanic with Databricks + MLFlow
Titanic with DataRobot
Deployment, DevOps/MLops and Operationalization
Apache Bigtop has created the de-facto standard in how Hadoop-based stacks are developed, delivered, and managed. We are at it again! The track will present the composition of the next generation of in-memory computing stack that is completely built out of open-source components. The next generation of the Apache data processing stack will focus on in-memory and transactional processing of large amounts of data. We will also be talking about performance benefits that legacy data-processing software based on MapReduce, Hive, and similar, can derive from in-memory computing. This session will discuss and analyze the benefits of practicing Fast Data in the open.
CloudFoundry and MongoDb, a marriage made in heaven (Patrick Chanezon)
This talk will provide an overview of the PaaS (Platform as a Service) landscape, and will describe the Cloud Foundry open source PaaS, with its multi-framework, multi-service, multi-cloud model. Cloud Foundry allows developers to provision apps in Java/Spring, Ruby/Rails, Ruby/Sinatra, Javascript/Node, and leverage services like MySQL, MongoDB, Redis, Postgres and RabbitMQ. It can be used as a public PaaS on CloudFoundry.com and other service providers (ActiveState, AppFog), to create your own private cloud, or on your laptop using the Micro Cloud Foundry VM. Micro Cloud Foundry is a very easy way for developers to start working on their application using their framework of choice and MongoDB, without the need to set up a development environment, and your app is one command line away (vmc push) from deployment to cloudfoundry.com.
10 concepts the enterprise decision maker needs to understand about Hadoop (Donald Miner)
Way too many enterprise decision makers have clouded and uninformed views of how Hadoop works and what it does. Donald Miner offers high-level observations about Hadoop technologies and explains how Hadoop can shift the paradigms inside of an organization, based on his report Hadoop: What You Need To Know—Hadoop Basics for the Enterprise Decision Maker, forthcoming from O’Reilly Media.
After a basic introduction to Hadoop and the Hadoop ecosystem, Donald outlines 10 basic concepts you need to understand to master Hadoop:
Hadoop masks being a distributed system: what it means for Hadoop to abstract away the details of distributed systems and why that’s a good thing
Hadoop scales out linearly: why Hadoop’s linear scalability is a paradigm shift (but one with a few downsides)
Hadoop runs on commodity hardware: an honest definition of commodity hardware and why this is a good thing for enterprises
Hadoop handles unstructured data: why Hadoop is better for unstructured data than other data systems from a storage and computation perspective
In Hadoop, you load data first and ask questions later: the differences between schema-on-read and schema-on-write and the drawbacks this represents
Hadoop is open source: what it really means for Hadoop to be open source from a practical perspective, not just a “feel good” perspective
HDFS stores the data but has some major limitations: an overview of HDFS (replication, not being able to edit files, and the NameNode)
YARN controls everything going on and is mostly behind the scenes: an overview of YARN and the pitfalls of sharing resources in a distributed environment and the capacity scheduler
MapReduce may be getting a bad rap, but it’s still really important: an overview of MapReduce (what it’s good at and bad at and why, while it isn’t used as much these days, it still plays an important role)
The Hadoop ecosystem is constantly growing and evolving: an overview of current tools such as Spark and Kafka and a glimpse of some things on the horizon
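The schema-on-read point above can be sketched in a few lines of Python. The records and field names here are made up; the point is that raw data is stored as-is, and a schema is imposed only at read time:

```python
import json

# Raw events stored exactly as they arrived. A schema-on-write system
# would have rejected the malformed or differently-shaped records at load time.
raw_lines = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',  # missing "ms" field
    'not json at all',                    # corrupt record
]

def parse_events(lines):
    """Apply a schema at read time, skipping records that don't fit."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # schema-on-read: bad records surface now, not at load
        yield {"user": rec.get("user"), "action": rec.get("action"),
               "ms": rec.get("ms", 0)}

events = list(parse_events(raw_lines))
print(events)  # two well-formed records; the corrupt one is skipped
```

The drawback mentioned in the talk shows up here too: data quality problems stay hidden until somebody reads the data.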
Hadoop, SQL & NoSQL: No Longer an Either-or Question (Tony Baer)
It used to be black and white. If you needed MapReduce processing, you chose Hadoop; if you needed standard query and reporting, you chose a SQL data warehouse. The decision is no longer clear cut. With YARN clearing the way for Hadoop to accept multiple workloads, Hadoop is no longer your father’s MapReduce machine – as frameworks are rapidly emerging for interactive SQL, search, streaming and other workloads. We are on the path toward a federated world of analytic and operational decision stores, but as the boundaries between platform types grow fuzzier, deciding what platforms to use and where to run which workloads grow trickier.
Hadoop Vs Spark — Choosing the Right Big Data Framework (Alaina Carter)
Data volumes keep increasing, and many distributed systems are available to digest all this data; Hadoop and Spark are the most famous ones. Choosing between the two depends entirely on the requirements of your project. Read more to find out which of these frameworks is right for you.
The new GDPR regulation went into effect on May 25th. While a majority of conversations have revolved around the security and IT aspects of the law, marketing teams will play a crucial role in helping organizations meet GDPR standards and will take on a more strategic role across the organization. Join us to learn more, engage with your peers, and get prepared.
This webinar will cover:
- How complying with the GDPR will drive better marketing and raise the standard of the quality of your customer engagement
- The GDPR elements marketers must know about
- The elements of PII that will be affected and what marketers need to do about it
- A deep dive on how GDPR regulations will affect your marketing channels - email, programmatic advertising, cold calls, etc.
- Tactical marketing updates needed to meet GDPR guidelines
AR and VR by the Numbers: A Data First Approach to the Technology and Market (Treasure Data, Inc.)
With AR and VR technologies, it’s the first time that data collection has been part of the front-end strategy vs back-end process. As companies compete to create new, interactive experiences, data is the tool of choice to measure all aspects of player engagement and marketing effectiveness. In this webinar, two industry experts, Nicolas Nadeau and Andrew Mayer, will talk about the trends driving AR and VR markets today, and what data-driven approaches companies need to think about to compete in these markets tomorrow.
An overview of Customer Data Platforms (CDP) with the industry leader who coined the term, David Raab. Find out how to use Live Customer Data to create a better customer experience and how Live Data Management can give you a competitive edge with a 360 degree view of your clients.
Learn:
- The definition and requirements for Customer Data Platforms
- The differences between Customer Data Platforms and comparative technologies such as Data Warehousing and Marketing Automation
- Reference architectures/approaches to building CDP
- How Treasure Data is used to build Customer Data Platforms
And here's the song: https://youtu.be/RalMozVq55A
In this hands-on webinar we will cover how to leverage the Treasure Data Javascript SDK library to ensure user stitching of web data into the Treasure Data Customer Data Platform to provide a holistic view of prospects and customers.
We will demo the native SDK, as well as deploying the SDK inside of Adobe DTM and Google Tag Manager.
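The SDK's internals aren't reproduced here, but the core idea of user stitching can be sketched in Python. The event shapes and field names below are hypothetical, not the Treasure Data schema:

```python
# Hypothetical event stream: pageviews carry an anonymous cookie id ("cid");
# a login event links a cid to a known customer id ("uid").
events = [
    {"cid": "c-123", "event": "pageview", "url": "/pricing"},
    {"cid": "c-123", "event": "login", "uid": "u-42"},
    {"cid": "c-123", "event": "pageview", "url": "/docs"},
    {"cid": "c-999", "event": "pageview", "url": "/"},
]

def stitch(events):
    """Resolve each event to a canonical user id where a login links one."""
    cid_to_uid = {e["cid"]: e["uid"] for e in events if e["event"] == "login"}
    return [dict(e, user=cid_to_uid.get(e["cid"], e["cid"])) for e in events]

stitched = stitch(events)
# Events for c-123 now share the canonical id u-42; c-999 stays anonymous.
```

In the real pipeline this join happens inside the CDP over much larger tables, but the mapping step is conceptually the same.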
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow (Treasure Data, Inc.)
In this hands-on webinar we'll explore the data warehousing concept of Slowly Changing Dimensions (SCDs) and common use cases for managing SCDs when dealing with customer data. This webinar will demonstrate different methods for tracking SCDs in a data warehouse, and how Treasure Data Workflow can be used to create robust data pipelines to handle these processes.
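As one illustration of what such a pipeline computes, here is a minimal Python sketch of a Type 2 SCD update, which closes the current dimension row and inserts a new one instead of overwriting history. The table layout and field names are illustrative, not the TD Workflow schema:

```python
from datetime import date

# A Type 2 dimension keeps full history: changes close the current row
# and insert a new one, rather than updating in place.
dim_customer = [
    {"customer_id": 1, "city": "Tokyo", "valid_from": date(2015, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, as_of):
    """Record a changed attribute while preserving the old version."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # no change, nothing to record
            row["valid_to"] = as_of      # close out the old version
            row["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": as_of, "valid_to": None, "is_current": True})

apply_scd2(dim_customer, 1, "Osaka", date(2016, 6, 1))
# The dimension now holds two rows for customer 1: the closed Tokyo row
# and the current Osaka row.
```

A warehouse implementation would express the same close-and-insert logic as SQL MERGE/INSERT steps orchestrated by the workflow engine.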
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps (Treasure Data, Inc.)
Gaming companies with multiple products often struggle to calculate accurate Customer Lifetime Value (CLTV) across their portfolio. This is because user data is often analyzed in silos, so companies are unable to get a clear picture of ROI and CLTV across platforms, devices and apps.
In this webinar we’ll look at how you can apply a holistic and complete approach to your CLTV and ROI through the lens of gaming companies, though this technique is applicable for any company who has products spanning platforms.
We’ll also explore:
How the role and power of data in business has shifted over the past 10 years.
The current technologies and processes used to analyze data across different platforms by combining multiple data streams, with examples in brand- and portfolio-based LTV.
How to process and centralize dozens of varying data streams.
Nicolas Nadeau will speak from his extensive experience and show how leveraging data from multiple product strategies spanning many platforms can be highly beneficial for your company.
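Once the silos are joined, the cross-platform CLTV idea reduces to aggregating revenue by a unified player identity. A deliberately simplified Python sketch (the data and field names are made up; real CLTV models also project future revenue):

```python
from collections import defaultdict

# Revenue events from separate product silos, already mapped to a single
# cross-platform player id -- the identity mapping is the hard part in practice.
events = [
    {"player": "p1", "platform": "ios",   "revenue": 4.99},
    {"player": "p1", "platform": "steam", "revenue": 19.99},
    {"player": "p2", "platform": "ios",   "revenue": 0.99},
]

def portfolio_ltv(events):
    """Sum observed revenue per player across every platform and app."""
    ltv = defaultdict(float)
    for e in events:
        ltv[e["player"]] += e["revenue"]
    return dict(ltv)

print(portfolio_ltv(events))
```

Analyzed per silo, p1 would look like two mediocre customers; combined, they are one high-value customer, which is exactly the distortion the webinar describes.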
Do you know what your top ten 'happy' customers look like? Would you like to find ten more just like them? Come learn how to leverage 1st & 3rd party data to map your customer journey and drive users down a path where every interaction is personalized, fun, & data-driven. No more detractors, power your Customer Experience with data!
In this webinar you will learn:
-When, why, and how to leverage 1st, 2nd, and 3rd party data
-Tips & Tricks for marketers to become more data driven when launching their campaigns
-Why all marketers need a 360-degree customer view
The reality is virtual, but successful VR games still require cold, hard data. For wildly popular games like Survios’ Raw Data, the first VR-exclusive game to reach #1 on Steam’s Global Top Sellers list, data and analytics are the key to success.
And now online gaming companies have the full-stack analytics infrastructure and tools to measure every aspect of a virtual reality game and its ecosystem in real time. You can keep tabs on lag, which ruins a VR experience, improve gameplay and identify issues before they become showstoppers, and create fully personalized, completely immersive experiences that blow minds and boost adoption, and more. All with the right tools.
Make success a reality: Register now for our latest interactive VB Live event, where we’ll tap top experts in the industry to share insights into turning data into winning VR games.
Attendees will:
* Understand the role of VR in online gaming
* Find out how VR company Survios successfully leverages the Exostatic analytics infrastructure for commercial and gaming success
* Discover how to deploy full-stack analytics infrastructure and tools
Speakers:
Nicolas Nadeau, President, Exostatic
Kiyoto Tamura, VP Marketing, Treasure Data
Ben Solganik, Producer, Survios
Stewart Rogers, Director of Marketing Technology, VentureBeat
Wendy Schuchart, Moderator, VentureBeat
Harnessing Data for Better Customer Experience and Company Success (Treasure Data, Inc.)
As big data has exploded, the ability for companies to easily leverage it has imploded. Organizations are drowning in their own information, unable to see the forest for the trees, while the big players consistently outperform in their ability to deliver a great customer experience faster and cheaper. As a result, the vast majority of companies are scrambling to catch up and become more agile and data-driven, to use their data more effectively so they can attract and retain their elusive customers.
In this joint deck by 451 Research and Treasure Data, you will learn how to enable your line of business team to own their own data (instead of relying on IT) to be able to:
- Deliver a single, persistent view of your customer based on behavior data
- Make that data accessible to the right people at the right time
- Increase organizational effectiveness by (finally) breaking down silos with data
- Enable powerful marketing tools to enhance the customer experience
How to make your open source project MATTER
Let’s face it: most open source projects die. For every Rails, Docker and React, there are thousands of projects that never take off. They die in the lonely corners of GitHub, only to be discovered by bots scanning for SSH private keys.
Over the last 5 years, I worked on and off on marketing a piece of infrastructure middleware called Fluentd. We tried many things to ensure that it did not die: speaking at events, talking to strangers, giving away stickers, helping people install Fluentd on their laptops. Almost everything I tried had a small, incremental effect, but several initiatives raised awareness of Fluentd to the next level. As I listed these “ideas that worked”, I noticed a common thread: they all brought Fluentd into a new ecosystem via packaging.
* Event: the '데이터야 놀자 (Let's Play with Data)' one-day conference held at MARU180 on October 14, 2016
* Speaker: Dylan Ko (고영혁), Data Scientist / Data Architect at Treasure Data
* Contents:
- Introduction of data scientist Dylan Ko (고영혁)
- Introduction to Treasure Data (트레저데이터)
- Global case study of making money with data #1
>> MUJI: from traditional retail to data-driven O2O
- Global case study of making money with data #2
>> WISH: shopping optimization through personalization & automation
- Global case study of making money with data #3
>> Oisix: predicting & preventing customer churn with machine learning
- Global case study of making money with data #4
>> Warner Bros.: saving time and money through process automation
- Global case study of making money with data #5
>> Adtech companies such as Dentsu
- What you must check before trying to make money with data
Keynote at Fluentd Meetup Summer
Related Slides
- Fluentd ServerEngine Integration & Windows Support http://www.slideshare.net/RittaNarita/fluentd-meetup-2016-serverengine-integration-windows-support
- Fluentd v0.14 Plugin API Details http://www.slideshare.net/tagomoris/fluentd-v014-plugin-api-details
John Hammink's talk at Great Wide Open 2016. We discuss: (1) the need for data analytics infrastructure that can scale exponentially; (2) what such an infrastructure must contain; and (3) the need for an infrastructure that can handle unstructured and semi-structured data.
Treasure Data: Move your data from MySQL to Redshift with (not much more tha... - Treasure Data, Inc.
Migrate your semi-structured data from MySQL to Amazon Redshift in as few steps as possible. From Amazon Web Services Bay Area meetup @ Sumo Logic, December 3, 2015.
This presentation describes common issues with application logging and introduces how to solve most of them by implementing a unified logging layer with Fluentd.
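A unified logging layer of this kind is typically wired up through Fluentd's configuration file. A minimal sketch follows; the plugin names (`forward`, `file`) are standard, but the paths and tags are illustrative, and directive syntax varies slightly across Fluentd versions:

```
# Receive events from application loggers over TCP (the in_forward input).
<source>
  @type forward
  port 24224
</source>

# Route every event tagged app.* to buffered files for later analysis.
<match app.**>
  @type file
  path /var/log/td-agent/app
</match>
```

Applications then emit structured events (a tag plus a JSON record) through a Fluentd logger library instead of writing ad-hoc text logs, so every downstream consumer sees one consistent format.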
Hadoop meets Cloud with Multi-Tenancy
1. Treasure Data
Hadoop meets Cloud with Multi-Tenancy
Kazuki Ohta
Founder and CTO at Treasure Data, Inc.
Hadoop User Group Japan (Hadoopユーザー会)
k@treasure-data.com
@kzk_mover
Friday, April 5, 13
2. Who are you?
Kazuki Ohta (太田一樹)
• @kzk_mover, k@treasure-data.com
Treasure Data, Inc.
• Chief Technology Officer, Founded July 2011
Hadoop User Group Japan
• One of the founders
• “Hadoop徹底入門”
Open-Source Enthusiast
• Hadoop, memcached, jemalloc, MongoDB, uim, etc...
7. Hadoop Versions
Too Many Variations (+Ecosystem)
from http://marblejenka.blogspot.jp/2013/01/hadoop.html
8. Current Big Data Solutions: ‘Feature Creep’
http://en.wikipedia.org/wiki/Feature_creep
9. We need Machete :)
EVERYTHING with ONE interface
Simple & Discoverable
Machete Design by James Lindenbaum, Heroku Co-Founder
http://www.youtube.com/watch?v=3BhDLm9jo5Y
10. ‘Simplicity’ itself is a feature :)
by Anand Babu Periasamy
GlusterFS Co-Founder
13. Battle Field of IaaS Vendors: SCM
Chart: HW performance / price decreases over time with Moore's Law, for IaaS vendors and on-premise alike.
In the near future, most HW buyers won't be individual companies, but clouds.
The IaaS vendors' battle field: Supply Chain Management.
14. PaaS, SaaS:
IT is all about Operation
More Sleep, More Value
“With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably.” - EngineYard
15. PaaS/SaaS Battle Field: ‘Time’ is Money
Chart: value delivered over time, starting at sign-up or PO.
Ideal: customer expectation is met immediately and value does not become obsolete over time.
Reality (on-premise): value arrives only after HW/SW selection, PoC, deployment and upgrades.
16. Introduction to Treasure Data
18. Company Overview
Silicon Valley-based Company
• All Founders are Japanese
• Hironobu Yoshikawa
• Kazuki Ohta
• Sadayuki Furuhashi
OSS Enthusiasts
• MessagePack, Fluentd, etc.
• Cloud native
19. Our 50+ Customers: Fortune Global 500 leaders and start-ups
250 billion records / month in Feb 2013
2 million jobs executed
21. Investors
Bill Tai
Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
Othman Laraki - Former VP Growth at Twitter
James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
Yukihiro “Matz” Matsumoto - Creator of Ruby
Jerry Yang - Founder of Yahoo!, where Hadoop was invented :)
Dan Scheinman - Director of Arista Networks
+ 10 more people
Check out today's (2013/01/21) morning edition of the Nikkei Shimbun (日経新聞)!
• and....
22. Treasure Data’s Philosophy and Architecture
23. Big Data Adoption Stages
Stages (by increasing intelligence sophistication):
Reporting - Treasure Data's FOCUS (80% of needs):
• Standard Reports: What happened?
• Ad-hoc Reports: Where?
• Drill Down Query: Where exactly?
• Alerts: Error?
Analytics:
• Statistical Analysis: Why?
• Predictive Analysis: What's a trend?
• Optimization: What's the best?
24. Full Stack Support for Big Data Reporting
Our best-in-class architecture and operations team ensure the integrity and availability of your data.
Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.
You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.
25. Treasure Data = Collect + Store + Query
26. Example in AdTech: MobFox
1. Europe’s largest independent mobile ad exchange.
2. 20 billion imps/month (circa Jan. 2013)
3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
4. Needed Big Data Analytics infrastructure ASAP.
28. Our Value was Proven :)
Chart: customer value delivered over time, starting at sign-up or PO.
Our value: save time! Treasure Data's simple interface delivers value right away.
Reality (on-premise): value arrives only after HW/SW selection, PoC, deployment and upgrades, and becomes obsolete over time.
29. Architecture Breakdown
Data Collection:
• Increasing variety of data sources
• No single data schema
• Lack of a streaming data collection method
• 60% of Big Data project resource is consumed here
Data Store/Analytics:
• Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
• Challenges in scaling data volume and expanding cost
Connectivity:
• Required to ensure connectivity with existing BI/visualization apps by JDBC, REST and ODBC
30. 1) Data Collection
60% of BI project resource is consumed here
Most ‘underestimated’ and ‘unsexy’ but MOST important
Fluentd: OSS lightweight but robust Log Collector
• http://fluentd.org/
These talks will cover Fluentd :)
15:40∼ Log analysis system with Hadoop in livedoor 2013
by Satoshi Tagomori @ NHN Japan
16:30∼ いかにしてHadoopにデータを集めるか (How to Collect Data into Hadoop)
by Sadayuki Furuhashi @ Treasure Data, Inc.
31. 2) Data Store / Analytics - Columnar Storage
32. 3) Connectivity
Diagram: clients reach Treasure Data via the REST API (td-command) or via JDBC/ODBC drivers (BI apps, web apps).
Queries go to the Query API and run on the query processing cluster over Treasure Data's columnar storage; results can be written out to MySQL or Postgres.
33. Most Difficult Challenge: Multi-Tenancy
All customers share the Hadoop clusters (4 data centers).
Resource sharing (burst cores), rapid improvement, ease of upgrade.
Diagram: a global scheduler takes job submissions and plan changes and performs on-demand resource allocation across the local FairSchedulers in datacenters A, B, C and D.
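The on-demand allocation across tenants can be thought of as max-min fair sharing. A simplified sketch (not Treasure Data's actual scheduler) of how a fixed pool of task slots might be split across tenant demands:

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` task slots across tenant demands.

    Repeatedly gives each still-hungry tenant an equal share of what is
    left; tenants that need less than their share free slots for others
    (the "burst cores" effect on a shared cluster).
    """
    alloc = {tenant: 0 for tenant in demands}
    active = {t: d for t, d in demands.items() if d > 0}
    left = capacity
    while left > 0 and active:
        share = max(1, left // len(active))  # equal slice of the remainder
        for t in list(active):
            give = min(share, active[t], left)
            alloc[t] += give
            active[t] -= give
            left -= give
            if active[t] == 0:
                del active[t]  # tenant satisfied; stop scheduling it
            if left == 0:
                break
    return alloc

# A small tenant gets all it asked for; big tenants split the rest evenly.
print(fair_share(100, {"A": 20, "B": 100, "C": 100}))
# -> {'A': 20, 'B': 40, 'C': 40}
```

A real deployment layers this idea twice, as the slide suggests: a global scheduler divides capacity across data centers, and each local FairScheduler divides its share across jobs.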
34. Conclusion
Big Data is too complex
• Needs Simplicity
• Machete vs. Swiss Army Knife (Feature Creep)
IT is changing
• The value of Software itself is decreasing
• Operation is the key
Treasure Data = Cloud + Big Data
• Currently Focusing on Big Data Reporting
• Instant Value with Simple Interface
35. We’re Hiring Top Talent, please contact me :)
37. Big Data Market Growth
Chart: Big Data market growth, CAGR 38% (average of IDC, Gartner and Wikibon stats), with Big Data revenue breakdown.
“In 2012…BI and Analytics are rated #1 priorities.” — Ravi Kalakota, Gartner
“More than half a billion dollars in venture capital has been invested in new big data technology.” — Dan Vessett, IDC
“Big Data is the new definitive source of competitive advantage across all industries.” — Jeff Kelly, Wikibon
38. Big Data Situation
Chart: customer value delivered over time, starting at sign-up or PO.
Treasure Data delivers the most value, ahead of RedShift, EMR and other AWS offerings; Software A, Software B and other on-premise solutions deliver less and suffer obsolescence over time.
39. Treasure Data Service Architecture
Diagram: users and apps send data from Apache logs, RDBMSs and other data sources into the Treasure Data columnar data warehouse.
MapReduce jobs run queries via Hive (Pig to be supported).
Clients connect through td-command, JDBC/REST and BI apps to the Query API, which dispatches to the query processing cluster.
40. Our Own Open Source technologies
We are open source natives and proud of our heritage.
We’ve contributed to Hibernate, Hadoop, Cassandra,
Memcached, KDE, MongoDB among others.
Our product reflects our deep commitment to the open-source
community and is built on top of open source software we’ve
authored and open sourced.
• Fluentd - a popular data collector daemon written in Ruby
www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
• MessagePack - a fast, compact serializer
www.msgpack.org (leading users: Pinterest, Redis)
Substantial commitment (Code, Packaging, Documentation, Sponsorship)
Tech marketing, Possible lead gen
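MessagePack's compactness comes from packing type and length into single header bytes. A toy encoder (illustration only, not the real msgpack library) for tiny string-to-int maps shows the idea:

```python
import json

def pack_small_map(d):
    """Toy MessagePack encoder for maps of short ASCII keys to ints 0-127.

    Header bytes per the MessagePack spec: fixmap = 0x80 | size,
    fixstr = 0xa0 | length, positive fixint = the value byte itself.
    """
    assert len(d) <= 15
    out = bytearray([0x80 | len(d)])  # fixmap header
    for key, value in d.items():
        kb = key.encode("ascii")
        assert len(kb) <= 31 and 0 <= value <= 127
        out.append(0xA0 | len(kb))    # fixstr header
        out += kb                     # raw key bytes
        out.append(value)             # positive fixint
    return bytes(out)

# 4 bytes in MessagePack vs 8 bytes as JSON text.
print(len(pack_small_map({"a": 1})), len(json.dumps({"a": 1})))
# -> 4 8
```

Over billions of log events per month, that kind of per-record saving is why both Fluentd and Treasure Data's collection path use MessagePack on the wire.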