Data Science in the Enterprise

Amr Awadallah's slides from his talk at TIBCO in collaboration with The Hive Think Tank on May 11th, 2017.

1© Cloudera, Inc. All rights reserved.
Data Science in the Enterprise
Amr Awadallah (@awadallah)
Founder, Chief Technical Officer, Cloudera

Typical Data Science Workflow
Stages: Data Engineering, Data Science (Exploratory), Production (Operational)
Steps: Acquisition, Processing, Data Wrangling, Visualization and Analysis, Model Training & Testing, Production Data Pipelines, Batch Scoring, Online Scoring, Serving
Data Governance
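To make the later workflow stages concrete, here is a minimal sketch of model training and testing, followed by batch scoring and online scoring. It is illustrative only and is not taken from the talk: the toy data, the least-squares line fit, and every function name are assumptions, and only the Python standard library is used.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows deterministically and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Model training: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def score_one(model, x):
    """Online scoring: score a single record on request."""
    slope, intercept = model
    return slope * x + intercept


def score_batch(model, xs):
    """Batch scoring: score a whole collection of records in one pass."""
    return [score_one(model, x) for x in xs]


def mean_abs_error(model, rows):
    """Model testing: average absolute error on held-out rows."""
    return mean(abs(score_one(model, x) - y) for x, y in rows)


# Hypothetical toy data that follows y = 2x + 1 exactly.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)
test_error = mean_abs_error(model, test)
batch_scores = score_batch(model, [0.0, 1.0, 2.0])
online_score = score_one(model, 10.0)
```

The same fitted model serves both scoring paths. In practice the batch path would typically run inside a scheduled data pipeline and the online path behind a serving endpoint.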

Types of Data Science
Exploratory (discover and quantify opportunities)
• Team: Data scientists and analysts
• Goal: Understand data, develop and improve models, share insights
• Data: New and changing; often sampled
• Environment: Local machine, sandbox cluster
• Tools: R, Python, SAS/SPSS, SQL; notebooks; data wrangling/discovery tools, …
• End State: Reports, dashboards, PDF, MS Office
Operational (deploy production systems)
• Team: Data engineers, developers, SREs
• Goal: Build and maintain applications, improve model performance, manage models in production
• Data: Known data; full scale
• Environment: Production clusters
• Tools: Java/Scala, C++; IDEs; continuous integration, source control, …
• End State: Online/production applications

Common Limitations
Access
Secured clusters are often hard for data science professionals to connect to, either because they don’t have the right permissions or because resources are too scarce to afford them access. In addition, popular frameworks and libraries don’t read Hadoop data formats out of the box.
Scale
Notebook environments seldom have enough data storage for medium data, let alone big data. Data scientists are often relegated to sample data and constrained when working on distributed systems. Popular frameworks and libraries don’t easily parallelize across the cluster.
Developer Experience
Popular notebooks don’t work well with access engines like Spark, and package deployment and dependency management across multiple software versions are often hard to manage. Once a model is built, there is no easy path from model development to production.

Management of Dependencies

Open Data Science in the Enterprise
IT
drive adoption while maintaining compliance
Data Scientist
explore, experiment, iterate

https://medium.com/@KevinSchmidtBiz/data-engineer-vs-data-scientist-vs-business-analyst-b68d201364bc

Introducing Cloudera Data Science Workbench
Self-service data science for the enterprise
Accelerates data science from development to production with:
• Secure self-service environments for data scientists to work against Cloudera clusters
• Support for Python, R, and Scala, plus project dependency isolation for multiple library versions
• Workflow automation, version control, collaboration and sharing
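As a rough illustration of what per-project dependency isolation means, the sketch below gives two projects their own Python virtual environments using the standard-library `venv` module. This is an assumption-laden analogy and not a description of how Cloudera Data Science Workbench implements isolation; the project names and the `create_project_env` helper are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create an isolated virtual environment under <project_dir>/.venv."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch fast and offline; a real project
    # environment would normally include pip to install its own libraries.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as workspace:
    # Hypothetical projects that may need different library versions.
    env_a = create_project_env(Path(workspace) / "project_a")
    env_b = create_project_env(Path(workspace) / "project_b")
    separate_envs = env_a != env_b
    both_configured = all(
        (env / "pyvenv.cfg").is_file() for env in (env_a, env_b)
    )
```

Each environment has its own `site-packages` directory, so a library version installed for one project does not affect the other.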

How does CDSW help?
Visualize results
Change and Compile Source code
Retrain and redeploy
Extensible Engines
Configurable Sessions
Trivial to tweak parameters
Multiple Users
Roles/Governance
CDH

The Importance of an Open Ecosystem
Open Ecosystem Black Box

Demo

Key Benefits
How is Cloudera Data Science different?
Works with fully secured clusters
One tool for multiple standard languages (Python, R, Scala)
Multi-tenant Architecture
Common Platform

A conference for and by practicing data scientists!
Save the Date: July 20th at the Chapel, San Francisco
Wrangle is a one-day, single-track community event that hosts the best and
brightest in the Bay Area talking about the principles, practice, and
application of Data Science across multiple data-rich industries. Join
Cloudera, Facebook, Netflix and more to discuss future trends, how they
can be predicted, and, most importantly, how they can be anticipated.
wrangleconf.com
#wrangleconf | Powered by Cloudera

Thank You
Amr Awadallah (@awadallah)
