Pivotal OSS meetup - MADlib and PivotalR (go-pivotal)
With the explosion of big data, the need for fast and inexpensive analytics solutions has become a key basis of competition in many industries. Extracting the value of big data with analytics can be complex, and requires advanced skills.
At Pivotal, we are building open-source solutions (MADlib, PivotalR, PyMadlib) to simplify this process for the user, while maintaining the efficiency necessary for big data analysis.
This talk will provide information about MADlib, an open source library of SQL-based algorithms for machine learning, data mining and statistics that run at large scale within a database engine, with no need for data import/export to other tools.
It provides an overview of the library’s architecture and compares various statistical methods with those available in Apache Mahout.
We also introduce PivotalR, an R-based wrapper for MADlib that gives data scientists and programmers access to the power of MADlib along with the ease of use of R.
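To illustrate what running an algorithm "within a database engine" can look like, here is a simplified Python simulation of the user-defined-aggregate pattern (transition, merge, final) that parallel SQL databases use. The example is least-squares linear regression. Each segment folds its own rows into small sufficient statistics (X^T X and X^T y). The partial states are summed, and only the final step solves for the coefficients, so raw rows never leave the segment. This is a sketch under that assumption, not MADlib's actual code. The function names `init_state`, `transition`, `merge`, and `final` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Aggregate state: X^T X (k x k matrix), X^T y (length-k vector), row count.
State = Tuple[List[List[float]], List[float], int]


def init_state(k: int) -> State:
    """Empty aggregate state for k coefficients (intercept included)."""
    return [[0.0] * k for _ in range(k)], [0.0] * k, 0


def transition(state: State, x: Sequence[float], y: float) -> State:
    """Fold one row into the state; this step runs next to the data."""
    xtx, xty, n = state
    row = [1.0, *x]  # prepend 1.0 so the first coefficient is the intercept
    for i, ri in enumerate(row):
        xty[i] += ri * y
        for j, rj in enumerate(row):
            xtx[i][j] += ri * rj
    return xtx, xty, n + 1


def merge(a: State, b: State) -> State:
    """Combine partial states from two segments by element-wise sums."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty, a[2] + b[2]


def final(state: State) -> List[float]:
    """Solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting."""
    xtx, xty, _ = state
    k = len(xty)
    m = [row[:] + [xty[i]] for i, row in enumerate(xtx)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        s = m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / m[i][i]
    return beta


# Usage: two "segments" holding points on the line y = 2 + 3x.
segment_a = [([0.0], 2.0), ([1.0], 5.0)]
segment_b = [([2.0], 8.0), ([3.0], 11.0)]

partials = []
for segment in (segment_a, segment_b):
    state = init_state(2)
    for x, y in segment:
        state = transition(state, x, y)
    partials.append(state)

beta = final(merge(partials[0], partials[1]))
print([round(b, 6) for b in beta])  # [2.0, 3.0]
```

The merge step only adds matrices, so segments can be combined in any order. This is what lets a database run the per-row work in parallel and avoid exporting the data to another tool.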
BIG DATA ANALYTICS MEANS “IN-DATABASE” ANALYTICS (TIBCO Spotfire)
Presented by: Dr. Bruce Aldridge, Sr. Industry Consultant Hi-Tech Manufacturing, Teradata
TIBCO Spotfire and Teradata: First to Insight, First to Action; Warehousing, Analytics and Visualizations for the High Tech Industry Conference
July 22, 2013, The Four Seasons Hotel, Palo Alto, CA
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science! (Sarah Aerni)
Slides from the Pivotal Open Source Hub Meetup
"Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science!"
As data science becomes a key differentiator in every industry, from large corporations to startups, teams need to get to results quickly, and sharing ideas and methods across the community makes that possible. The data science team at Pivotal uses and contributes to this community of publicly available, open source technologies as part of its practice. We will share the resources we use, highlighting specific toolkits for building models (e.g. MADlib, R) and for visualization (e.g. Gephi and Circos), and discuss their benefits and limitations through examples from Pivotal's data science engagements. By the end of this session we hope to have answered the questions: Where can I get started with data science? Which toolkit is most appropriate for building a model with my dataset? How can I visualize my results to have the greatest impact?
Bio: Sarah Aerni is a member of the Pivotal Data Science team with a focus on healthcare and life science. She has a background in bioinformatics, developing tools to help biomedical researchers understand their data. She holds a B.S. in Biology with a specialization in Bioinformatics and a minor in French Literature from UCSD, and an M.S. and Ph.D. in Biomedical Informatics from Stanford University. During her time as a researcher she focused on the interface between machine learning and biology, building computational models that enabled research across a broad range of fields in biomedicine. She also co-founded a start-up providing informatics services to researchers and small companies. At Pivotal she works with customers in life science and healthcare, building models to derive insight and business value from their data.
Predictive analytics have long lived in the domain of statistical tools like R. Increasingly, however, as companies struggle to deal with exploding volumes of data not easily analyzed by small data tools, they are looking at ways of doing predictive analytics directly inside the primary data store.
This approach, called in-database predictive analytics, eliminates the need to sample data and run a separate ETL process into a statistical tool, which can decrease total cost, improve the quality of predictive models, and dramatically shorten development time. In this class, you will learn the pros and cons of in-database predictive analytics, including its main limitations, and survey the tools and technologies needed to head down this path.
Pivotal Data Warehouse in the Age of Digital Transformation (VMware Tanzu)
View the recording: https://content.pivotal.io/webinars/the-data-warehouse-in-the-age-of-digital-transformation?utm_source-pivotalwebsite&utm-medium-email-link&utm-campaign=datawarehouse-hiredbrains-q117
During the past few years of Big Data and digital transformation “euphoria”, Hadoop and Spark received most of the attention as platforms for large-scale data management and analytics. Data warehouses based on relational database technology, for a variety of reasons, came under scrutiny as perhaps no longer needed.
However, if users have learned anything recently, it’s that the mission of data warehouses is as vital as ever. Cost and operational deficiencies can be overcome with a combination of cloud computing and open source software, and by leveraging the same economics as traditional big data projects: scale-up and scale-out at commodity pricing.
In this webinar, Neil Raden from Hired Brains Research makes the case that an evolved data warehouse implementation continues to play a vital role in the enterprise, providing unique business value that actually aids digital transformation. Attendees will learn:
- How the role of the data warehouse has evolved over time
- Why Hadoop and Spark are not replacements for the data warehouse
- How the data warehouse supports digital transformation initiatives
- Real-life examples of data warehousing in digital transformation scenarios
- Advice and best practices for evolving your own data warehouse practice
This presentation covers both the Cloud Foundry Elastic Runtime (known by many as just "Cloud Foundry") as well as the Operations Manager (known by many as BOSH). For each, the main components are covered with interactions between them.
The ninja elephant, scaling the analytics database in Transferwise (Federico Campoli)
Business intelligence and analytics are at the core of any great company, and Transferwise is no exception.
The talk will start with a brief history of the legacy analytics implemented with MySQL and how we scaled up performance using PostgreSQL. To get fresh data from the core MySQL databases in real time, we used a modified version of pg_chameleon that also obfuscated the PII data.
The talk will also cover the challenges and the lessons learned by the developers and analysts when bridging MySQL with PostgreSQL.
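The abstract says a modified pg_chameleon obfuscated PII while replicating from MySQL to PostgreSQL, but it does not say how. As a purely hypothetical illustration (this is not pg_chameleon's code or configuration), a replication step could replace configured PII columns with a keyed HMAC-SHA256 digest before writing rows to the analytics database. Because the digest is deterministic for a given key, analysts can still join or count distinct values on obfuscated columns without seeing the raw data. The `PII_COLUMNS` mapping and the function names are assumptions made for this sketch.

```python
import hashlib
import hmac
from typing import Any, Dict, Optional

# Hypothetical per-table list of PII columns (not a pg_chameleon setting).
PII_COLUMNS = {
    "users": {"email", "full_name", "phone"},
}


def obfuscate_value(value: Optional[Any], key: bytes) -> Optional[str]:
    """Replace a value with a keyed SHA-256 hex digest; NULLs stay NULL."""
    if value is None:
        return None
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def obfuscate_row(table: str, row: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return a copy of the row with the table's PII columns obfuscated."""
    pii = PII_COLUMNS.get(table, set())
    return {
        col: obfuscate_value(val, key) if col in pii else val
        for col, val in row.items()
    }


# Usage: non-PII columns pass through unchanged; email becomes a 64-char digest.
key = b"replace-with-a-secret-key"
row = {"id": 42, "email": "alice@example.com", "country": "GB"}
clean = obfuscate_row("users", row, key)
print(clean["id"], clean["country"])  # 42 GB
print(len(clean["email"]))  # 64
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recover common values such as email addresses by hashing guesses.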