3. 3 Property of Automic Software. All rights reserved
What is Big Data?
Every day, we create 2.5 quintillion (18 zeroes!) bytes of data,
so much that 90% of the data in the world today has been created in the
last two years alone. This data comes from everywhere: sensors used to
gather climate information, posts to social media sites, digital pictures and videos,
purchase transaction records, and cell phone GPS signals, to name a few.
Connecting all of these sources together is called the “Internet of Things”.
The data itself is called BIG DATA.
Source: Forbes.com
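As a quick sanity check on the scale quoted on this slide, the arithmetic can be sketched in Python. The use of decimal (SI) units is an assumption; the slide does not say which unit convention it has in mind.

```python
# Sanity check of the "2.5 quintillion bytes per day" figure.
# Assumption: SI (decimal) units, so 1 exabyte = 10**18 bytes.
QUINTILLION = 10**18
BYTES_PER_DAY = 25 * 10**17  # 2.5 quintillion, kept as an exact integer
EXABYTE = 10**18


def zeros_in_quintillion() -> int:
    """Count the zeroes in one quintillion written out in full."""
    return str(QUINTILLION).count("0")


def daily_volume_in_exabytes() -> float:
    """Express the daily data volume in exabytes."""
    return BYTES_PER_DAY / EXABYTE


if __name__ == "__main__":
    print(zeros_in_quintillion())
    print(daily_volume_in_exabytes())
```

In other words, the quoted figure corresponds to 2.5 exabytes per day under the stated unit assumption.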
4.
Think you can avoid Big Data?
The Big Data technology and services market represents
a fast-growing multibillion-dollar worldwide opportunity [...]
that will grow at a 26.4% compound annual growth rate to
$41.5 billion through 2018, or about six times the growth
rate of the overall information technology market […]
IDC - 2015
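To make the growth claim concrete, here is a small Python sketch of the compound-growth arithmetic. The five-year horizon is an assumption made for illustration only; the slide does not state the base year of the forecast, so the implied starting value below is not an IDC figure.

```python
# Compound annual growth rate (CAGR) arithmetic for the quoted forecast.
BIG_DATA_CAGR = 0.264     # 26.4% per year, from the slide
END_VALUE_BN = 41.5       # $41.5 billion, from the slide
ASSUMED_YEARS = 5         # ASSUMPTION: forecast horizon, not stated on the slide


def growth_factor(rate: float, years: int) -> float:
    """Total multiplier after compounding `rate` for `years` years."""
    return (1 + rate) ** years


def implied_start_value(end_value: float, rate: float, years: int) -> float:
    """Starting value that would reach `end_value` at the given CAGR."""
    return end_value / growth_factor(rate, years)


def implied_overall_it_rate(big_data_rate: float, multiple: float = 6.0) -> float:
    """Overall IT growth rate if Big Data grows `multiple` times as fast."""
    return big_data_rate / multiple


if __name__ == "__main__":
    print(f"growth factor: {growth_factor(BIG_DATA_CAGR, ASSUMED_YEARS):.2f}x")
    print(f"implied start: ${implied_start_value(END_VALUE_BN, BIG_DATA_CAGR, ASSUMED_YEARS):.1f}bn")
    print(f"implied overall IT growth: {implied_overall_it_rate(BIG_DATA_CAGR):.1%}")
```

Under that assumption the market roughly triples over the period, and "about six times" the overall IT growth rate implies an overall rate of roughly 4.4% per year.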
5.
Why are companies focused right now on Big Data?
• Make better, more quantitative decisions
• Reach new levels of profits, efficiently
• Predict with unprecedented accuracy to influence business outcomes
• Deliver highly personalized customer experiences at massive scale
• Make new discoveries using massive amounts of data
• Recognize new revenue streams from digital exhaust
6.
Where does Big Data fit into the Enterprise?
7.
Simplifying user experience and procedures
• Big data technologies must be integrated with more traditional data systems and sources
• Efficient Dev-Test-Prod change control needs to be implemented end-to-end
• Administration, development, operations, and analytics all need tools tailored to their roles to maximize their effectiveness
• Automation is a core requirement for making these complex systems accessible. It has to be easy to use and customizable
8.
A conflict in the skillset of analysts vs data engineers
People running the data platform
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-add-partition-searchevents-wf">
  <start to="hive-add-partition-searchevents" />
  <action name="hive-add-partition-searchevents" retry-max="1" retry-interval="1">
    <hive xmlns="uri:oozie:hive-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      ...
      ...
      <script>add_partition_hive_searchevents_script.q</script>
      <param>YEAR=${YEAR}</param>
      <param>MONTH=${MONTH}</param>
      <param>DAY=${DAY}</param>
      <param>HOUR=${HOUR}</param>
    </hive>
    <ok to="end" />
    <error to="fail" />
  </action>
  ...
<bundle-app name='BundleApp-LoadAndIndexTopCustomerQueries' xmlns='uri:oozie:bundle:0.2'>
  <controls>
    <kick-off-time>${jobStart}</kick-off-time>
  </controls>
  <coordinator name='CoordApp-LoadCustomerQueries' >
    <app-path>${coordAppPathLoadCustomerQueries}</app-path>
  </coordinator>
  <coordinator name='CoordApp-IndexTopQueriesES' >
    <app-path>${coordAppPathIndexTopQueriesES}</app-path>
  </coordinator>
</bundle-app>
....
<coordinator-app name="CoordApp-LoadCustomerQueries"
    frequency="${coord:days(1)}" start="${jobStart}" end="${jobEnd}"
    timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  ...
  <action>
    <workflow>
      <app-path>${workflowRoot}/hive-action-load-customerqueries.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
...
<coordinator-app name="CoordApp-IndexTopQueriesES"
    frequency="${coord:days(1)}" start="${jobStartIndex}" end="${jobEnd}"
    timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  ...
  <action>
    <workflow>
    ...
Automic helps to bridge the gap between the skillsets of the people
who need the tool and the skillsets required to run the tool
People wanting data
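To illustrate the gap this slide describes, the following Python sketch reads the bundle definition shown above and reduces it to the short summary an analyst might actually want: which coordinators the bundle starts. It is a minimal sketch using only the standard library; the function name list_coordinators is hypothetical, and this is not how Automic itself processes Oozie definitions.

```python
import xml.etree.ElementTree as ET

# The bundle definition from the slide, reproduced as a string.
BUNDLE_XML = """\
<bundle-app name='BundleApp-LoadAndIndexTopCustomerQueries' xmlns='uri:oozie:bundle:0.2'>
  <controls>
    <kick-off-time>${jobStart}</kick-off-time>
  </controls>
  <coordinator name='CoordApp-LoadCustomerQueries'>
    <app-path>${coordAppPathLoadCustomerQueries}</app-path>
  </coordinator>
  <coordinator name='CoordApp-IndexTopQueriesES'>
    <app-path>${coordAppPathIndexTopQueriesES}</app-path>
  </coordinator>
</bundle-app>
"""

# Namespace prefix mapping used for the ElementTree lookups.
NS = {"b": "uri:oozie:bundle:0.2"}


def list_coordinators(bundle_xml):
    """Return (coordinator name, app-path) pairs found in a bundle definition."""
    root = ET.fromstring(bundle_xml)
    pairs = []
    for coordinator in root.findall("b:coordinator", NS):
        app_path = coordinator.findtext("b:app-path", default="", namespaces=NS)
        pairs.append((coordinator.get("name"), app_path.strip()))
    return pairs


if __name__ == "__main__":
    for name, app_path in list_coordinators(BUNDLE_XML):
        print(f"{name} -> {app_path}")
```

The point of the sketch is the contrast: the people wanting data care about the two coordinator names, while the people running the platform have to maintain everything around them.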
9.
Hadoop Open Source
“The Apache™ Hadoop® project develops open-source software for
reliable, scalable, distributed computing.”
“Open source as a development model promotes a universal access via a
free license to a product's design or blueprint, and universal redistribution
of that design or blueprint, including subsequent improvements to it by
anyone”
10.
Many people work on Hadoop
11.
3 Releases of the Hadoop Platform
12.
New capabilities keep on coming
13.
APIs do change constantly
15.
Proven value for Data Automation
• Improve Decisions
• Business & Operational Intelligence
• Data Warehousing
• Big Data
• Call centre performance
• Hadoop Big Data automation
• Data Ingestion across IaaS
• Fast Cognos Analytics delivery
• POS data mining, ETL & MFT
16.
Proven Value for Data Automation
Self-service platform for data scientists
“We use Automic in our data center to define dependencies
between various jobs between our data center and the
cloud, and run them as ‘process flows’.
Automic ensures that the right data is delivered on time to
Data Scientists. This requires approximately 6,000 jobs per
day.”
Ashi Sheth
Manager of Enterprise Services, Netflix
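The idea of defining dependencies between jobs and running them as a flow can be sketched with Python's standard graphlib module. The job names below are hypothetical, and this is only an illustration of dependency-ordered execution, not the way Automic or Netflix implement process flows.

```python
from graphlib import TopologicalSorter

# Hypothetical process flow: each job maps to the set of jobs it depends on.
PROCESS_FLOW = {
    "extract_datacenter_logs": set(),
    "extract_cloud_events": set(),
    "upload_logs_to_cloud": {"extract_datacenter_logs"},
    "transform_in_cloud": {"upload_logs_to_cloud", "extract_cloud_events"},
    "deliver_to_data_scientists": {"transform_in_cloud"},
}


def run_flow(flow, run_job):
    """Run every job after all of its dependencies; return the execution order."""
    executed = []
    # static_order() yields jobs so that dependencies always come first,
    # and raises graphlib.CycleError if the flow contains a cycle.
    for job in TopologicalSorter(flow).static_order():
        run_job(job)
        executed.append(job)
    return executed


if __name__ == "__main__":
    run_flow(PROCESS_FLOW, lambda job: print(f"running {job}"))
```

A real scheduler would also handle retries, calendars, and parallel execution of independent jobs; the sketch only shows the ordering guarantee that a dependency definition provides.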
17.
Business Benefit to Netflix
To “Give Viewers What They Want”
• >50m subscribers
• >40 countries
• Collect hundreds of terabytes of data daily (petabyte-scale)
Platform Engineers
… build templates and workflows using Automic ONE Automation
… enable data scientists to perform all kinds of ad hoc analysis without having to deal with the complexity of the underlying data infrastructure
Data Scientists
… perform data-driven experiments and tests on a daily basis using … and many other tools
Recommendation Engine
… to improve the quality of recommendations
… resulting in happy customers!
18.
eBay relies on Automic
If Automic goes down, eBay loses 70% of its web traffic to Amazon
– Automic automates Hadoop for eBay, which provides all of its business intelligence for optimized SEO
– Automic moves data, schedules the MapReduce jobs, schedules the analytics, and then pushes the output to Google
19.
Automating eBay Data Warehouse Platforms
eBay DW environment
Teradata:
– Mozart: 2.6PB / 6.6PB (used storage / total storage)
– Martini: 1.4PB used, 8.5PB total
– EDW concurrent queries: 500+
Singularity (eBay-specific Teradata):
– Vivaldi: 9.5PB / 16.9PB (used storage / total storage)
– Davinci: 2.5PB used, 3.4PB total
– SG concurrent queries: 100+
Hadoop:
– Hadoop Total: 71.5PB / 91.9PB (used storage / total storage)
– Hadoop Ares: 29.5PB / 41.4PB, Hadoop Apollo: 32.2PB / 37.8PB, Hadoop Artemis: 9.8PB / 11.9PB
– Hadoop concurrent jobs running: 1000+
Source: http://www.slideshare.net/madananil/hadoop-at-ebay
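The storage figures for the Hadoop clusters can be turned into utilization ratios with a few lines of Python. The numbers are copied from the slide; the helper names are illustrative. One observation from running the arithmetic: the per-cluster used figures add up to the quoted 71.5PB, while the per-cluster totals add up to 91.1PB rather than the quoted 91.9PB, and the slide does not explain the difference.

```python
# (used PB, total PB) per Hadoop cluster, copied from the slide.
HADOOP_CLUSTERS_PB = {
    "Ares": (29.5, 41.4),
    "Apollo": (32.2, 37.8),
    "Artemis": (9.8, 11.9),
}


def utilization(used_pb: float, total_pb: float) -> float:
    """Fraction of total storage currently in use."""
    return used_pb / total_pb


def summed(clusters):
    """Sum used and total storage over all clusters, rounded to 0.1 PB."""
    used = sum(used for used, _ in clusters.values())
    total = sum(total for _, total in clusters.values())
    return round(used, 1), round(total, 1)


if __name__ == "__main__":
    for name, (used, total) in HADOOP_CLUSTERS_PB.items():
        print(f"Hadoop {name}: {utilization(used, total):.0%} of {total}PB used")
    print("summed (used, total):", summed(HADOOP_CLUSTERS_PB))
```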
20.
Automic’s Value to Big Data
• We help our customers get out of the scripting business by using Hadoop templates to abstract the APIs away from the user
• Current functionality can be extended by Automic and users alike and in turn distributed via Automic’s Marketplace, so there is no need to wait for vendors to catch up and release a new Agent for new APIs (think Falcon, Ranger, Knox, Ambari, Cloudbreak, etc.)
• Automic and its objects are distribution-agnostic: templates work with Hortonworks, Cloudera, and MapR, and they can even help you transition between Hadoop distributions
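What "abstracting the APIs from the user by using templates" might look like can be sketched in Python. The function hive_action below is hypothetical and is not Automic's template mechanism; it only shows the general idea that a user supplies a name, a script, and a few parameters, while the template produces the Oozie Hive action XML of the kind shown on slide 8. Settings elided on that slide are omitted here as well.

```python
import xml.etree.ElementTree as ET

HIVE_ACTION_NS = "uri:oozie:hive-action:0.4"


def hive_action(name: str, script: str, params: dict) -> str:
    """Render an Oozie <action> element that runs a Hive script (hypothetical helper)."""
    action = ET.Element("action", {"name": name})
    # Declaring the default namespace through a plain "xmlns" attribute keeps
    # the serialized XML free of generated prefixes.
    hive = ET.SubElement(action, "hive", {"xmlns": HIVE_ACTION_NS})
    ET.SubElement(hive, "job-tracker").text = "${jobTracker}"
    ET.SubElement(hive, "name-node").text = "${nameNode}"
    ET.SubElement(hive, "script").text = script
    for key, value in params.items():
        ET.SubElement(hive, "param").text = f"{key}={value}"
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    return ET.tostring(action, encoding="unicode")


if __name__ == "__main__":
    print(hive_action(
        "hive-add-partition-searchevents",
        "add_partition_hive_searchevents_script.q",
        {"YEAR": "${YEAR}", "MONTH": "${MONTH}", "DAY": "${DAY}", "HOUR": "${HOUR}"},
    ))
```

The user of such a template never touches the XML schema, which is the sense in which a template can shield analysts from API and format changes.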
21.
Contact
Dave Kellermanns
Chief Automation Architect
dave.kellermanns@automic.com
+1 (720) 440-2838
Derive meaning = process and access
Collection means we must bridge the movement of data between the old and new worlds.
With Big Data, we expand our audience from BI analysts to data scientists; Big Data is the foundation for business intelligence and predictive analytics.
In all Big Data use cases you have both:
BI
• Choose a business outcome to improve
• Decide what data will be relevant
• Create a data model
• Design reports, dashboards, and/or visualizations
Data Science
• Choose a business outcome to improve
• Assemble all possible data
• Evaluate the model
• Operationalize the model
“Data scientists use a robot army and machine learning to get to the answer: an algorithm.”
The proof / validation:
And here are two companies that use us for Big Data
Hadoop jobs are growing exponentially at eBay, from 0 jobs in 2008 to 1 million per month today.
eBay has even considered using the Hadoop file system (HDFS) as the DW in the future, moving away from its traditional Teradata solution. Netflix is considering this as well.
Teradata recognized the Big Data trend and acquired Aster: an 11% investment in 2010, followed by a full acquisition in 2011 for $263 million.
Automic integrates with eBay Marketplaces
Over 55,000 chains of logic and 150,000 data elements
Millions of queries run on eBay DW platforms every day
> 40 terabytes backed up each hour
100 TB of new data every day and 100 PB of physical I/O