How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improve Efficiencies

Texas Instruments – The Road to Understanding Inefficiencies with InfluxDB
Presentation by: Mike Hinkle

Table of Contents
• About Texas Instruments
• What we do
• About me
• Common terms (Tools, States, Modules, Metrics, etc.)
• Data discussion
• Examples of data-based decisions
• Current challenges
• InfluxDB and our current use case
• Why InfluxDB?
• Current setup and metrics captured
• Examples of dashboards currently in use
• Example of a Grafana alert using InfluxDB’s DERIVATIVE() on a signal/trace
• Issues encountered using InfluxDB & Grafana
• What’s next

Texas Instruments – Brief Overview
• Texas Instruments was founded in 1951
• Previously ‘Geophysical Service Incorporated’
• Headquarters in Dallas, Texas
• Currently employs ~30k people worldwide
• Semiconductor manufacturing facilities located in:
  • United States
  • Germany
  • Japan
  • China
What we do:
We make calculators (just kidding. We do, but they are a small part of our portfolio)
We design, manufacture, test, package and sell semiconductor devices

Common terms used in presentation
• Wafer – typically made of silicon (Si)
  • Wafers are traditionally started in counts of 25, which are referred to as a LOT
  • Lots can have fewer than 25 wafers
  • An example of a wafer map is shown to the right
  • Each small square is a single die
  • Each die is an IC composed of transistors, resistors, etc.
  • After manufacturing and testing, the wafers are typically diced (cut) and packaged for final testing

Common terms used in presentation – Cont.
• Tool – the equipment used to manufacture, process and test semiconductors (wafers), e.g.:
  • Furnaces (seen to the right)
  • Ion Implanters
  • Epi Reactors
  • Plasma Etchers
  • Metrology (Rs, thickness, etc.)
  • Testers
• State – a numeric or character code that defines the current state of a tool
  • PROD, 06T, 01, etc.
  • Currently 515 state definitions in DFAB
  • States are mapped to multiple categories
Pictured: LYDOP furnace stack (phosphorus doping)

Common terms used in presentation – Cont.
• Modules – the division of common tools or processes into teams and groups for the purpose of manufacturing semiconductors
  • Diffusion
  • Surface Prep.
  • Implant
  • Epi
  • Plasma
  • Photo
  • MultiProbe & Testprobe
  • Modules are typically divided into toolsets and processes, which are owned by engineers
  • Modules frequently report out on metrics
• Metrics
  • Availability (Ao)
  • Overall Equipment Utilization (OEU)
  • Fail Events (raw and normalized)
  • Mean Time to Repair (MTTR)
  • Mean Time Between Failures (MTBF)
  • First Pass Success
  • Cycle Time
  • … the list goes on …
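To make the MTTR and MTBF entries above concrete, here is a minimal Python sketch that computes both from a list of failure intervals. The FailEvent record and the mttr_mtbf_hours() helper are hypothetical, and the sketch simplifies by counting all non-repair time in the window as uptime; real definitions (for example, whether scheduled PMs count as downtime) vary by fab.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class FailEvent:
    """One unscheduled-down interval for a tool (hypothetical record shape)."""
    down_at: datetime  # tool entered a failure state
    up_at: datetime    # repair complete, tool back in service


def mttr_mtbf_hours(events: List[FailEvent], window_start: datetime,
                    window_end: datetime) -> Tuple[float, float]:
    """Return (MTTR, MTBF) in hours for one reporting window.

    MTTR = total repair time / number of failures
    MTBF = total uptime / number of failures
    Simplification: every minute not spent in repair counts as uptime, and the
    events are assumed to be non-overlapping and inside the window.
    """
    if not events:
        raise ValueError("no failures in window: MTTR and MTBF are undefined")
    repair = sum((e.up_at - e.down_at for e in events), timedelta())
    uptime = (window_end - window_start) - repair
    one_hour = timedelta(hours=1)
    n = len(events)
    # timedelta / int -> timedelta; timedelta / timedelta -> float hours
    return (repair / n) / one_hour, (uptime / n) / one_hour


if __name__ == "__main__":
    day = datetime(2019, 5, 1)
    events = [
        FailEvent(day.replace(hour=2), day.replace(hour=3)),    # 1 h repair
        FailEvent(day.replace(hour=10), day.replace(hour=13)),  # 3 h repair
    ]
    print(mttr_mtbf_hours(events, day, day + timedelta(days=1)))  # (2.0, 10.0)
```

With two failures (1 h and 3 h of repair) in a 24-hour window, MTTR is 4 h / 2 = 2 h and MTBF is 20 h / 2 = 10 h.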

Why is access to fast, accurate data important?
Common data-based decisions:
• Process Adjustments (time, temperature, pressure, etc.)
• Process Stability (Cpk, UCL, LCL)
• Tool Stability (Ao, OEU, MTTR, MTBF)
• Tracking Preventive Maintenance (use-based or time-based)
• Troubleshooting
• Anomaly Detection and Interdiction
• Yield
• Planning
Inaccurate or misinterpreted data could lead to scrapped wafers, yield loss, missed customer deadlines and, ultimately, lost business.
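Process stability is usually judged with control limits and a capability index. The sketch below assumes the common textbook definitions: control limits at the mean plus or minus three standard deviations, and Cpk as the distance from the mean to the nearer specification limit (LSL/USL) divided by three sigma. For simplicity it uses the overall sample standard deviation; strictly, that makes the result what many references call Ppk, since Cpk proper uses a within-subgroup sigma. The function names are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def control_limits(samples: Sequence[float], k: float = 3.0) -> Tuple[float, float, float]:
    """Return (LCL, center line, UCL) as the mean -/+ k sample standard deviations."""
    mu, sigma = mean(samples), stdev(samples)
    return mu - k * sigma, mu, mu + k * sigma


def cpk(samples: Sequence[float], lsl: float, usl: float) -> float:
    """Distance from the mean to the nearer spec limit, in units of 3 sigma.

    Uses the overall sample sigma (Ppk-style); a production SPC tool would
    typically estimate sigma from subgroup or moving ranges instead.
    """
    mu, sigma = mean(samples), stdev(samples)
    return min(usl - mu, mu - lsl) / (3 * sigma)


thicknesses = [9.0, 10.0, 11.0, 10.0, 10.0]  # made-up measurements
print(control_limits(thicknesses))
print(cpk(thicknesses, lsl=7.0, usl=13.0))
```

A centered process with spec limits three units from the mean and sigma of about 0.707 gives Cpk = 3 / (3 × 0.707) ≈ 1.41; moving either limit closer to the mean lowers it.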

Current challenges
• We need immediate feedback when tool or process issues arise
  • Failure to respond could mean scrapped wafers or missed customer deadlines and, ultimately, lost revenue
  • Email and text notifications are a must; automated interdiction is the goal
• We need a more efficient means of extracting the relevant data for reporting
  • Reduce the time spent preparing for meetings and presentations
  • Most high-level metrics are temporal
• We need a system that can be used efficiently by non-CS/CE employees
  • Most employees in our factories come from electrical, mechanical or chemical engineering, or physics backgrounds (not full-stack developers)
  • Our primary job does not revolve around creating applications
• We need the ability to prototype and experiment
  • Write access to the RDBMS is not easy to come by
  • Flat-file systems are not efficient

Why InfluxDB
• Familiarity from a personal project
• Installation and setup took minutes
• A multitude of available clients
• NoSQL design means minimal up-front planning and easier prototyping
• SQL-like: no need to learn an entirely new query language (using InfluxQL)
• Fast query response
• Easier handling of edge cases (metrics split and aggregated across days and years)
• Built-in math and forecasting functions (DERIVATIVE(), INTEGRAL(), HOLT_WINTERS(), etc.)
• Efficient disk usage
  • Currently >1.5M points/records written per day
• Easy to plug into Grafana (experimenting with Chronograf now)
• Open Source edition is free (MIT License)

Current installation setup
Setup:
• Single InfluxDB (OSS) instance installed on a Linux VM
• TSM, WAL and Raft data all stored on an expandable development mount
• Loader scripts, executed via cron jobs, query the RDBMS and write to InfluxDB
• Where possible, data is filtered, sanitized and calculated during the initial query
• Grafana server running on a separate Linux VM
• Apache web server acts as a reverse proxy to the Grafana server on port 3000
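Because InfluxDB stores timestamps in UTC while the RDBMS records plant-local time, a loader has to convert times before writing. Below is a minimal sketch of that step, assuming naive 'America/Chicago' timestamps from the RDBMS and building InfluxDB line protocol by hand with only the standard library. The measurement, tag and field names are hypothetical, this is not the author's loader, and the resulting strings would still need to be sent to InfluxDB (for example via the 1.x HTTP /write endpoint or whichever client the loader already uses).

```python
from datetime import datetime, timezone
from typing import Dict
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or tzdata

PLANT_TZ = ZoneInfo("America/Chicago")  # zone the RDBMS timestamps are recorded in
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _escape_key(s: str) -> str:
    """Escape commas, equals signs and spaces in tag/field keys and tag values."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], local_ts: datetime) -> str:
    """Build one line-protocol record from a naive plant-local timestamp.

    The timestamp becomes UTC nanoseconds since the Unix epoch. Ambiguous
    wall-clock times during the fall DST change resolve to the first
    occurrence (fold=0).
    """
    aware = local_ts.replace(tzinfo=PLANT_TZ)
    delta = aware - EPOCH  # aware datetimes subtract correctly across zones
    ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Sorting tags by key is what InfluxDB recommends for write performance.
    tag_str = "".join(f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{_escape_key(k)}={float(v)!r}" for k, v in fields.items())
    return f"{name}{tag_str} {field_str} {ns}"


print(to_line_protocol("oeu", {"area": "DFAB Probe"}, {"value": 0.82},
                       datetime(2019, 5, 1, 7, 0)))
```

07:00 CDT on 1 May 2019 is 12:00 UTC, so the record is stamped 1556712000000000000; a January timestamp is shifted by six hours instead of five, which is exactly the DST handling a fixed offset would get wrong.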
Metrics being collected:
• OEU (Overall Equipment Utilization: [Time Testing Good Die / Total Time])
• Ao (Tool Availability: [Uptime / Total Time])
• Occurrences of states (states define the condition of our tools: PMs, PROD, etc.)
• Cycle Time (how long it takes for a wafer to be tested [device tag])
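The two ratio metrics above can be computed directly from state-interval data. Here is a minimal sketch, assuming each record carries a state code, its duration in minutes and the minutes of that interval spent testing good die. The DOWN_STATES grouping and the ao_and_oeu() helper are hypothetical and stand in for the real DFAB state-to-category mapping.

```python
from typing import Iterable, Set, Tuple

# Hypothetical grouping of state codes; the real DFAB mapping covers ~515 codes.
DOWN_STATES: Set[str] = {"REPAIR", "PM"}

# (state code, minutes spent in that state, minutes of it spent testing good die)
Interval = Tuple[str, float, float]


def ao_and_oeu(intervals: Iterable[Interval]) -> Tuple[float, float]:
    """Ao = uptime / total time; OEU = time testing good die / total time."""
    total = up = good_die = 0.0
    for state, minutes, good_die_minutes in intervals:
        total += minutes
        if state not in DOWN_STATES:
            up += minutes
        good_die += good_die_minutes
    if total == 0:
        raise ValueError("empty reporting window")
    return up / total, good_die / total


ao, oeu = ao_and_oeu([("PROD", 45, 36), ("IDLE", 15, 0), ("REPAIR", 20, 0)])
print(ao, oeu)
```

Over this made-up 80-minute window the tool is up for 60 minutes (Ao = 75%) but tests good die for only 36 (OEU = 45%), well short of the 82% OEU goal shown on the dashboard.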

Block diagram of current InfluxDB loading scheme
• RDBMS sources supply the raw data
• Linux virtual machine (web and Grafana):
  • Runs the Apache web server
  • Runs the Grafana server
  • Python cron jobs query the RDBMS, parse the data and write to InfluxDB
• Linux virtual machine (database):
  • Runs the InfluxDB instance
  • TSM, WAL and Raft metadata stored on an expandable development mount
• Data flow: RDBMS → Python cron jobs → InfluxDB

Grafana Dashboard – Probe OEU (1 sample/minute)
• Yellow trace: OEU goal (82%)
• Green trace: OEU (1m)
• Blue trace: OEU (1h)

Decomposing DFAB Probe’s OEU Time-Series
• The measurement was queried using the InfluxDB Python client
• Mean(OEU) was grouped by 1 hour
• No differencing or transformations were applied to the time series
• The period length (T) for seasonality was defined as 14 days (336 hourly samples), based on our shift rotations
• Decomposition was performed using the statsmodels Python library
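The decomposition itself was done with statsmodels. As a dependency-free illustration of the same idea, the sketch below implements a classical additive decomposition: a centered moving-average trend (a 2 × T average when T is even), a seasonal index averaged over each position in the cycle, and whatever is left over as the residual. This is broadly the approach behind statsmodels' seasonal_decompose with its default additive model, but it is a simplified sketch, not that library's code or the author's script.

```python
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def centered_moving_average(x: Sequence[float], period: int) -> Series:
    """Trend estimate; entries within period // 2 of either end are None."""
    half = period // 2
    if period % 2:  # odd period: simple moving average of length `period`
        weights = [1.0 / period] * period
    else:           # even period: 2 x period average, half weight on the ends
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    trend: Series = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend


def decompose_additive(x: Sequence[float], period: int) -> Tuple[Series, List[float], Series]:
    """Split x into (trend, seasonal, residual) with x = trend + seasonal + residual."""
    if period < 2 or len(x) < 2 * period:
        raise ValueError("need period >= 2 and at least two full cycles of data")
    trend = centered_moving_average(x, period)
    # Average the detrended values that share a position within the cycle.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    index = [sum(b) / len(b) for b in buckets]
    offset = sum(index) / period
    index = [i - offset for i in index]  # center so the seasonal effects sum to zero
    seasonal = [index[t % period] for t in range(len(x))]
    resid: Series = [None if tr is None else value - tr - s
                     for value, tr, s in zip(x, trend, seasonal)]
    return trend, seasonal, resid
```

For hourly OEU with the 14-day period above, the call would be decompose_additive(hourly_oeu, 14 * 24), where hourly_oeu is a hypothetical list of hourly means covering at least 28 days. The first and last 168 hours of the trend are undefined, as with any centered moving average.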

OEU: Trend Components & Forecast w/ fbProphet
• Plots generated using fbProphet (Python 3.6)
• The component plot supports the theory that our performance is weaker in the back half of the week
• The next step is to determine performance and operational deltas across shifts to improve metrics and output

Example of an actual Grafana alert sent via email
• We are not too concerned if our OEU drops below goal for a short period of time
• A rapid drop (high rate of change) is more concerning, since it could indicate a larger problem
• In the case seen to the left, an application was failing to write to a database, causing the tools to lock up
• IT needed to interdict on their end, but we needed to notify them
• InfluxDB’s DERIVATIVE() function makes it easy to trigger alerts for this use case
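The rate-of-change idea behind this alert can be sketched outside of Grafana. The derivative() function below computes, for each consecutive pair of points, the change in value divided by the elapsed time in a chosen unit, mirroring what InfluxQL's DERIVATIVE(field, unit) reports. The falling_too_fast() helper, its threshold and the measurement and field names in the comment are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Point = Tuple[datetime, float]

# Roughly equivalent InfluxQL (hypothetical measurement/field names):
#   SELECT DERIVATIVE(MEAN("oeu"), 1h) FROM "probe_oeu"
#   WHERE time > now() - 6h GROUP BY time(1m)


def derivative(points: Sequence[Point], unit: timedelta = timedelta(seconds=1)) -> List[Point]:
    """Rate of change between consecutive points, scaled to `unit`.

    Each result is stamped with the later point's time. Points must be sorted
    with strictly increasing timestamps.
    """
    return [
        (t1, (v1 - v0) / ((t1 - t0) / unit))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def falling_too_fast(points: Sequence[Point], limit_per_hour: float) -> bool:
    """True if any hourly-normalized rate is below `limit_per_hour` (a negative number)."""
    return any(rate < limit_per_hour for _, rate in derivative(points, timedelta(hours=1)))
```

A slow drift below goal (for example -0.06 per hour) stays quiet with a -0.5 per hour limit, while a sudden 24-point drop within one minute (-14.4 per hour) fires, which matches the distinction drawn in the bullets above.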

Grafana Dashboard – State Counts (1 sample/minute)
• Stacked bar chart, colored by tool state
• Pie charts show the state distribution
  • State occurrences (left)
  • E10 occurrences (right)
• Single-stat panels show how many testers are in production and how many are down

Grafana Dashboard – Ao Diffusion (1 sample/minute)
• Green trace: overall tool availability % (1m)
• The bottom plot shows my previous toolsets (POCL and TEOS are stored as tags, so they are indexed)

Issues encountered
• Could not write to InfluxDB via the Python client library
  • Solution: http_proxy and https_proxy were set in my run-commands file (.cshrc), which is sourced at shell startup. I needed to unset these environment variables for the script to work as expected
• Problems using the DERIVATIVE() function
  • Solution: for my use case, I needed to specify a WHERE clause with a time constraint
• Initial DB writes were not showing the correct times when viewed in Grafana
  • Solution: InfluxDB expects UTC timestamps, but our source database timestamps are stored and displayed in ‘America/Chicago’ time, so they must be converted to UTC before writing
• No primitive for aggregating by month in InfluxQL: GROUP BY time() intervals such as 1d, 14d or 30d work, but 1m means one minute, not one month
  • Solution: Flux can apparently handle this, or I can add a Month tag to the data for grouping
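For the month-tag workaround, a loader could attach a month tag at write time, after which an InfluxQL query can GROUP BY that tag. Here is a minimal sketch, assuming the tag value should follow the plant's 'America/Chicago' calendar so that month boundaries match local reporting. month_tag() and monthly_means() are hypothetical helpers; the second one only reproduces client-side what a GROUP BY on the tag would do.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple
from zoneinfo import ZoneInfo  # Python 3.9+

PLANT_TZ = ZoneInfo("America/Chicago")


def month_tag(ts: datetime) -> str:
    """'YYYY-MM' for a timestamp, evaluated in plant-local time.

    Naive timestamps are treated as UTC, matching how InfluxDB stores time.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(PLANT_TZ).strftime("%Y-%m")


def monthly_means(points: Iterable[Tuple[datetime, float]]) -> Dict[str, float]:
    """Group (timestamp, value) points by month tag and average each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for ts, value in points:
        groups[month_tag(ts)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(groups.items())}
```

A point at 03:00 UTC on 1 June is still 31 May in Chicago, so it lands in the "2019-05" group; grouping by UTC month would put it in June instead. A month tag also adds only one new series per month, so it keeps tag cardinality low.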

What’s next
• Prototype a Dash app for a single toolset (proof of concept)
  • One-stop shop for toolset health
  • High-level tool stability
  • High-level process stability
• Model high-level metrics across shifts for DFAB Probe
• Forecasting (planning, capacity, financial, staffing levels, etc.)
  • ARIMA (Autoregressive Integrated Moving Average)
  • SARIMA (Seasonal ARIMA)
  • Holt-Winters
  • fbProphet (experiment with add_regressor())
• Push more signals to time series (TTR, TBF, etc.)
• Give a presentation at work on time-series data and my use of InfluxDB/Grafana
• Open the floodgates for new ideas and applications

Thank you for tuning in.
Questions and comments are welcome.
