A number of media outlets have recently reported on our data retention policy change. We’ve noticed a few inaccuracies and have received some questions. So we wanted to clarify a few things for the record.
- Both Microsoft and Google apply a multistep process to de-identify search log file data. However, neither company completes its de-identification process for search log files until the 18-month mark.
- In Yahoo!’s upcoming policy change for search log files, we will apply the same de-identification method we use today. We are simply going to apply it at 18 months once the policy goes into effect.
Yahoo! takes a 4-step approach to de-identifying search log file data.
- Step 1: Delete IP addresses for most search log files and apply a one-way secret hash to the limited set of IP addresses that are needed to help our systems detect and defend against fraudulent activity.
- Step 2: Apply a one-way secret hash (a form of encryption) to unique identifiers from browser cookies.
- Step 3: Same as Step 2, but we additionally delete half of each identifier associated with a Yahoo! ID. We take this extra step for registration IDs because, unlike browser cookies, which only identify a unique browser, we do associate these with personal information like names and email addresses.
- Step 4: Look for patterns common to personally identifying information that often appears in search log files, such as credit card number formats, Social Security number formats, telephone numbers, street addresses and non-famous names, and then replace those values so they no longer identify anyone. We would know that a telephone number was searched, for example, but don’t keep the number that was entered.
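For readers who want a more concrete picture, the four steps above can be sketched roughly as follows. This is an illustration only, not our production code. The function names, the placeholder key, the choice of HMAC-SHA256 as the one-way secret hash, the decision to keep the first half of the hashed registration ID, and the simplified US-style patterns are all assumptions made for this example. Street addresses and names are left out because they need more than a simple pattern match.

```python
import hashlib
import hmac
import re
from typing import Optional

# Hypothetical key for illustration; a real system would load it from secure storage.
SECRET_KEY = b"example-secret-key"


def secret_hash(value: str) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of value as a hex string."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify_ip(ip: str, needed_for_fraud_defense: bool) -> Optional[str]:
    """Step 1: delete the IP address unless fraud defense needs it; hash it if kept."""
    return secret_hash(ip) if needed_for_fraud_defense else None


def deidentify_cookie_id(cookie_id: str) -> str:
    """Step 2: hash the unique identifier from the browser cookie."""
    return secret_hash(cookie_id)


def deidentify_registration_id(registration_id: str) -> str:
    """Step 3: hash as in Step 2, then delete half of the result.

    Assumption: the first half of the hex digest is kept. The text above does
    not say which half is deleted or whether deletion happens before hashing.
    """
    digest = secret_hash(registration_id)
    return digest[: len(digest) // 2]


# Step 4 (partial): simplified patterns chosen for this sketch.
# More specific patterns run first so a value is labeled by its closest match.
PII_PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[CREDIT_CARD]", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("[PHONE]", re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-. ])\d{3}[-. ]\d{4}(?!\d)")),
]


def redact_query(query: str) -> str:
    """Replace each matched value with a placeholder that records only its type."""
    for placeholder, pattern in PII_PATTERNS:
        query = pattern.sub(placeholder, query)
    return query


print(redact_query("pizza near 408-555-1234"))  # pizza near [PHONE]
```

In this sketch the redacted query still shows that a telephone number was searched, but the number itself is gone, which mirrors the example in Step 4.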
The process above is what we use today, and it will remain the same for search log files going forward. The only change is that we will now be using a timeframe consistent with what other industry players have been using since at least 2008.
Anne Toth
Chief Trust Officer
