PHP Trying to Curl Download csv from Website that needs a Session

I am trying to download public German short positions as a CSV, but I am coming up short.

Website address

My pseudocode is as follows:

  1. Load first page and get the session.sessionid.
  2. Use the id to follow the “More search options” link
  3. Post a request into the “More search options”-page.
  4. Receive the csv

Any tips here? I guess I have problems with the cookie. Here is my code:

<?php
$ch = curl_init('https://www.bundesanzeiger.de/ebanzwww/wexsservlet?global_data.language=en&page.navid=to_nlp_start&session.sessionid=&global_data.designmode=eb');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// get headers too with this line
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0");

//curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__). '/cookie.txt');

$result = curl_exec($ch);

$trueposition = strpos($result, 'session.sessionid=');

echo '<br>--------------------------------------<br>';
echo substr($result,$trueposition+18,32);
$id = substr($result,$trueposition+18,32);

echo '<br>';
echo '-----------------------------------------<br>';



$url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?page.navid=nlpstarttonlpstart_new&nlp_search_param.extended_search=true&session.sessionid=' . $id;
curl_setopt($ch, CURLOPT_URL, $url);

$result = curl_exec($ch);



$data = array(

   "session.sessionid:" => $id,
   "nlp_search_param.publisher:" => "",
   "nlp_search_param.emittent:" => "",
   "nlp_search_param.isin:" => "",
   "nlp_search_param.search_history:" => "true",
   "nlp_search_param.date_start:0:" => "1",
   "nlp_search_param.date_start:1:" => "1",
   "nlp_search_param.date_start:2:" => "2001",
   "nlp_search_param.date_end:0:" => "1",
   "nlp_search_param.date_end:1:" => "1",
   "nlp_search_param.date_end:2:" => "2019",
   "nlp_search_param.position_start:" => "",
   "nlp_search_param.position_end:" => "", 
   "(page.navid=nlpresultlisttonlpresultlist_updatefilter):" => "Show net short positions"
);


 $url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?session.sessionid=' . $id . '&page.navid=nlpresultlisttonlpresultlistdownloadcsv';

curl_setopt($ch, CURLOPT_URL, $url);

$result = curl_exec($ch);


$url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet';
 $postvars = http_build_query($data) . "\n";

 curl_setopt($ch, CURLOPT_URL, $url);
 curl_setopt($ch, CURLOPT_POSTFIELDS, $postvars);
 curl_setopt($ch, CURLOPT_POST, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

 $server_output = curl_exec ($ch);

 echo 'Result:<br>';

 var_dump($server_output);

 curl_close ($ch);

?>

Update: I tried using the example Nigel linked to, but I am still not able to do it. The last link this next code creates ($url) will sometimes work when pasted into my browser (Chrome), and the CSV downloads. However, it never works with curl.


$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?global_data.language=en&page.navid=to_nlp_start&session.sessionid=&global_data.designmode=eb');
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);

curl_setopt($ch, CURLOPT_COOKIEFILE,  dirname(__FILE__) . '/cookie.txt');  //could be empty, but cause problems on some hosts
curl_setopt($ch, CURLOPT_COOKIEJAR,  dirname(__FILE__) . '/cookie.txt');  //could be empty, but cause problems on some hosts

$answer = curl_exec($ch);

if (curl_error($ch)) {
    echo curl_error($ch);
}

$result = $answer;

//var_dump($result);

$trueposition = strpos($result, 'session.sessionid=');

echo '<br>-------------------1-------------------<br>';
echo substr($result,$trueposition+18,32);
$id = substr($result,$trueposition+18,32);

echo '<br>';
echo '-----------------------1------------------<br>';


$url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?page.navid=nlpstarttonlpstart_new&nlp_search_param.extended_search=true&session.sessionid=' . $id;
curl_setopt($ch, CURLOPT_URL, $url);

$answer = curl_exec($ch);
if (curl_error($ch)) {
    echo curl_error($ch);
}

$result = $answer;

//var_dump($result);

$trueposition = strpos($result, 'session.sessionid=');

echo '<br>----------------2----------------------<br>';
echo substr($result,$trueposition+18,32);
$id = substr($result,$trueposition+18,32);

echo '<br>';
echo '-------------------2----------------------<br>';


$data = array(

   "session.sessionid:" => $id,
   "nlp_search_param.publisher:" => "",
   "nlp_search_param.emittent:" => "",
   "nlp_search_param.isin:" => "",
   "nlp_search_param.search_history:" => "false",
   "nlp_search_param.date_start:0:" => "1",
   "nlp_search_param.date_start:1:" => "1",
   "nlp_search_param.date_start:2:" => "2001",
   "nlp_search_param.date_end:0:" => "1",
   "nlp_search_param.date_end:1:" => "1",
   "nlp_search_param.date_end:2:" => "2019",
   "nlp_search_param.position_start:" => "",
   "nlp_search_param.position_end:" => "", 
   "(page.navid=nlpresultlisttonlpresultlist_updatefilter):" => "Show net short positions"
);

var_dump($data);

$url = 'https://bundesanzeiger.de/ebanzwww/wexsservlet/';
$postvars = http_build_query($data) . "\n";

//another request preserving the session

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postvars);

$answer = curl_exec($ch);
if (curl_error($ch)) {
    echo curl_error($ch);
}

$result = $answer;

//var_dump($result);

$trueposition = strpos($result, 'session.sessionid=');

echo '<br>----------------3----------------------<br>';
echo substr($result,$trueposition+18,32);
$id = substr($result,$trueposition+18,32);

echo '<br>';
echo '-------------------3----------------------<br>';

$url = "https://bundesanzeiger.de/ebanzwww/wexsservlet?session.sessionid=" . $id . '&amp;page.navid=nlpresultlisttonlpresultlistdownloadcsv';

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, "");

$answer = curl_exec($ch);
if (curl_error($ch)) {
    echo curl_error($ch);
}
echo '<br>';
echo '-------------------4----------------------<br>';
echo $url . '<br>';
var_dump($answer);

file_put_contents('test.csv', $answer);

This Post Has One Comment

  1. No Fault

    After some trial and error, I found some issues in your original code that, when fixed, will return the .csv you’re looking for.

    1. The sessionid changes

    After your initial request with no sessionid, you store the resulting value and use it throughout. However, when I test in the browser, this value changes after the POST. So you’ll want a function to grab the sessionid from each request you make:

    function get_sessionid($result)
    {
        $trueposition = strpos($result, 'session.sessionid=');
        $id = substr($result, $trueposition + 18, 32);

        return $id;
    }

    Be sure to update your $id value after each curl request.
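
A slightly more defensive variant of the same idea, as a sketch only. It assumes the id is always 32 letters or digits (inferred from the substr length above, not confirmed by the site), and the name extract_sessionid is made up for this example. Unlike strpos/substr, it skips empty occurrences such as session.sessionid=& and returns null when nothing matches.

```php
<?php
// Hypothetical helper: pulls the first non-empty session id out of a response body.
// Assumption: ids are exactly 32 alphanumeric characters.
function extract_sessionid(string $body): ?string
{
    if (preg_match('/session\.sessionid=([A-Za-z0-9]{32})/', $body, $m) === 1) {
        return $m[1];
    }

    return null; // no id found; the caller should stop rather than send a broken request
}
```

Checking the return value for null after every curl_exec makes it obvious which request lost the session, instead of silently carrying a garbage id into the next URL.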

    2. The server expects a GET request

    Your code requesting the .csv doesn’t unset the POST request. Also, that .csv request is triggered before an actual search. Using a POST request will prevent the server from returning the .csv data.

    I also tested things like setting the CURLOPT_SSL_* options, using cookie files and setting an Accept: header but it turns out none of those were to blame.
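
The switch back to GET described in point 2 could look roughly like this. It is a sketch, not tested against the live site: build_csv_url and fetch_csv are hypothetical names, and the URL and page.navid value are copied from the question. CURLOPT_HTTPGET is the curl option that resets a handle's request method to GET.

```php
<?php
// Hypothetical helper: builds the CSV download URL for a given session id.
function build_csv_url(string $sessionId): string
{
    $query = http_build_query([
        'session.sessionid' => $sessionId,
        'page.navid'        => 'nlpresultlisttonlpresultlistdownloadcsv',
    ], '', '&');

    return 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?' . $query;
}

// Hypothetical helper: reuses the handle that sent the search POST
// and fetches the CSV with a GET request.
function fetch_csv($ch, string $sessionId)
{
    curl_setopt($ch, CURLOPT_URL, build_csv_url($sessionId));
    curl_setopt($ch, CURLOPT_HTTPGET, true);        // switch the handle back from POST to GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it

    return curl_exec($ch);                          // CSV text, or false on failure
}
```

Call fetch_csv only after the search POST has completed, passing the sessionid taken from the POST response.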

    3. Your POST keys are incorrect

    I am assuming that these have been copied from the Web Inspector. The POST keys shouldn’t have a : after them; the inspector only adds the colons for readability.
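
If you would rather keep the array exactly as copied from the inspector, one option is to strip the trailing colons in code before building the POST body. A minimal sketch; strip_inspector_colons is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper: removes the trailing ":" that the browser inspector
// appends to field names, leaving inner colons (e.g. "date_start:0") alone.
function strip_inspector_colons(array $fields): array
{
    $clean = [];
    foreach ($fields as $key => $value) {
        $clean[rtrim((string) $key, ':')] = $value;
    }

    return $clean;
}

// Example with two of the keys from the question.
$copied = [
    'session.sessionid:'             => 'abc',
    'nlp_search_param.date_start:0:' => '1',
];
$postvars = http_build_query(strip_inspector_colons($copied), '', '&');
```

The inner colon in date_start:0 is kept and simply URL-encoded as %3A by http_build_query.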

    Your process would be:

      1. Request the original URL
      2. Get the sessionid
      3. Make a POST request for search results
      4. Update the sessionid
      5. Make a GET request for the .csv

    Edit: Full amended code below

    // $ch and $id come from the first two steps of the process:
    // the initial request and get_sessionid($result)
    $data = array(
        "session.sessionid" => $id,
        "nlp_search_param.publisher" => "",
        "nlp_search_param.emittent" => "",
        "nlp_search_param.isin" => "",
        "nlp_search_param.search_history" => "true",
        "nlp_search_param.date_start:0" => "1",
        "nlp_search_param.date_start:1" => "1",
        "nlp_search_param.date_start:2" => "2001",
        "nlp_search_param.date_end:0" => "1",
        "nlp_search_param.date_end:1" => "1",
        "nlp_search_param.date_end:2" => "2019",
        "nlp_search_param.position_start" => "",
        "nlp_search_param.position_end" => "",
        "(page.navid=nlpresultlisttonlpresultlist_updatefilter)" => "Show net short positions"
    );

    // Searching
    $url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet';
    $postvars = http_build_query($data) . "\n";

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postvars);
    curl_setopt($ch, CURLOPT_POST, 1);

    $result = curl_exec($ch);

    // Updating the sessionid
    $id = get_sessionid($result);

    // Request the CSV
    $url = 'https://www.bundesanzeiger.de/ebanzwww/wexsservlet?session.sessionid=' . $id . '&page.navid=nlpresultlisttonlpresultlistdownloadcsv';
    curl_setopt($ch, CURLOPT_URL, $url);

    // Reset to a GET request
    curl_setopt($ch, CURLOPT_POST, FALSE);

    $result = curl_exec($ch);

    echo $result;

    curl_close($ch);
